Hi Vanessa,

Thanks for your explanation. I'm also compiling a TFBS dataset using the ChIP-seq data from the ENCODE project. I have tried to combine the data from the five major groups, but it's not an easy task for me.
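Combining peak calls from several ENCODE groups often starts with pooling the intervals and merging the ones that overlap. The Python sketch below illustrates that idea under stated assumptions. It is not the method UCSC used to build the clustered TFBS track. It assumes tab-separated BED-like input with 0-based, half-open chrom/start/end in the first three columns. The function names read_bed_intervals and merge_peaks are hypothetical.

```python
from collections import defaultdict


def read_bed_intervals(lines):
    """Yield (chrom, start, end) from BED-like text lines.

    Assumption: tab-separated, 0-based half-open coordinates in columns 1-3.
    Blank, comment, "track", and "browser" lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[2])


def merge_peaks(*peak_sets):
    """Pool intervals from several peak sets and merge overlapping ones.

    Returns a dict mapping chrom -> sorted list of (start, end) tuples.
    Design choice (an assumption): intervals that merely touch are merged too.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or abuts the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


# Usage with two toy "groups" of peaks (hypothetical data):
group_a = ["chr1\t100\t200\tpeakA1", "chr1\t150\t300\tpeakA2"]
group_b = ["chr1\t300\t400\tpeakB1", "chr2\t10\t20\tpeakB2"]
print(merge_peaks(read_bed_intervals(group_a), read_bed_intervals(group_b)))
# {'chr1': [(100, 400)], 'chr2': [(10, 20)]}
```

Merging this way discards per-peak scores and factor names. To keep them, each interval would need to carry its remaining columns through the merge.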
I have found a set of clustered TFBSs from Kent's lab on the website (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/). This seems to be exactly what I need. Could you please tell me where I can find a detailed description of this dataset, such as how the TFBSs were clustered, what the scores in the BED file mean, etc.?

Thanks,
Shuli

On 04/10/2012 10:30 AM, Vanessa Kirkup Swing wrote:
> Hi Anyuan,
>
> The additional tracks that you are seeing in the Table Browser are in the
> Genome Browser and are grouped under the ENC TF Binding Super-track
> (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&c=chr21&g=wgEncodeTfBindingSuper).
>
> To see all the available tracks in the Genome Browser, there is a tool
> called Track Search. You can get to Track Search from the gateway page
> (http://genome.ucsc.edu/cgi-bin/hgGateway). Select the assembly you are
> interested in and then click on "track search".
>
> With regard to intersecting data between experiments, we are unable to
> give you advice on that.
>
> From the track description page for the hg19 TFBS Conserved track
> (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&c=chr21&g=tfbsConsSite),
> here is what was done differently from hg18:
>
> "These data were obtained by running the program tfloc (Transcription
> Factor binding site LOCater) on multiz46way alignments, restricting only to
> the July 2007 (mm9) mouse genome assembly, the November 2004 rat assembly
> (rn4), and the February 2009 human genome assembly (hg19). Transcription
> factor information was culled from the Transfac Factor database, version
> 7.0."
>
> Here is what is different for the hg18 TFBS Conserved track
> (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&c=chr21&g=tfbsConsSites):
>
> "These data were obtained by running the program tfloc (Transcription
> Factor binding site LOCater) on multiz alignments of the February 2006
> (mm8) mouse genome assembly and the November 2004 rat assembly (rn4) to the
> March 2006 human genome assembly (hg18.) Transcription factor information
> was culled from the Transfac Factor database, version 7.0."
>
> These differences would explain why there is a larger amount of data for
> hg19.
>
> I hope that answers your questions. If you have further questions, please
> email the list: [email protected].
>
> Vanessa Kirkup Swing
> UCSC Genome Bioinformatics Group
>
> ---------- Forwarded message ----------
> From: 郭安源 (Anyuan Guo) <[email protected]>
> Date: 2012/4/10
> Subject: [Genome] ask about the CHIP-seq and tfbs data for human hg19
> To: [email protected]
>
> Dear Sir/Madam,
> I am trying to use the human ChIP-seq data on UCSC, and I have
> several questions about it.
> From the hg19 browser page, the only ChIP-seq data in the
> regulation tracks is in the "ENCODE regulation tracks", which include the
> "Txn Factor ChIP" track. However, from the Table Browser download page, we
> can find several other ChIP TFBS tracks, such as HAIB TFBS and UTA TFBS.
> Is that because the Txn Factor ChIP track includes all the data of the
> others? If I need the most comprehensive ChIP data, should I download
> only the Txn Factor ChIP track data, or also download the other TFBS data
> from the Table Browser page?
> I noticed that one track has many experiments for the same TF. For example,
> for the Nrsf TF in the HAIB TFBS track, there are the following
> experiments. For these, should I use the intersection of the data to reduce
> false positives for the Nrsf TFBSs?
> wgEncodeHaibTfbsGm12878NrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsGm12878NrsfPcr2xPkRep2.broadPeak
> wgEncodeHaibTfbsH1hescNrsfV0416102PkRep1.broadPeak
> wgEncodeHaibTfbsH1hescNrsfV0416102PkRep2.broadPeak
> wgEncodeHaibTfbsHelas3NrsfPcr1xPkRep1.broadPeak
> wgEncodeHaibTfbsHelas3NrsfPcr1xPkRep2.broadPeak
> wgEncodeHaibTfbsHepg2NrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsHepg2NrsfPcr2xPkRep2.broadPeak
> wgEncodeHaibTfbsK562NrsfV0416102PkRep1.broadPeak
> wgEncodeHaibTfbsK562NrsfV0416102PkRep2.broadPeak
> wgEncodeHaibTfbsPanc1NrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsPanc1NrsfPcr2xPkRep2.broadPeak
> wgEncodeHaibTfbsPfsk1NrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsPfsk1NrsfPcr2xPkRep2.broadPeak
> wgEncodeHaibTfbsSknshNrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsSknshNrsfPcr2xPkRep2.broadPeak
> wgEncodeHaibTfbsU87NrsfPcr2xPkRep1.broadPeak
> wgEncodeHaibTfbsU87NrsfPcr2xPkRep2.broadPeak
>
> For the conserved TFBS predictions in the "TFBS Conserved" track, I noticed
> there is much more data than in what I previously downloaded for hg18.
> However, the description page of this track
> (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=255151677&c=chr21&g=tfbsConsSites)
> seems no different from the hg18 page: it says the TransFac 7.0
> matrices and the same program were used. If so, why were many more TFBSs
> predicted? I guess you used a newer version of the TransFac matrices but
> didn't update the description page, right?
> These data are very important for us. I am looking forward to your reply.
> Thanks very much.
> Best,
> Anyuan Guo
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
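The thread leaves Anyuan's replicate question open, and the UCSC reply declined to advise on it. One simple option keeps only the Rep1 peaks that overlap at least one Rep2 peak. The Python sketch below does this for a pair such as wgEncodeHaibTfbsGm12878NrsfPcr2xPkRep1.broadPeak and wgEncodeHaibTfbsGm12878NrsfPcr2xPkRep2.broadPeak. It assumes each broadPeak row is tab-separated with chrom, start, and end (0-based, half-open) in the first three columns. The function names parse_peaks and reproducible_peaks are hypothetical. The 1-bp overlap rule is an arbitrary threshold, not an ENCODE recommendation.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def parse_peaks(lines):
    """Return (chrom, start, end) tuples from broadPeak/BED text lines.

    Assumption: tab-separated rows with chrom/start/end in columns 1-3.
    """
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))  # int() ignores "\n"
    return peaks


def reproducible_peaks(rep1, rep2):
    """Keep rep1 peaks that share at least one base with some rep2 peak."""
    # Index rep2 per chromosome: starts sorted, plus a running max of ends.
    by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)

    kept = []
    for chrom, start, end in rep1:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        n = bisect_left(starts, end)  # rep2 intervals with start < end
        # Overlap exists iff one of those intervals ends after `start`.
        if n > 0 and max_ends[n - 1] > start:
            kept.append((chrom, start, end))
    return kept


# Toy example (hypothetical peaks, not real ENCODE data):
rep1 = parse_peaks(["chr1\t100\t200", "chr1\t500\t600", "chr2\t50\t80"])
rep2 = parse_peaks(["chr1\t150\t250", "chr1\t600\t700", "chr3\t0\t10"])
print(reproducible_peaks(rep1, rep2))  # [('chr1', 100, 200)]
```

For real files, pass an open file handle to parse_peaks, using gzip.open(path, "rt") if the download is gzip-compressed. Stricter criteria may do more to reduce false positives, such as a minimum reciprocal overlap or the irreproducible discovery rate (IDR) approach.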
