Hi Sonja,

Here is some information regarding your recent question, which I received from one of our ENCODE staff:
The contents of the downloads-only files are left to the discretion of the contributing laboratories. In this case, the contributor is Barbara Wold of Caltech. You may contact Georgi Marinov <[email protected]> about the specifics of the files. To aid you in your communication with the lab, below is a key to what the Wold lab calls RawData5, 6, and 7.

RawData5 = final.rpkm
RawData6 = gencode_exon
RawData7 = accepted.rpkm

The Wold lab has submitted data for hg19, which has not yet been reviewed for public release. The downloads that they are generating now are listed here. You may wish to inquire about these data as well.

Expression Estimates and Transcript Models (Cufflinks):

.junctions - BED12 file containing TopHat-defined splice junctions.

GeneModel.gtf - a GTF file containing gene models produced by Cufflinks in de novo mode.

GeneDeNovo.fpkm - FPKM expression level estimates at the gene level for de novo assembled transcripts. FPKM (Fragments Per Kilobase per Million reads, where a fragment is defined as the nucleic acid fragment from which reads originate, and a pair of reads is counted as one fragment) is a metric analogous to the widely used RPKM (Reads Per Kilobase per Million reads). Like RPKM, it normalizes for both transcript length and sequencing depth.

TranscriptDeNovo.fpkm - FPKM expression level estimates at the transcript level for de novo assembled transcripts.

GeneGencV3c.fpkm - FPKM expression level estimates at the gene level for the GENCODE GRCh37.v3c annotation.

TranscriptGencV3c.fpkm - FPKM expression level estimates at the transcript level for the GENCODE GRCh37.v3c annotation.

GeneGencV4.fpkm - FPKM expression level estimates at the gene level for the GENCODE GRCh37.v4 annotation.

TranscriptGencV4.fpkm - FPKM expression level estimates at the transcript level for the GENCODE GRCh37.v4 annotation.

I hope that is helpful.
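In case it helps when comparing the RPKM and FPKM files, here is a minimal Python sketch of the normalization described above. It only illustrates the arithmetic of the definition. It is not how Cufflinks or ERANGE compute their estimates (those tools apply their own rules for assigning fragments to transcripts), and the function name and input values are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Illustrative only: normalizes a raw fragment count by transcript
    length (in kilobases) and by sequencing depth (in millions).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# in a library with 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)  # 12.5
```

For single-end data, where each read is its own fragment, the same arithmetic gives RPKM.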
If you have any additional questions, please contact us again at [email protected]

- Greg Roe
UCSC Genome Bioinformatics Group

On 2/28/11 1:50 AM, Sonja Althammer wrote:
> Good morning,
>
> I wanted to download the RPKM-files from the RNA-Seq experiment in the
> cell-line GM12878 (ENCODE).
> Surprisingly I found 3 files that differ only in view=data5, data6 or data7.
> What is this supposed to mean? and how can I treat them? Are they supposed
> to be combined? But why are they in different files then?
> Below you see the links..
>
> Thanks a lot in advance and have a nice day!
> Sonja
>
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeCaltechRnaSeq/
>
> 2009-12-06
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqRawData5Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
> 538K 2009-03-06 dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
> localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
> Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
> mapAlgorithm=erng3; view=RawData5; type=rpkm; insertLength=200;
> readType=2x75
>
> 2009-12-06
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqRawData6Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
> 3.9M 2009-03-06 dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
> localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
> Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
> mapAlgorithm=erng3; view=RawData6; type=rpkm; insertLength=200;
> readType=2x75
>
> 2009-12-06
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqRawData7Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
> 219K 2009-03-06 dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
> localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
> Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
> mapAlgorithm=erng3; view=RawData7; type=rpkm; insertLength=200;
> readType=2x75
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
