Re: [Bioc-devel] information about sequencing experiment

Mike Love Sun, 03 Feb 2013 11:16:58 -0800

So ideally, this wouldn't make anyone's life more difficult.  In my 
mind, the nice thing about the eSet's use of MIAME is that it is 
voluntary and minimal, but a reminder of those things which could be 
important if the data object drops into someone's hands in the future.


For example, some HTS-specific fields from GEO are:

extract protocol (e.g. Illumina TruSeq Stranded Total RNA)
instrument model
read length
single-end or paired-end

For an SE produced by summarizeOverlaps, what counting mode was used?  
If applicable, something like "origin of features" (e.g. 
TxDb.Hsapiens.UCSC.hg19.knownGene)?
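A minimal sketch of a voluntary, minimal container for these fields, assuming a plain S4 class in base R. The class name SeqExperimentInfo and its slot names are hypothetical; this is not an existing Bioconductor class. Every slot defaults to NA, so nothing is mandatory, and the validity check only requires one value per field. The values in the usage lines are illustrative. "Union" is one of the counting modes accepted by summarizeOverlaps.

```r
library(methods)

## Hypothetical class (not part of Bioconductor). One value per field;
## NA means "not recorded", so every field stays voluntary.
setClass("SeqExperimentInfo",
  representation(
    extractProtocol = "character",  # e.g. "Illumina TruSeq Stranded Total RNA"
    instrumentModel = "character",  # sequencer model
    readLength      = "integer",    # read length in bases
    pairedEnd       = "logical",    # TRUE = paired-end, FALSE = single-end
    countingMode    = "character",  # summarizeOverlaps mode, if one was used
    featureOrigin   = "character"   # annotation source, e.g. a TxDb package
  ),
  prototype = list(
    extractProtocol = NA_character_, instrumentModel = NA_character_,
    readLength = NA_integer_, pairedEnd = NA,
    countingMode = NA_character_, featureOrigin = NA_character_
  ),
  validity = function(object) {
    ## each slot must hold exactly one value (NA allowed)
    lens <- vapply(slotNames(object),
                   function(s) length(slot(object, s)), integer(1))
    if (all(lens == 1L)) TRUE
    else paste("slots must have length 1:",
               paste(names(lens)[lens != 1L], collapse = ", "))
  }
)

## Illustrative values; fields not supplied stay NA.
info <- new("SeqExperimentInfo",
  extractProtocol = "Illumina TruSeq Stranded Total RNA",
  pairedEnd       = TRUE,
  countingMode    = "Union",
  featureOrigin   = "TxDb.Hsapiens.UCSC.hg19.knownGene")
```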

best,

Mike

On 2/3/13 6:43 PM, Tim Triche, Jr. wrote:
> When I first started pulling GEO eSet representations into SE/sset 
> objects, I found that I had to write something to handle the mandatory 
> MIAME data:
>
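> library(Biobase)  # MIAME class
> library(IRanges)  # SimpleList (later moved to S4Vectors)
>
> setAs("MIAME", "SimpleList",
>   function(from) { # {{{
>     to = list()
>     ## copy every slot except the internal '.__classVersion__' slot
>     for(i in slotNames(from)) if(i != '.__classVersion__')
>       to[[i]] = slot(from, i)
>     return(SimpleList(to))
>   }
> ) # }}}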
>
> And then of course the SimpleList went into the sset exptData slot.
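The slot-walking loop in the coercion above is not specific to MIAME. Here is a base-R sketch of the same idea as a reusable function that should work for any S4 object. The names slotsToList and ToyInfo are hypothetical. They are used only so the example runs without Biobase. Wrapping the result in SimpleList() would give the same kind of object the coercion produces.

```r
library(methods)

## Hypothetical helper: copy every slot of an S4 object into a named list,
## skipping bookkeeping slots such as '.__classVersion__'.
slotsToList <- function(from, drop = ".__classVersion__") {
  nms <- setdiff(slotNames(from), drop)
  setNames(lapply(nms, function(n) slot(from, n)), nms)
}

## Toy class standing in for MIAME (illustrative only).
setClass("ToyInfo", representation(name = "character", lab = "character",
                                   pubMedIds = "character"))
x <- new("ToyInfo", name = "A. Researcher", lab = "Example Lab",
         pubMedIds = character())
lst <- slotsToList(x)
names(lst)   # c("name", "lab", "pubMedIds")
```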
>
> I've been doing this for a while to GEO data so that I can coerce it 
> into sset/SE objects (I'll start calling them 'sset' even though it 
> doesn't make sense as an acronym ;-)).  But MIAME is, specifically, 
> Minimal Information About a Microarray Experiment.  The closest I can 
> think of would be the MAGE-TAB representation for TCGA sequencing 
> experiments.  The investigation (library prep, sequencing, 
> quantification, etc.) is described in the IDF:
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.1.7.0.idf.txt
>
> The samples are then described in the SDRF:
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.1.7.0.sdrf.txt
>
> And all the plain-English parts are further described here (thanks 
> Katie!):
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/DESCRIPTION.txt
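An IDF file is tab-delimited and row-oriented. Each line starts with a field name, followed by one or more values. Below is a minimal base-R sketch that turns such lines into a named list that could then go into exptData, assuming that layout. The helper name readIdf and the sample lines are hypothetical and are not taken from the TCGA files linked above. Quoting and other MAGE-TAB details are not handled. SDRF files, by contrast, are column-oriented tables with one row per sample, so read.delim() can already read them.

```r
## Hypothetical helper: parse IDF lines (tag<TAB>value<TAB>value...) into a
## named list. Blank lines and lines starting with '#' are skipped; empty
## cells are dropped; repeated tags are kept as separate list entries.
readIdf <- function(lines) {
  lines <- lines[nzchar(trimws(lines)) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  out <- lapply(fields, function(f) {
    vals <- f[-1]
    vals[nzchar(vals)]
  })
  names(out) <- vapply(fields, `[`, character(1), 1)
  out
}

## Hypothetical input in IDF layout
idf <- c("# hypothetical comment line",
         "Investigation Title\tExample RNA-seq study",
         "Experimental Design\tRNA-seq",
         "",
         "Protocol Name\tprotocol-1\tprotocol-2\t")
info <- readIdf(idf)
info[["Protocol Name"]]   # c("protocol-1", "protocol-2")
```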
>
> Speaking from experience, it is a pain in the (arbitrary appendage) to 
> assemble these, but they are essentially self-contained experiments 
> for the end user.  This is one of the reasons I like using sset 
> objects even for data from GEO: I can keep all the exptData, I can map 
> all the probes/reads/etc. to the appropriate genome build (and 
> swap/lift assemblies as needed), and it's trivial to compare (say) 
> RNAseq results to HuEx to 3' array results.
>
> So I'm not against support for this, although it would make rival 
> labs' lives easier, which isn't always my goal in life ;-)
>
>
>
> On Sun, Feb 3, 2013 at 9:32 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
>
>     On 02/03/2013 06:37 AM, Mike Love wrote:
>
>         hi,
>
>         Does/should there exist a class similar to MIAME for
>         sequencing data, e.g. slots
>         concerning the library preparation, alignment, etc.?
>
>         This could then be suggested as something to include in the
>         exptData SimpleList
>         of SummarizedExperiment.
>
>
>
>     As it is, one could certainly
>
>     > se = SummarizedExperiment()
>     > exptData(se) = list(MIAME())
>
>     If we want to go down this route then I think the right strategy
>     would be to make the exptData slot more strict. But what would the
>     MIAME-like container look like? The basics are probably shared,
>     but what else?
>
>     > slotNames("MIAME")
>      [1] "name"              "lab"               "contact"
>      [4] "title"             "abstract"          "url"
>      [7] "pubMedIds"         "samples"           "hybridizations"
>     [10] "normControls"      "preprocessing"     "other"
>     [13] ".__classVersion__"
>
>     Martin
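One possible reading of "make the exptData slot more strict" is a validity method that accepts only objects extending a shared base class. The MIAME-like basics would live in the base class, and technology-specific fields would go in subclasses. Below is a base-R sketch under those assumptions. ExperimentInfoBase, SequencingInfo, and StrictExpt are hypothetical names and do not correspond to Bioconductor classes. A plain list stands in for the SimpleList used by SummarizedExperiment, which keeps the sketch free of dependencies.

```r
library(methods)

## Hypothetical virtual base class holding a few fields shared with MIAME.
setClass("ExperimentInfoBase",
  representation("VIRTUAL", name = "character", lab = "character",
                 contact = "character", title = "character"))

## Hypothetical sequencing-specific extension of the shared basics.
setClass("SequencingInfo",
  representation(instrumentModel = "character", pairedEnd = "logical"),
  contains = "ExperimentInfoBase")

## Hypothetical container: its exptData list only accepts
## objects that extend ExperimentInfoBase.
setClass("StrictExpt", representation(exptData = "list"),
  validity = function(object) {
    ok <- vapply(object@exptData, is, logical(1),
                 class2 = "ExperimentInfoBase")
    if (all(ok)) TRUE
    else "every exptData element must extend 'ExperimentInfoBase'"
  })

si   <- new("SequencingInfo", name = "A. Researcher",
            instrumentModel = "Illumina HiSeq 2000", pairedEnd = TRUE)
good <- new("StrictExpt", exptData = list(si))
bad  <- tryCatch(new("StrictExpt", exptData = list("free text")),
                 error = function(e) NULL)  # NULL: rejected by validity
```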
>
>
>
>         best,
>
>         Mike
>
>         _______________________________________________
>         Bioc-devel@r-project.org mailing list
>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>     -- 
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793
>
>
>
>
>
>
> -- 
> /A model is a lie that helps you see the truth./
> Howard Skipper 
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

