So ideally, this wouldn't make anyone's life more difficult. In my
mind, the nice thing about the eSet's use of MIAME is that it is
voluntary and minimal, yet still a reminder of the things that could
be important if the data object lands in someone else's hands in the
future.
For example, some HTS-specific fields from GEO are:
  - extract protocol (e.g. Illumina TruSeq Stranded Total RNA)
  - instrument model
  - read length
  - single-end or paired-end
For an SE produced by summarizeOverlaps, which counting mode was used?
And, if applicable, something like an "origin of features" field (e.g.
TxDb.Hsapiens.UCSC.hg19.knownGene)?
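To make the list above concrete, here is a rough sketch of what such a
container could look like in plain S4, using only the methods package.
It is an assumption, not an existing Bioconductor class: the class name
HTSExptInfo and all slot names are hypothetical. The three allowed
countingMode values are the built-in mode names of summarizeOverlaps.

```r
library(methods)

## Hypothetical MIAME-like container for sequencing metadata. The class
## name and slot names are illustrative, not an existing Bioconductor class.
setClass("HTSExptInfo",
  slots = c(
    extractProtocol = "character",  # e.g. "Illumina TruSeq Stranded Total RNA"
    instrumentModel = "character",  # sequencer model
    readLength      = "integer",    # read length in bases
    pairedEnd       = "logical",    # FALSE = single-end, TRUE = paired-end
    countingMode    = "character",  # summarizeOverlaps mode, if applicable
    featureOrigin   = "character"   # e.g. "TxDb.Hsapiens.UCSC.hg19.knownGene"
  ),
  validity = function(object) {
    ## every field is optional (length 0) but holds at most one value
    lens <- vapply(slotNames(object),
                   function(s) length(slot(object, s)), integer(1))
    if (any(lens > 1L))
      return("each field must have length 0 or 1")
    ## restrict countingMode to summarizeOverlaps' built-in mode names
    modes <- c("Union", "IntersectionStrict", "IntersectionNotEmpty")
    if (!all(object@countingMode %in% modes))
      return("countingMode must be Union, IntersectionStrict or IntersectionNotEmpty")
    if (any(object@readLength <= 0L, na.rm = TRUE))
      return("readLength must be positive")
    TRUE
  })

info <- new("HTSExptInfo",
            extractProtocol = "Illumina TruSeq Stranded Total RNA",
            readLength      = 100L,
            pairedEnd       = TRUE,
            countingMode    = "Union",
            featureOrigin   = "TxDb.Hsapiens.UCSC.hg19.knownGene")

## it could then travel with the experiment-level metadata, e.g.
## exptData(se) <- SimpleList(seqInfo = info)
```

Every field stays optional, so this would remain as voluntary as MIAME
is in eSets; the validity function only rejects malformed values.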
best,
Mike
On 2/3/13 6:43 PM, Tim Triche, Jr. wrote:
> When I first started pulling GEO eSet representations into SE/sset
> objects, I found that I had to write something to handle the mandatory
> MIAME data:
>
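> library(Biobase)  # MIAME class
> library(IRanges)  # SimpleList
>
> ## copy every MIAME slot (except the internal class-version slot)
> ## into a named SimpleList suitable for the exptData slot
> setAs("MIAME", "SimpleList",
>   function(from) { # {{{
>     to <- list()
>     for (i in slotNames(from))
>       if (i != ".__classVersion__")
>         to[[i]] <- slot(from, i)
>     SimpleList(to)
>   }
> ) # }}}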
>
> And then of course the SimpleList went into the sset exptData slot.
>
> I've been doing this for a while with GEO data so that I can coerce it
> into sset/SE objects (I'll start calling them 'sset' even though it
> doesn't make sense as an acronym ;-)). But MIAME is, specifically,
> Minimal Information About a Microarray Experiment. The closest
> sequencing equivalent I can think of is the MAGE-TAB representation
> used for TCGA sequencing experiments. The investigation (library prep,
> sequencing, quantification, etc.) is described in the IDF
> (Investigation Description Format):
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.1.7.0.idf.txt
>
> The samples are then described in the SDRF (Sample and Data
> Relationship Format):
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.1.7.0.sdrf.txt
>
> And all the plain-English parts are further described here (thanks
> Katie!):
>
> https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/thca/cgcc/unc.edu/illuminahiseq_rnaseqv2/rnaseqv2/unc.edu_THCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.7.0/DESCRIPTION.txt
>
> Speaking from experience, it is a pain in the (arbitrary appendage) to
> assemble these, but they are essentially self-contained experiments
> for the end user. This is one of the reasons I like using sset
> objects even for data from GEO: I can keep all the exptData, I can map
> all the probes/reads/etc. to the appropriate genome build (and
> swap/lift assemblies as needed), and it's trivial to compare (say)
> RNAseq results to HuEx to 3' array results.
>
> So I'm not against support for this, although it would make rival
> labs' lives easier, which isn't always my goal in life ;-)
>
>
>
> On Sun, Feb 3, 2013 at 9:32 AM, Martin Morgan <[email protected]> wrote:
>
> On 02/03/2013 06:37 AM, Mike Love wrote:
>
> hi,
>
> Does/should there exist a class similar to MIAME for
> sequencing data, e.g. slots
> concerning the library preparation, alignment, etc.?
>
> This could then be suggested as something to include in the
> exptData SimpleList
> of SummarizedExperiment.
>
>
>
> As it is, one could certainly do
>
> > se = SummarizedExperiment()
> > exptData(se) = list(MIAME())
>
> If we want to go down this route, then I think the right strategy
> would be to make the exptData slot stricter. But what would the
> MIAME-like container look like? The basics are probably shared,
> but what else?
>
> > slotNames("MIAME")
> [1] "name" "lab" "contact"
> [4] "title" "abstract" "url"
> [7] "pubMedIds" "samples" "hybridizations"
> [10] "normControls" "preprocessing" "other"
> [13] ".__classVersion__"
>
> Martin
>
>
>
> best,
>
> Mike
>
> _______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
> --
> "A model is a lie that helps you see the truth."
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>