That would be perfect actually. And it would radically reduce & modularize maintenance. Maybe that's the best way to go after all. Quite sensible.
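To make that concrete, here is a minimal sketch of the assignment pattern from the quoted thread below, with a small hand-built Seqinfo standing in for whatever a per-build package (Hs19, mm9, etc.) would supply. The names myResults and seqinfo.hg19 come from the thread; the toy ranges and the four-sequence subset of hg19 are assumptions for illustration only, not the contents of any existing package.

```r
library(GenomicRanges)  # attaches GenomeInfoDb, which provides Seqinfo()

# Hypothetical results object: ranges with no seqlengths or genome attached.
myResults <- GRanges(c("chr1", "chr22", "chrX"),
                     IRanges(start = c(100L, 200L, 300L), width = 50L))

# Stand-in for a stored per-build seqinfo (assumption: only four hg19
# sequences are listed here; a real object would carry all of them).
seqinfo.hg19 <- Seqinfo(
  seqnames   = c("chr1", "chr22", "chrX", "chrM"),
  seqlengths = c(249250621L, 51304566L, 155270560L, 16571L),
  isCircular = c(FALSE, FALSE, FALSE, TRUE),
  genome     = "hg19"
)

# Keep only the sequences present in the results, then assign.
seqinfo(myResults) <- seqinfo.hg19[seqlevels(myResults)]

seqinfo(myResults)
```

Subsetting by seqlevels(myResults) first keeps the assignment from adding sequences the object never used, which is why it is cheap compared with a liftOver-style conversion.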
--t

> On Jun 3, 2015, at 12:46 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
>
> It really isn't hard to have multiple OrganismDb packages in place -- the
> process of making new ones is documented and was given as an exercise in
> the EdX course.  I don't know if we want to institutionalize it and
> distribute such -- I think we might, so that there would be Hs19, Hs38,
> mm9, etc. packages.  They have very little content, they just coordinate
> interactions with packages that you'll already have.
>
> On Wed, Jun 3, 2015 at 3:26 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
> wrote:
>
>> Right, I typically do that too, and if you're working on human data it
>> isn't a big deal.  What makes things a lot more of a drag is when you work
>> on e.g. mouse data (mm9 vs mm10, aka GRCm37 vs GRCm38) where Mus.musculus
>> is essentially a "build ahead" of Homo.sapiens.
>>
>> R> seqinfo(Homo.sapiens)
>> Seqinfo object with 93 sequences (1 circular) from hg19 genome
>>
>> R> seqinfo(Mus.musculus)
>> Seqinfo object with 66 sequences (1 circular) from mm10 genome:
>>
>> It's not as explicit as directly assigning the seqinfo from a genome that
>> corresponds to that of your annotations/results/whatever.  I know we could
>> all use crossmap or liftOver or whatever, but that's not really the same,
>> and it takes time, whereas assigning the proper seqinfo for relationships
>> is very fast.
>>
>> That's all I was getting at...
>>
>> Statistics is the grammar of science.
>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>
>> On Wed, Jun 3, 2015 at 12:17 PM, Vincent Carey <st...@channing.harvard.edu>
>> wrote:
>>
>>> I typically get this info from Homo.sapiens.  The result is parasitic on
>>> the TxDb that is in there.  I don't know how easy it is to swap alternate
>>> TxDb in to get a different build.  I think it would make sense to regard
>>> the OrganismDb instances as foundational for this sort of structural data.
>>>
>>> On Wed, Jun 3, 2015 at 3:12 PM, Kasper Daniel Hansen
>>> <kasperdanielhan...@gmail.com> wrote:
>>>
>>>> Let me rephrase this slightly.  From one POV the purpose of GenomeInfoDb
>>>> is to clean up the seqinfo slot.  Currently it does most of the cleaning,
>>>> but it does not add seqlengths.
>>>>
>>>> It is clear that seqlengths depend on the version of the genome, but I
>>>> will argue so do the seqnames.  Of course, for human, chr22 will not
>>>> change.  But what about the names of all the random contigs?  Or for
>>>> other organisms, what about going from a draft genome with 10k contigs
>>>> to a more complete genome assembled into fewer, larger chromosomes?
>>>>
>>>> I acknowledge that this information is present in the BSgenome packages,
>>>> but it seems (to me) to be very appropriate to have them around for
>>>> cleaning up the seqinfo slot.  For some situations it is not great to
>>>> depend on a 1 GB download for something that is a few bytes.
>>>>
>>>> Best,
>>>> Kasper
>>>>
>>>> On Wed, Jun 3, 2015 at 3:00 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
>>>> wrote:
>>>>
>>>>> It would be nice (for a number of reasons) to have chromosome lengths
>>>>> readily available in a foundational package like GenomeInfoDb, so that,
>>>>> say,
>>>>>
>>>>> data(seqinfo.hg19)
>>>>> seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ]
>>>>>
>>>>> would work without issues.  Is there any particular reason this couldn't
>>>>> happen for the supported/available BSgenomes?  It would seem like a
>>>>> simple matter to do
>>>>>
>>>>> R> library(BSgenome.Hsapiens.UCSC.hg19)
>>>>> R> seqinfo.hg19 <- seqinfo(Hsapiens)
>>>>> R> save(seqinfo.hg19,
>>>>>         file="~/bioc-devel/GenomeInfoDb/data/seqinfo.hg19.rda")
>>>>>
>>>>> and be done with it until (say) the next release or next released
>>>>> BSgenome.  I considered looping through the following BSgenomes
>>>>> myself... and if it isn't strongly opposed by (everyone) I may still do
>>>>> exactly that.  Seems useful, no?
>>>>>
>>>>> e.g. for the following 42 builds,
>>>>>
>>>>> grep("(UCSC|NCBI)", unique(gsub(".masked", "", available.genomes())),
>>>>>      value=TRUE)
>>>>>  [1] "BSgenome.Amellifera.UCSC.apiMel2"   "BSgenome.Btaurus.UCSC.bosTau3"
>>>>>  [3] "BSgenome.Btaurus.UCSC.bosTau4"      "BSgenome.Btaurus.UCSC.bosTau6"
>>>>>  [5] "BSgenome.Btaurus.UCSC.bosTau8"      "BSgenome.Celegans.UCSC.ce10"
>>>>>  [7] "BSgenome.Celegans.UCSC.ce2"         "BSgenome.Celegans.UCSC.ce6"
>>>>>  [9] "BSgenome.Cfamiliaris.UCSC.canFam2"  "BSgenome.Cfamiliaris.UCSC.canFam3"
>>>>> [11] "BSgenome.Dmelanogaster.UCSC.dm2"    "BSgenome.Dmelanogaster.UCSC.dm3"
>>>>> [13] "BSgenome.Dmelanogaster.UCSC.dm6"    "BSgenome.Drerio.UCSC.danRer5"
>>>>> [15] "BSgenome.Drerio.UCSC.danRer6"       "BSgenome.Drerio.UCSC.danRer7"
>>>>> [17] "BSgenome.Ecoli.NCBI.20080805"       "BSgenome.Gaculeatus.UCSC.gasAcu1"
>>>>> [19] "BSgenome.Ggallus.UCSC.galGal3"      "BSgenome.Ggallus.UCSC.galGal4"
>>>>> [21] "BSgenome.Hsapiens.NCBI.GRCh38"      "BSgenome.Hsapiens.UCSC.hg17"
>>>>> [23] "BSgenome.Hsapiens.UCSC.hg18"        "BSgenome.Hsapiens.UCSC.hg19"
>>>>> [25] "BSgenome.Hsapiens.UCSC.hg38"        "BSgenome.Mfascicularis.NCBI.5.0"
>>>>> [27] "BSgenome.Mfuro.UCSC.musFur1"        "BSgenome.Mmulatta.UCSC.rheMac2"
>>>>> [29] "BSgenome.Mmulatta.UCSC.rheMac3"     "BSgenome.Mmusculus.UCSC.mm10"
>>>>> [31] "BSgenome.Mmusculus.UCSC.mm8"        "BSgenome.Mmusculus.UCSC.mm9"
>>>>> [33] "BSgenome.Ptroglodytes.UCSC.panTro2" "BSgenome.Ptroglodytes.UCSC.panTro3"
>>>>> [35] "BSgenome.Rnorvegicus.UCSC.rn4"      "BSgenome.Rnorvegicus.UCSC.rn5"
>>>>> [37] "BSgenome.Rnorvegicus.UCSC.rn6"      "BSgenome.Scerevisiae.UCSC.sacCer1"
>>>>> [39] "BSgenome.Scerevisiae.UCSC.sacCer2"  "BSgenome.Scerevisiae.UCSC.sacCer3"
>>>>> [41] "BSgenome.Sscrofa.UCSC.susScr3"      "BSgenome.Tguttata.UCSC.taeGut1"
>>>>>
>>>>> Am I insane for suggesting this?  It would make things a little easier
>>>>> for rtracklayer, most SummarizedExperiment and SE-derived objects, blah,
>>>>> blah, blah...
>>>>>
>>>>> Best,
>>>>>
>>>>> --t

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel