that's kind of always been my goal...
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> Maybe this could eventually support setting the seqinfo with:
>
> genome(gr) <- "hg19"
>
> Or is that being too clever?
>
> On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> > Hi,
> >
> > FWIW I started to work on supporting quick generation of a standalone
> > Seqinfo object via Seqinfo(genome="hg38") in GenomeInfoDb.
> >
> > It already supports hg38, hg19, hg18, panTro4, panTro3, panTro2,
> > bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1, mm10,
> > mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4, galGal3,
> > gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2, sacCer3,
> > and sacCer2. I'll add more.
> >
> > See ?Seqinfo for some examples.
> >
> > Right now it fetches the information from the internet every time you
> > call it, but maybe we should just store that information in the
> > GenomeInfoDb package as Tim suggested?
> >
> > H.
> >
> > On 06/03/2015 12:54 PM, Tim Triche, Jr. wrote:
> >>
> >> That would be perfect actually. And it would radically reduce &
> >> modularize maintenance. Maybe that's the best way to go after all. Quite
> >> sensible.
> >>
> >> --t
> >>
> >>> On Jun 3, 2015, at 12:46 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
> >>>
> >>> It really isn't hard to have multiple OrganismDb packages in place -- the
> >>> process of making new ones is documented and was given as an exercise in
> >>> the EdX course. I don't know if we want to institutionalize it and
> >>> distribute such -- I think we might, so that there would be Hs19, Hs38,
> >>> mm9, etc. packages. They have very little content, they just coordinate
> >>> interactions with packages that you'll already have.
> >>>
> >>> On Wed, Jun 3, 2015 at 3:26 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
> >>>
> >>>> Right, I typically do that too, and if you're working on human data it
> >>>> isn't a big deal. What makes things a lot more of a drag is when you work
> >>>> on e.g. mouse data (mm9 vs mm10, aka GRCm37 vs GRCm38) where Mus.musculus
> >>>> is essentially a "build ahead" of Homo.sapiens.
> >>>>
> >>>> R> seqinfo(Homo.sapiens)
> >>>> Seqinfo object with 93 sequences (1 circular) from hg19 genome
> >>>>
> >>>> R> seqinfo(Mus.musculus)
> >>>> Seqinfo object with 66 sequences (1 circular) from mm10 genome:
> >>>>
> >>>> It's not as explicit as directly assigning the seqinfo from a genome that
> >>>> corresponds to that of your annotations/results/whatever. I know we could
> >>>> all use crossmap or liftOver or whatever, but that's not really the same,
> >>>> and it takes time, whereas assigning the proper seqinfo for relationships
> >>>> is very fast.
> >>>>
> >>>> That's all I was getting at...
> >>>>
> >>>> On Wed, Jun 3, 2015 at 12:17 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
> >>>>
> >>>>> I typically get this info from Homo.sapiens. The result is parasitic on
> >>>>> the TxDb that is in there. I don't know how easy it is to swap an alternate
> >>>>> TxDb in to get a different build. I think it would make sense to regard
> >>>>> the OrganismDb instances as foundational for this sort of structural
> >>>>> data.
> >>>>>
> >>>>> On Wed, Jun 3, 2015 at 3:12 PM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
> >>>>>
> >>>>>> Let me rephrase this slightly. From one POV the purpose of GenomeInfoDb
> >>>>>> is to clean up the seqinfo slot. Currently it does most of the cleaning,
> >>>>>> but it does not add seqlengths.
> >>>>>>
> >>>>>> It is clear that seqlengths depends on the version of the genome, but I
> >>>>>> will argue so do the seqnames. Of course, for human, chr22 will not
> >>>>>> change. But what about the names of all the random contigs? Or, for other
> >>>>>> organisms, what about going from a draft genome with 10k contigs to a
> >>>>>> more complete genome assembled into fewer, larger chromosomes?
> >>>>>>
> >>>>>> I acknowledge that this information is present in the BSgenome packages,
> >>>>>> but it seems (to me) to be very appropriate to have it around for
> >>>>>> cleaning up the seqinfo slot. For some situations it is not great to
> >>>>>> depend on a >1 GB download for something that is a few bytes.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kasper
> >>>>>>
> >>>>>> On Wed, Jun 3, 2015 at 3:00 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
> >>>>>>
> >>>>>>> It would be nice (for a number of reasons) to have chromosome lengths
> >>>>>>> readily available in a foundational package like GenomeInfoDb, so that,
> >>>>>>> say,
> >>>>>>>
> >>>>>>> data(seqinfo.hg19)
> >>>>>>> seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ]
> >>>>>>>
> >>>>>>> would work without issues. Is there any particular reason this couldn't
> >>>>>>> happen for the supported/available BSgenomes? It would seem like a simple
> >>>>>>> matter to do
> >>>>>>>
> >>>>>>> R> library(BSgenome.Hsapiens.UCSC.hg19)
> >>>>>>> R> seqinfo.hg19 <- seqinfo(Hsapiens)
> >>>>>>> R> save(seqinfo.hg19, file="~/bioc-devel/GenomeInfoDb/data/seqinfo.hg19.rda")
> >>>>>>>
> >>>>>>> and be done with it until (say) the next release or next released
> >>>>>>> BSgenome. I considered looping through the following BSgenomes myself...
> >>>>>>> and if it isn't strongly opposed by (everyone) I may still do exactly
> >>>>>>> that. Seems useful, no?
> >>>>>>>
> >>>>>>> e.g. for the following 42 builds,
> >>>>>>>
> >>>>>>> grep("(UCSC|NCBI)", unique(gsub(".masked", "", available.genomes())), value=TRUE)
> >>>>>>>  [1] "BSgenome.Amellifera.UCSC.apiMel2"   "BSgenome.Btaurus.UCSC.bosTau3"
> >>>>>>>  [3] "BSgenome.Btaurus.UCSC.bosTau4"      "BSgenome.Btaurus.UCSC.bosTau6"
> >>>>>>>  [5] "BSgenome.Btaurus.UCSC.bosTau8"      "BSgenome.Celegans.UCSC.ce10"
> >>>>>>>  [7] "BSgenome.Celegans.UCSC.ce2"         "BSgenome.Celegans.UCSC.ce6"
> >>>>>>>  [9] "BSgenome.Cfamiliaris.UCSC.canFam2"  "BSgenome.Cfamiliaris.UCSC.canFam3"
> >>>>>>> [11] "BSgenome.Dmelanogaster.UCSC.dm2"    "BSgenome.Dmelanogaster.UCSC.dm3"
> >>>>>>> [13] "BSgenome.Dmelanogaster.UCSC.dm6"    "BSgenome.Drerio.UCSC.danRer5"
> >>>>>>> [15] "BSgenome.Drerio.UCSC.danRer6"       "BSgenome.Drerio.UCSC.danRer7"
> >>>>>>> [17] "BSgenome.Ecoli.NCBI.20080805"       "BSgenome.Gaculeatus.UCSC.gasAcu1"
> >>>>>>> [19] "BSgenome.Ggallus.UCSC.galGal3"      "BSgenome.Ggallus.UCSC.galGal4"
> >>>>>>> [21] "BSgenome.Hsapiens.NCBI.GRCh38"      "BSgenome.Hsapiens.UCSC.hg17"
> >>>>>>> [23] "BSgenome.Hsapiens.UCSC.hg18"        "BSgenome.Hsapiens.UCSC.hg19"
> >>>>>>> [25] "BSgenome.Hsapiens.UCSC.hg38"        "BSgenome.Mfascicularis.NCBI.5.0"
> >>>>>>> [27] "BSgenome.Mfuro.UCSC.musFur1"        "BSgenome.Mmulatta.UCSC.rheMac2"
> >>>>>>> [29] "BSgenome.Mmulatta.UCSC.rheMac3"     "BSgenome.Mmusculus.UCSC.mm10"
> >>>>>>> [31] "BSgenome.Mmusculus.UCSC.mm8"        "BSgenome.Mmusculus.UCSC.mm9"
> >>>>>>> [33] "BSgenome.Ptroglodytes.UCSC.panTro2" "BSgenome.Ptroglodytes.UCSC.panTro3"
> >>>>>>> [35] "BSgenome.Rnorvegicus.UCSC.rn4"      "BSgenome.Rnorvegicus.UCSC.rn5"
> >>>>>>> [37] "BSgenome.Rnorvegicus.UCSC.rn6"      "BSgenome.Scerevisiae.UCSC.sacCer1"
> >>>>>>> [39] "BSgenome.Scerevisiae.UCSC.sacCer2"  "BSgenome.Scerevisiae.UCSC.sacCer3"
> >>>>>>> [41] "BSgenome.Sscrofa.UCSC.susScr3"      "BSgenome.Tguttata.UCSC.taeGut1"
> >>>>>>>
> >>>>>>> Am I insane for suggesting this? It would make things a little easier for
> >>>>>>> rtracklayer, most SummarizedExperiment and SE-derived objects, blah, blah,
> >>>>>>> blah...
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> --t
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioc-devel@r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpa...@fredhutch.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319
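A rough sketch of the idea discussed in this thread: ship a few bytes of per-build sequence lengths instead of requiring a >1 GB BSgenome download. It is written in base R so it runs without Bioconductor. The names `seqlengths_by_build` and `lookup_seqlengths()` are hypothetical and are not part of GenomeInfoDb. Only three sequences per build are included, using the UCSC hg19 and hg38 lengths. In real code the equivalent would presumably use GenomeInfoDb's Seqinfo objects: get one with `Seqinfo(genome = "hg19")` as Hervé describes, then apply the `seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ]` idiom from Tim's first message.

```r
# Hypothetical stand-in for a per-build seqinfo table: one named integer
# vector of sequence lengths per genome build (only a few sequences shown).
seqlengths_by_build <- list(
  hg19 = c(chr1 = 249250621L, chr22 = 51304566L, chrM = 16571L),
  hg38 = c(chr1 = 248956422L, chr22 = 50818468L, chrM = 16569L)
)

# Hypothetical helper: return the lengths of `seq_names` in `build`, in the
# order requested. It stops with an error if the build is unknown or if any
# name is missing from that build, so a build/naming mismatch fails loudly
# instead of silently producing NA lengths.
lookup_seqlengths <- function(seq_names, build, table = seqlengths_by_build) {
  if (!build %in% names(table))
    stop("unknown genome build: ", build)
  lens <- table[[build]]
  missing <- setdiff(seq_names, names(lens))
  if (length(missing) > 0L)
    stop("sequences not in ", build, ": ", paste(missing, collapse = ", "))
  lens[seq_names]
}

# Example: lengths for two hg19 chromosomes, in the caller's order.
# Returns c(chr22 = 51304566L, chr1 = 249250621L).
hg19_lens <- lookup_seqlengths(c("chr22", "chr1"), "hg19")
```

Stopping on unknown names, rather than returning NA, is one way to surface the mismatches Kasper and Tim describe, such as contig names that change between builds or mm9 vs mm10 annotations.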