Maybe this could eventually support setting the seqinfo with: genome(gr) <- "hg19"
Or is that being too clever? On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès <hpa...@fredhutch.org> wrote: > Hi, > > FWIW I started to work on supporting quick generation of a standalone > Seqinfo object via Seqinfo(genome="hg38") in GenomeInfoDb. > > It already supports hg38, hg19, hg18, panTro4, panTro3, panTro2, > bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1, mm10, > mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4, galGal3, > gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2, sacCer3, > and sacCer2. I'll add more. > > See ?Seqinfo for some examples. > > Right now it fetches the information from internet every time you > call it but maybe we should just store that information in the > GenomeInfoDb package as Tim suggested? > > H. > > > On 06/03/2015 12:54 PM, Tim Triche, Jr. wrote: >> >> That would be perfect actually. And it would radically reduce & >> modularize maintenance. Maybe that's the best way to go after all. Quite >> sensible. >> >> --t >> >>> On Jun 3, 2015, at 12:46 PM, Vincent Carey <st...@channing.harvard.edu> >>> wrote: >>> >>> It really isn't hard to have multiple OrganismDb packages in place -- the >>> process of making new ones is documented and was given as an exercise in >>> the EdX course. I don't know if we want to institutionalize it and >>> distribute such -- I think we might, so that there would be Hs19, Hs38, >>> mm9, etc. packages. They have very little content, they just coordinate >>> interactions with packages that you'll already have. >>> >>> On Wed, Jun 3, 2015 at 3:26 PM, Tim Triche, Jr. <tim.tri...@gmail.com> >>> wrote: >>> >>>> Right, I typically do that too, and if you're working on human data it >>>> isn't a big deal. What makes things a lot more of a drag is when you >>>> work >>>> on e.g. mouse data (mm9 vs mm10, aka GRCm37 vs GRCm38) where >>>> Mus.musculus >>>> is essentially a "build ahead" of Homo.sapiens. >>>> >>>> R> seqinfo(Homo.sapiens) >>>> Seqinfo object with 93 sequences (1 circular) from hg19 genome >>>> >>>> R> seqinfo(Mus.musculus) >>>> Seqinfo object with 66 sequences (1 circular) from mm10 genome: >>>> >>>> It's not as explicit as directly assigning the seqinfo from a genome >>>> that >>>> corresponds to that of your annotations/results/whatever. I know we >>>> could >>>> all use crossmap or liftOver or whatever, but that's not really the >>>> same, >>>> and it takes time, whereas assigning the proper seqinfo for >>>> relationships >>>> is very fast. >>>> >>>> That's all I was getting at... >>>> >>>> >>>> Statistics is the grammar of science. >>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science> >>>> >>>> On Wed, Jun 3, 2015 at 12:17 PM, Vincent Carey >>>> <st...@channing.harvard.edu >>>>> >>>>> wrote: >>>> >>>> >>>>> I typically get this info from Homo.sapiens. The result is parasitic >>>>> on >>>>> the TxDb that is in there. I don't know how easy it is to swap >>>>> alternate >>>>> TxDb in to get a different build. I think it would make sense to >>>>> regard >>>>> the OrganismDb instances as foundational for this sort of structural >>>>> data. >>>>> >>>>> On Wed, Jun 3, 2015 at 3:12 PM, Kasper Daniel Hansen < >>>>> kasperdanielhan...@gmail.com> wrote: >>>>> >>>>>> Let me rephrase this slightly. From one POV the purpose of >>>>>> GenomeInfoDb >>>>>> is >>>>>> clean up the seqinfo slot. Currently it does most of the cleaning, >>>>>> but >>>>>> it >>>>>> does not add seqlengths. >>>>>> >>>>>> It is clear that seqlengths depends on the version of the genome, but >>>>>> I >>>>>> will argue so does the seqnames. Of course, for human, chr22 will not >>>>>> change. But what about the names of all the random contigs? Or for >>>>>> other >>>>>> organisms, what about going from a draft genome with 10k contigs to a >>>>>> more >>>>>> completely genome assembled into fewer, larger chromosomes. >>>>>> >>>>>> I acknowledge that this information is present in the BSgenome >>>>>> packages, >>>>>> but it seems (to me) to be very appropriate to have them around for >>>>>> cleaning up the seqinfo slot. For some situations it is not great to >>>>>> depend on 1 GB> download for something that is a few bytes. >>>>>> >>>>>> Best, >>>>>> Kasper >>>>>> >>>>>> On Wed, Jun 3, 2015 at 3:00 PM, Tim Triche, Jr. <tim.tri...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> It would be nice (for a number of reasons) to have chromosome lengths >>>>>>> readily available in a foundational package like GenomeInfoDb, so >>>>>>> that, >>>>>>> say, >>>>>>> >>>>>>> data(seqinfo.hg19) >>>>>>> seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ] >>>>>>> >>>>>>> would work without issues. Is there any particular reason this >>>>>> >>>>>> couldn't >>>>>>> >>>>>>> happen for the supported/available BSgenomes? It would seem like a >>>>>> >>>>>> simple >>>>>>> >>>>>>> matter to do >>>>>>> >>>>>>> R> library(BSgenome.Hsapiens.UCSC.hg19) >>>>>>> R> seqinfo.hg19 <- seqinfo(Hsapiens) >>>>>>> R> save(seqinfo.hg19, >>>>>>> file="~/bioc-devel/GenomeInfoDb/data/seqinfo.hg19.rda") >>>>>>> >>>>>>> and be done with it until (say) the next release or next released >>>>>>> BSgenome. I considered looping through the following BSgenomes >>>>>> >>>>>> myself... >>>>>>> >>>>>>> and if it isn't strongly opposed by (everyone) I may still do exactly >>>>>>> that. Seems useful, no? >>>>>>> >>>>>>> e.g. for the following 42 builds, >>>>>>> >>>>>>> grep("(UCSC|NCBI)", unique(gsub(".masked", "", available.genomes())), >>>>>>> value=TRUE) >>>>>>> [1] "BSgenome.Amellifera.UCSC.apiMel2" >>>>>> >>>>>> "BSgenome.Btaurus.UCSC.bosTau3" >>>>>>> >>>>>>> >>>>>>> [3] "BSgenome.Btaurus.UCSC.bosTau4" >>>>>> >>>>>> "BSgenome.Btaurus.UCSC.bosTau6" >>>>>>> >>>>>>> >>>>>>> [5] "BSgenome.Btaurus.UCSC.bosTau8" >>>>>>> "BSgenome.Celegans.UCSC.ce10" >>>>>>> >>>>>>> [7] "BSgenome.Celegans.UCSC.ce2" "BSgenome.Celegans.UCSC.ce6" >>>>>>> >>>>>>> [9] "BSgenome.Cfamiliaris.UCSC.canFam2" >>>>>>> "BSgenome.Cfamiliaris.UCSC.canFam3" >>>>>>> [11] "BSgenome.Dmelanogaster.UCSC.dm2" >>>>>>> "BSgenome.Dmelanogaster.UCSC.dm3" >>>>>>> [13] "BSgenome.Dmelanogaster.UCSC.dm6" >>>>>> >>>>>> "BSgenome.Drerio.UCSC.danRer5" >>>>>>> >>>>>>> >>>>>>> [15] "BSgenome.Drerio.UCSC.danRer6" >>>>>> >>>>>> "BSgenome.Drerio.UCSC.danRer7" >>>>>>> >>>>>>> >>>>>>> [17] "BSgenome.Ecoli.NCBI.20080805" >>>>>>> "BSgenome.Gaculeatus.UCSC.gasAcu1" >>>>>>> [19] "BSgenome.Ggallus.UCSC.galGal3" >>>>>> >>>>>> "BSgenome.Ggallus.UCSC.galGal4" >>>>>>> >>>>>>> >>>>>>> [21] "BSgenome.Hsapiens.NCBI.GRCh38" >>>>>>> "BSgenome.Hsapiens.UCSC.hg17" >>>>>>> >>>>>>> [23] "BSgenome.Hsapiens.UCSC.hg18" >>>>>>> "BSgenome.Hsapiens.UCSC.hg19" >>>>>>> >>>>>>> [25] "BSgenome.Hsapiens.UCSC.hg38" >>>>>>> "BSgenome.Mfascicularis.NCBI.5.0" >>>>>>> [27] "BSgenome.Mfuro.UCSC.musFur1" >>>>>> >>>>>> "BSgenome.Mmulatta.UCSC.rheMac2" >>>>>>> >>>>>>> >>>>>>> [29] "BSgenome.Mmulatta.UCSC.rheMac3" >>>>>> >>>>>> "BSgenome.Mmusculus.UCSC.mm10" >>>>>>> >>>>>>> >>>>>>> [31] "BSgenome.Mmusculus.UCSC.mm8" >>>>>>> "BSgenome.Mmusculus.UCSC.mm9" >>>>>>> >>>>>>> [33] "BSgenome.Ptroglodytes.UCSC.panTro2" >>>>>>> "BSgenome.Ptroglodytes.UCSC.panTro3" >>>>>>> [35] "BSgenome.Rnorvegicus.UCSC.rn4" >>>>>> >>>>>> "BSgenome.Rnorvegicus.UCSC.rn5" >>>>>>> >>>>>>> >>>>>>> [37] "BSgenome.Rnorvegicus.UCSC.rn6" >>>>>>> "BSgenome.Scerevisiae.UCSC.sacCer1" >>>>>>> [39] "BSgenome.Scerevisiae.UCSC.sacCer2" >>>>>>> "BSgenome.Scerevisiae.UCSC.sacCer3" >>>>>>> [41] "BSgenome.Sscrofa.UCSC.susScr3" >>>>>> >>>>>> "BSgenome.Tguttata.UCSC.taeGut1" >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Am I insane for suggesting this? It would make things a little >>>>>>> easier >>>>>> >>>>>> for >>>>>>> >>>>>>> rtracklayer, most SummarizedExperiment and SE-derived objects, blah, >>>>>> >>>>>> blah, >>>>>>> >>>>>>> blah... >>>>>>> >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> --t >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Statistics is the grammar of science. >>>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science> >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioc-devel@r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fredhutch.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel