Glad to help!

On Thu, Aug 13, 2020 at 5:51 PM Margolin, Gennady (NIH/NICHD) [C] <gennady.margo...@nih.gov> wrote:
> Hi Jim,
>
> Awesome, that makes sense now. I was wondering whether org.Dr.eg.db has
> only functional annotation, which I thought it was as it did not refer to a
> specific genome, unlike TxDb packages, but then I found what I said in my
> previous emails.
>
> Thank you very much,
> Gennady
>
> From: "James W. MacDonald" <jmac...@uw.edu>
> Reply-To: "jmac...@u.washington.edu" <jmac...@u.washington.edu>
> Date: Thursday, August 13, 2020 at 5:41 PM
> To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margo...@nih.gov>
> Cc: Vincent Carey <st...@channing.harvard.edu>, "bioc-devel@r-project.org" <bioc-devel@r-project.org>
> Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
>
> Hi Gennady,
>
> That information should probably be cleaned up, and the BiMaps that point
> to the location data removed. While the OrgDbs do contain position
> information, it's been deprecated, which you would find if you tried to
> query using select():
>
> > select(org.Dr.eg.db, "30037", "CHR")
> 'select()' returned 1:1 mapping between keys and columns
>   ENTREZID CHR
> 1    30037   5
> Warning message:
> In .deprecatedColsMessage() :
>   Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
>   deprecated. Please use a range based accessor like genes(), or select()
>   with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
>   object instead.
>
> The rationale being that the OrgDb packages are intended to contain
> functional annotations, which are not based on any build, and instead are
> current as of the construction of the OrgDb package. Since positional
> information should be based on a genome release, those data have been
> migrated to the TxDb and EnsDb packages, which are based on a given release.
>
> Put a different way, the data in an OrgDb package is downloaded from NCBI
> as of a particular date, and the positional data we get are whatever we got
> from NCBI on that date. This is obviously a problem for the positional
> data, because what we get isn't necessarily build-specific. We get the TxDb
> data from the UCSC Genome Browser, which is build specific, so we can tell
> end users exactly what build the data come from. Ideally these data would
> be defunct in the OrgDb packages, but it hasn't happened yet.
>
> Best,
>
> Jim
>
> On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via
> Bioc-devel <bioc-devel@r-project.org> wrote:
>
> Hi Vincent,
>
> Thank you for responding.
>
> Here is from the R documentation help page from this package (I have
> version 3.10.0 (I doubt anything changed with the latest one, which is
> 3.11.4)):
>
> -------------------------------------------------
> org.Dr.egCHRLOC {org.Dr.eg.db}
> Entrez Gene IDs to Chromosomal Location
> Description
> org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the
> starting position of the gene. The position of a gene is measured as the
> number of base pairs.
> The CHRLOCEND mapping is the same as the CHRLOC mapping except that it
> specifies the ending base of a gene instead of the start.
> ……
> -------------------------------------------------
>
> This output also does not show any genome version:
>
> org.Dr.eg_dbInfo()
>    name                value
> 1  DBSCHEMAVERSION     2.1
> 2  Db type             OrgDb
> 3  Supporting package  AnnotationDbi
> 4  DBSCHEMA            ZEBRAFISH_DB
> 5  ORGANISM            Danio rerio
> 6  SPECIES             Zebrafish
> 7  EGSOURCEDATE        2019-Jul10
> 8  EGSOURCENAME        Entrez Gene
> 9  EGSOURCEURL         ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> 10 CENTRALID           EG
> 11 TAXID               7955
> 12 GOSOURCENAME        Gene Ontology
> 13 GOSOURCEURL         ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
> 14 GOSOURCEDATE        2019-Jul10
> 15 GOEGSOURCEDATE      2019-Jul10
> 16 GOEGSOURCENAME      Entrez Gene
> 17 GOEGSOURCEURL       ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> 18 KEGGSOURCENAME      KEGG GENOME
> 19 KEGGSOURCEURL       ftp://ftp.genome.jp/pub/kegg/genomes
> 20 KEGGSOURCEDATE      2011-Mar15
> 21 GPSOURCENAME        UCSC Genome Bioinformatics (Danio rerio)
> 22 GPSOURCEURL
> 23 GPSOURCEDATE        2017-Nov1
> 24 ENSOURCEDATE        2019-Jun24
> 25 ENSOURCENAME        Ensembl
> 26 ENSOURCEURL         ftp://ftp.ensembl.org/pub/current_fasta
> 27 UPSOURCENAME        Uniprot
> 28 UPSOURCEURL         http://www.UniProt.org/
> 29 UPSOURCEDATE        Mon Oct 21 14:32:30 2019
>
> From: Vincent Carey <st...@channing.harvard.edu>
> Date: Thursday, August 13, 2020 at 2:46 PM
> To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margo...@nih.gov>
> Cc: "bioc-devel@r-project.org" <bioc-devel@r-project.org>
> Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
>
> This should probably be posed to the support site. What version of the
> package are you using? Where are you seeing coordinates? I would expect
> those to be obtained from the TxDb package, or perhaps from AnnotationHub.
>
> > columns(org.Dr.eg.db)
>  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
>  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
> [11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
> [16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
> [21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"
>
> On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via
> Bioc-devel <bioc-devel@r-project.org> wrote:
>
> Hello,
>
> I have a short question – how do I figure the genome version for
> org.Dr.eg.db package? I couldn’t see it in the DESCRIPTION and also it’s
> not in org.Dr.eg_dbInfo() output. It would be nice to know if this is
> danRer11/GRCz11 or some other assembly, as there are coordinates present in
> the DB.
>
> Thank you,
> Gennady
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
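
For anyone finding this thread later, here is a minimal sketch of what the deprecation warning quoted above suggests: asking a build-specific TxDb for positions instead of the OrgDb. The TxDb package name (a danRer11 refGene build) is an assumption picked for illustration and was not named in the thread, and gene_location() is a hypothetical helper, so substitute whichever TxDb matches your assembly.

```r
# Sketch only: look up gene coordinates in a build-specific TxDb rather than
# through the deprecated CHR/CHRLOC/CHRLOCEND columns of the OrgDb.

# Hypothetical helper (not part of any package): returns the transcript
# coordinates recorded for one Entrez Gene ID, or NULL if the TxDb has no
# such gene.
gene_location <- function(txdb, entrez_id) {
  id <- as.character(entrez_id)
  # select() stops with an error on unknown keys, so check membership first.
  if (!id %in% AnnotationDbi::keys(txdb, keytype = "GENEID")) {
    return(NULL)
  }
  AnnotationDbi::select(txdb,
                        keys = id,
                        columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
                        keytype = "GENEID")
}

# Assumed package name; it only runs if that package happens to be installed.
pkg <- "TxDb.Drerio.UCSC.danRer11.refGene"
if (requireNamespace(pkg, quietly = TRUE)) {
  # TxDb packages export an object named after the package itself.
  txdb <- getExportedValue(pkg, pkg)
  # Unlike the OrgDb, the TxDb reports which genome build it was made from.
  print(unique(GenomeInfoDb::genome(txdb)))
  print(gene_location(txdb, "30037"))
}
```

If you want ranges rather than a data frame, GenomicFeatures::genes(txdb) returns a GRanges object for the same build.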