Hi Jim,

Awesome, that makes sense now. I had assumed that org.Dr.eg.db contains only 
functional annotation, since it does not refer to a specific genome, unlike 
the TxDb packages, but then I found the coordinates I described in my previous 
emails.

Thank you very much,
Gennady

From: "James W. MacDonald" <jmac...@uw.edu>
Reply-To: "jmac...@u.washington.edu" <jmac...@u.washington.edu>
Date: Thursday, August 13, 2020 at 5:41 PM
To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margo...@nih.gov>
Cc: Vincent Carey <st...@channing.harvard.edu>, "bioc-devel@r-project.org" 
<bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package

Hi Gennady,

That information should probably be cleaned up, and the BiMaps that point to 
the location data removed. While the OrgDbs do contain position information, 
it's been deprecated, which you would find if you tried to query using select():

> select(org.Dr.eg.db, "30037", "CHR")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID CHR
1    30037   5
Warning message:
In .deprecatedColsMessage() :
  Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
  deprecated. Please use a range based accessor like genes(), or select()
  with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
  object instead.

The rationale being that the OrgDb packages are intended to contain functional 
annotations, which are not based on any build, and instead are current as of 
the construction of the OrgDb package. Since positional information should be 
based on a genome release, those data have been migrated to the TxDb and EnsDb 
packages, which are based on a given release.
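
A minimal sketch of the range-based route that the deprecation warning points to. It assumes the TxDb.Drerio.UCSC.danRer11.refGene package is installed; that package choice is an assumption made for illustration and is not named in this thread. Whether Entrez Gene ID 30037 is present in that TxDb is also not guaranteed, so the select() call is guarded:

```r
library(GenomicFeatures)
library(TxDb.Drerio.UCSC.danRer11.refGene)  # assumed installed; illustrative choice

txdb <- TxDb.Drerio.UCSC.danRer11.refGene
gene_id <- "30037"  # the Entrez Gene ID used in the select() call above

# Range-based accessor: returns a GRanges object
# (empty if the ID is not in this TxDb)
gene_range <- genes(txdb, filter = list(gene_id = gene_id))
gene_range

# select() with TxDb column names instead of CHR/CHRLOC/CHRLOCEND.
# select() errors on keys it does not know, so check membership first.
if (gene_id %in% keys(txdb, keytype = "GENEID")) {
  tx_coords <- select(txdb, keys = gene_id, keytype = "GENEID",
                      columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"))
  print(tx_coords)
}
```

Note that select() on a TxDb returns one row per transcript, so a gene with several transcripts yields several start and end positions, whereas genes() returns a single range per gene.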

Put a different way, the data in an OrgDb package is downloaded from NCBI as of 
a particular date, and the positional data we get are whatever we got from NCBI 
on that date. This is obviously a problem for the positional data, because what 
we get isn't necessarily build-specific. We get the TxDb data from the UCSC 
Genome Browser, which is build specific, so we can tell end users exactly what 
build the data come from. Ideally these data would be defunct in the OrgDb 
packages, but it hasn't happened yet.
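
To see the difference in practice, the metadata of the two package types can be compared. This is a sketch under the assumption that both org.Dr.eg.db and TxDb.Drerio.UCSC.danRer11.refGene are installed (the TxDb package is an illustrative choice, not one named in this thread), and that the TxDb metadata carries a "Genome" row, as UCSC-derived TxDb packages generally do:

```r
library(org.Dr.eg.db)
library(TxDb.Drerio.UCSC.danRer11.refGene)  # assumed installed; illustrative choice

# OrgDb metadata: source names and download dates, but no genome build
orgdb_meta <- metadata(org.Dr.eg.db)
orgdb_meta[orgdb_meta$name %in% c("EGSOURCENAME", "EGSOURCEDATE"), ]

# TxDb metadata: the data source and the genome build it was made from
txdb <- TxDb.Drerio.UCSC.danRer11.refGene
txdb_meta <- metadata(txdb)
txdb_meta[txdb_meta$name %in% c("Data source", "Genome"), ]

# The build is also attached to the sequence information of the TxDb
unique(genome(txdb))
```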

Best,

Jim



On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via 
Bioc-devel <bioc-devel@r-project.org> wrote:
Hi Vincent,

Thank you for responding.

Here is an excerpt from the R help page for this package (I have version 
3.10.0; I doubt anything changed in the latest one, which is 3.11.4):

-------------------------------------------------
org.Dr.egCHRLOC {org.Dr.eg.db}
Entrez Gene IDs to Chromosomal Location
Description
org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the 
starting position of the gene. The position of a gene is measured as the number 
of base pairs.
The CHRLOCEND mapping is the same as the CHRLOC mapping except that it 
specifies the ending base of a gene instead of the start.
……
-------------------------------------------------

This output also does not show any genome version:
> org.Dr.eg_dbInfo()
                 name value
1     DBSCHEMAVERSION 2.1
2             Db type OrgDb
3  Supporting package AnnotationDbi
4            DBSCHEMA ZEBRAFISH_DB
5            ORGANISM Danio rerio
6             SPECIES Zebrafish
7        EGSOURCEDATE 2019-Jul10
8        EGSOURCENAME Entrez Gene
9         EGSOURCEURL ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10          CENTRALID EG
11              TAXID 7955
12       GOSOURCENAME Gene Ontology
13        GOSOURCEURL ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
14       GOSOURCEDATE 2019-Jul10
15     GOEGSOURCEDATE 2019-Jul10
16     GOEGSOURCENAME Entrez Gene
17      GOEGSOURCEURL ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
18     KEGGSOURCENAME KEGG GENOME
19      KEGGSOURCEURL ftp://ftp.genome.jp/pub/kegg/genomes
20     KEGGSOURCEDATE 2011-Mar15
21       GPSOURCENAME UCSC Genome Bioinformatics (Danio rerio)
22        GPSOURCEURL
23       GPSOURCEDATE 2017-Nov1
24       ENSOURCEDATE 2019-Jun24
25       ENSOURCENAME Ensembl
26        ENSOURCEURL ftp://ftp.ensembl.org/pub/current_fasta
27       UPSOURCENAME Uniprot
28        UPSOURCEURL http://www.UniProt.org/
29       UPSOURCEDATE Mon Oct 21 14:32:30 2019

From: Vincent Carey <st...@channing.harvard.edu>
Date: Thursday, August 13, 2020 at 2:46 PM
To: "Margolin, Gennady (NIH/NICHD) [C]" <gennady.margo...@nih.gov>
Cc: "bioc-devel@r-project.org" <bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package

This should probably be posed to the support site.  What version of the package 
are you using?  Where are you seeing coordinates?  I would expect those to be 
obtained from the TxDb package, or perhaps from AnnotationHub.


> columns(org.Dr.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
[11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
[16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
[21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"


On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via 
Bioc-devel <bioc-devel@r-project.org> wrote:
Hello,

I have a short question – how do I figure out the genome version for the 
org.Dr.eg.db package? I couldn’t see it in the DESCRIPTION, and it’s also not in 
the org.Dr.eg_dbInfo() output. It would be nice to know if this is 
danRer11/GRCz11 or some other assembly, as there are coordinates present in the 
DB.

Thank you,
Gennady

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
