I've pushed new 3.8.2 orgdbs that should propagate soon. They do not have this issue.
________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Pages, Herve <hpa...@fredhutch.org>
Sent: Thursday, April 25, 2019 9:19:35 PM
To: Aaron Lun; Vincent Carey
Cc: Bioc-devel; jmac...@u.washington.edu
Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Hi Aaron,

On 4/25/19 16:44, Aaron Lun wrote:

It doesn't seem like it - on my installation, org.Hs.eg.db is still...
monkeying around.

       __
  w  c(..)o   (
   \__(-)    __)
       /\   (
      /(_)___)
      w /|
       | \
      m  m

Daniel has prepared a new batch of *.db0 and org.* packages (v 3.8.1). The new packages are on their way and should become available via BiocManager::install() in the next 12 hours or so. Hopefully they'll put an end to the Great Monkey Conspiracy!

Unfortunately we won't see the effect on tomorrow's build report, only on Saturday's report.

Cheers,
H.

On Thu, Apr 25, 2019 at 9:17 AM Vincent Carey <st...@channing.harvard.edu> wrote:

Has this situation been rectified?

On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <daniel.vantw...@roswellpark.org> wrote:

We've made some changes to our annotation generation scripts this release and it seems these may have introduced some errors. Thank you for identifying this issue and I will try to have some fixes out asap.

________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of James W. MacDonald <jmac...@uw.edu>
Sent: Tuesday, April 23, 2019 11:03:02 AM
To: Aaron Lun
Cc: Bioc-devel
Subject: Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

Looks like the ensembl table of the human.db0 package got polluted with *Pan troglodytes* genes:

con <- dbConnect(SQLite(), "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
  count(*)
1    16207
dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
  count(*)
1    28973

On Mon, Apr 22, 2019 at 11:54 PM Aaron Lun <infinite.monkeys.with.keyboa...@gmail.com> wrote:

Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG0000...?

> library(org.Hs.eg.db)
> mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
                 GCG
"ENSPTRG00000000777"

Well, at least it still recovers the right identifier... eventually.

> select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
  SYMBOL            ENSEMBL
1    GCG ENSPTRG00000000777
2    GCG    ENSG00000115263

The SYMBOL->Entrez ID relational table seems to be okay:

> Y <- toTable(org.Hs.egSYMBOL)
> Y[which(Y[,2]=="GCG"),]
     gene_id symbol
2152    2641    GCG

So the cause is the Ensembl->Entrez mappings:

> Z <- toTable(org.Hs.egENSEMBL2EG)
> Z[Z[,1]==2641,]
     gene_id         ensembl_id
3028    2641 ENSPTRG00000000777
3029    2641    ENSG00000115263

Googling suggests that ENSPTRG00000000777 is an identifier for some other gene in one of the other monkeys. Hardly "Hs" stuff.

Session info (not technically R 3.6, but I didn't think that would have been the cause):

R Under development (unstable) (2019-04-11 r76379)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
[4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
 [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
 [9] pkgconfig_2.0.2 memoise_1.1.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

[[alternative HTML version deleted]]

This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
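
For anyone wanting to screen an identifier list for the symptom diagnosed in this thread (chimpanzee ENSPTRG identifiers mixed in with human ENSG identifiers), a prefix check is probably enough. The following is only a minimal sketch in base R. The helper name find_foreign_ensembl() is hypothetical and not part of AnnotationDbi or org.Hs.eg.db, and the pattern assumes that human Ensembl gene identifiers are "ENSG" followed by 11 digits.

```r
# Hypothetical helper (not part of AnnotationDbi or org.Hs.eg.db):
# return the unique Ensembl IDs that do not look like human gene IDs.
# Assumption: human Ensembl gene IDs match "ENSG" followed by 11 digits.
find_foreign_ensembl <- function(ids, pattern = "^ENSG[0-9]{11}$") {
  ids <- unique(as.character(ids))
  # Drop missing values, keep everything that fails the human pattern.
  ids[!is.na(ids) & !grepl(pattern, ids)]
}

# The two IDs reported for GCG (Entrez ID 2641) earlier in this thread.
gcg_ids <- c("ENSPTRG00000000777", "ENSG00000115263")
foreign <- find_foreign_ensembl(gcg_ids)
print(foreign)

# With the real package installed, the input could come from, for example:
#   library(org.Hs.eg.db)
#   ids <- keys(org.Hs.eg.db, keytype = "ENSEMBL")
#   table(sub("[0-9]+$", "", find_foreign_ensembl(ids)))  # count by prefix
```

An empty result on the full ENSEMBL key set would suggest the installed package is not affected, under the pattern assumption stated above.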