Hi everyone,

I am using goseq to perform Gene Ontology analysis of RNA-seq data and I think I found the following bug.

Since my RNA-seq data was aligned using Refseq table - so I need to supply id as "refGene" which are actually EntrezId's from goseq::supportedGeneIDs()

The bug is - that the internal function(getgo) is expecting refseq identifiers and the external function (goseq) is expecting entrez id's..

When I use refGene as id ( which is actually an entrez gene ids). I get the following error -

> pwf = nullp(genes, "hg19", "refGene")
Loading hg19 length data...
Warning message:
In pcls(G) : initial point very close to some inequality constraints
> GO.wall = goseq(pwf, "hg19", "refGene")
Fetching GO annotations...
Error in names(tmp) = rep(names(map), times = as.numeric(summary(map)[,  :
  attempt to set an attribute on NULL


After some debugging, I find that internal function getgo() returns a vector of NULL's using "Entrez Gene Id" as id.

> go = getgo(names(genes), "hg19", "refGene")
> head(go)
$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

The last line of the function goseq::getgo() seems to be the issue..

Browse[2]> head(names(user2cat))
[1] "NM_000014" "NM_000015" "NM_000016" "NM_000017" "NM_000018" "NM_000019"
Browse[2]> head(genes)
[1] "1"         "100"       "1000"      "10000"     "10001" "100037417"
Browse[2]> head(user2cat[toupper(genes)])
$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

$<NA>
NULL

If I give the getgo() function refseq identifiers instead of entrez id, then the internal function getgo works - but the nullp function fails as it is still expecting entrez gene id.

> test_genes <- c(0, 0,1,0,0,0)
> names(test_genes) =c("NM_000014", "NM_000015", "NM_000016", "NM_000017",
+                      "NM_000018", "NM_000019")
> go = getgo(names(test_genes), "hg19", "refGene")
> names(go)
[1] "NM_000014" "NM_000015" "NM_000016" "NM_000017" "NM_000018" "NM_000019"
> pwf = nullp(test_genes, "hg19", "refGene")
Loading hg19 length data...
Error in getlength(names(DEgenes), genome, id) :
The gene names specified do not match the gene names for genome hg19 and ID refGene. Gene names given were: NM_000014, NM_000015, NM_000016, NM_000017, NM_000018, NM_000019, NA, NA, NA, NA Required gene names are: 84251, 113451, 25932, 55707, 55707, 84871, 50651, 55707, 7049, 1629

A reproducible example -

# case - 1
test_genes <- c(0, 0,1,0,0,0)
names(test_genes) <- c(1,100,1000, 10000, 10001, 100037417)
go = getgo(names(test_genes), "hg19", "refGene")

# case -2
names(test_genes) =c("NM_000014", "NM_000015", "NM_000016", "NM_000017",
                     "NM_000018", "NM_000019")
go = getgo(names(test_genes), "hg19", "refGene")
pwf = nullp(test_genes, "hg19", "refGene")


> sessionInfo()
R version 3.2.2 RC (2015-08-09 r68965)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Hs.eg.db_3.2.2 AnnotationDbi_1.31.19 IRanges_2.3.24 S4Vectors_0.7.22 Biobase_2.29.1 BiocGenerics_0.15.10 goseq_1.21.1 RSQLite_1.0.0
 [9] DBI_0.3.1             geneLenDataBase_1.5.0 BiasedUrn_1.06.1

loaded via a namespace (and not attached):
[1] XVector_0.9.4 GenomicRanges_1.21.30 zlibbioc_1.15.0 GenomicAlignments_1.5.18 BiocParallel_1.3.54 lattice_0.20-33 [7] GenomeInfoDb_1.5.16 tools_3.2.2 grid_3.2.2 SummarizedExperiment_0.3.9 nlme_3.1-122 mgcv_1.8-7 [13] lambda.r_1.1.7 futile.logger_1.4.1 Matrix_1.2-2 rtracklayer_1.29.28 futile.options_1.0.0 bitops_1.0-6 [19] RCurl_1.95-4.7 biomaRt_2.25.3 GO.db_3.2.2 GenomicFeatures_1.21.31 Biostrings_2.37.8 Rsamtools_1.21.20
[25] XML_3.98-1.3

Please advise.

Thanks and Regards,
Sonali.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to