Ah, ok, sorry. I didn't get that this was also fixed at the same time. Thanks!!!! robert.
-------- Original message -------- From: Valerie Obenchain <voben...@fredhutch.org> Date:17/03/2015 19:59 (GMT+01:00) To: Robert Castelo <robert.cast...@upf.edu>,bioc-devel@r-project.org Subject: Re: [Bioc-devel] little tiny bug in CDSID annotations from predictCoding() The second mapping in .localCoordinates() in AllUtils.R was using a listed 'to' instead of unlisted 'to'. This means the mapping was at the level of the outer list elements vs the inner. We want to map/overlap with an unlisted object because each range can have a different CDSID. This second mapping provided the index for retrieving the CDSIDs which is why you saw a difference between the two. This was a bug for predictCoding only. Valerie On 03/17/2015 11:39 AM, Robert Castelo wrote: > Thanks! Do you have an explanation for the apparent disagreement in > CDSID annotations that i described below the bug, between > predictCoding() and locateVariants()? > > robert. > > > -------- Original message -------- > From: Valerie Obenchain > Date:17/03/2015 19:18 (GMT+01:00) > To: bioc-devel@r-project.org > Subject: Re: [Bioc-devel] little tiny bug in CDSID annotations from > predictCoding() > > Hi Robert, > > Thanks for reporting the typo and bug. Now fixed in 1.13.41. > > Valerie > > On 03/17/2015 10:58 AM, Robert Castelo wrote: > > in my message below, the line that it says: > > > > head(loc_all$CDSID) > > > > it should say > > > > head(coding2$CDSID) > > > > cheers, > > > > robert. > > > > ========================================== > > hi, > > > > there is a little tiny bug in the current devel version of > > VariantAnnotation::predictCoding(), and more concretely within > > VariantAnnotation:::.localCoordinates(), that precludes the correct > > annotation of the CDSID column: > > > > library(VariantAnnotation) > > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > > library(BSgenome.Hsapiens.UCSC.hg19) > > > > vcf <- readVcf(system.file("extdata", "CEUtrio.vcf.bgz", > > package="VariantFiltering"), genome="hg19") > > seqlevelsStyle(vcf) <- seqlevelsStyle(txdb) > > vcf <- dropSeqlevels(vcf, "chrM") > > coding1 <- predictCoding(vcf, txdb, Hsapiens) > > head(coding1$CDSID) > > IntegerList of length 6 > > [[1]] integer(0) > > [[2]] integer(0) > > [[3]] integer(0) > > [[4]] integer(0) > > [[5]] integer(0) > > [[6]] integer(0) > > table(elementLengths(coding1$CDSID)) > > > > 0 > > 6038 > > > > my sessionInfo() is at the end of the message. > > > > here is the patch, just replacing 'map2$trancriptHits' by > > 'map2$transcriptsHits': > > > > --- R/AllUtilities.R (revision 100756) > > +++ R/AllUtilities.R (working copy) > > @@ -284,7 +284,7 @@ > > cdsid <- IntegerList(integer(0)) > > map2 <- mapToTranscripts(unname(from)[xHits], to, > > ignore.strand=ignore.strand) > > - cds <- mcols(unlist(to, > use.names=FALSE))$cds_id[map2$trancriptsHits] > > + cds <- mcols(unlist(to, > use.names=FALSE))$cds_id[map2$transcriptsHits] > > if (length(cds)) { > > cdslst <- unique(splitAsList(cds, map2$xHits)) > > cdsid <- cdslst > > > > with this fix then things seem to work again: > > > > coding1 <- predictCoding(vcf, txdb, Hsapiens) > >> head(coding1$CDSID) > > IntegerList of length 6 > > [["1"]] 21771 > > [["2"]] 21771 > > [["3"]] 21771 > > [["4"]] 21771 > > [["5"]] 21428 > > [["6"]] 21428 > > table(elementLengths(coding1$CDSID)) > > > > 1 2 3 4 5 6 7 8 9 10 12 13 14 > 16 19 > > 873 1229 1024 993 615 524 324 168 82 21 12 15 42 > 76 40 > > > > while investigating this bug i used VariantAnnotation::locateVariants() > > which also annotates the CDSID column, and it seemed to be working. > > however, i noticed that both, predictCoding() and locateVariants(), do > > not give an identical annotation for the CDSID column in coding variants: > > > > coding2 <- locateVariants(vcf, txdb, CodingVariants()) > > head(loc_all$CDSID) > > IntegerList of length 6 > > [["1"]] 210777 > > [["2"]] 210777 > > [["3"]] 210777 > > [["4"]] 210778 > > [["5"]] 208140 > > [["6"]] 208141 > > table(elementLengths(coding2$CDSID)) > > > > 1 2 3 4 > > 4987 901 138 12 > > > > in principle, it seems that both are annotating valid CDSID keys: > > > > allcdsinfo <- select(txdb, keys=keys(txdb, keytype="CDSID"), > > columns="CDSID", keytype="CDSID") > > sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in% > > allcdsinfo$CDSID) > > [1] 0 > > sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in% > > allcdsinfo$CDSID) > > [1] 0 > > > > but predictCoding() annotates CDSID values that are not present in > > locateVariants() annotations and viceversa: > > > > sum(!as.character(unlist(coding1$CDSID, use.names=FALSE)) %in% > > as.character(unlist(coding2$CDSID, use.names=FALSE))) > > [1] 24057 > > sum(!as.character(unlist(coding2$CDSID, use.names=FALSE)) %in% > > as.character(unlist(coding1$CDSID, use.names=FALSE))) > > [1] 7251 > > > > length(unique(intersect(as.character(unlist(coding2$CDSID, > > use.names=FALSE)), as.character(unlist(coding1$CDSID, > use.names=FALSE))))) > > [1] 0 > > > > should not both annotate the same CDSID values on coding variants? > > > > > > thanks! > > robert. > > ps: sessionInfo() > > R Under development (unstable) (2014-10-14 r66765) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > locale: > > [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 > > [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 > > [7] LC_PAPER=en_US.UTF8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] stats4 parallel stats graphics grDevices > > [6] utils datasets methods base > > > > other attached packages: > > [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.1 > > [2] GenomicFeatures_1.19.31 > > [3] AnnotationDbi_1.29.17 > > [4] Biobase_2.27.2 > > [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0 > > [6] BSgenome_1.35.17 > > [7] rtracklayer_1.27.8 > > [8] VariantAnnotation_1.13.40 > > [9] Rsamtools_1.19.44 > > [10] Biostrings_2.35.11 > > [11] XVector_0.7.4 > > [12] GenomicRanges_1.19.46 > > [13] GenomeInfoDb_1.3.13 > > [14] IRanges_2.1.43 > > [15] S4Vectors_0.5.22 > > [16] BiocGenerics_0.13.6 > > [17] vimcom_1.0-0 > > [18] setwidth_1.0-3 > > [19] colorout_1.0-3 > > > > loaded via a namespace (and not attached): > > [1] BiocParallel_1.1.15 biomaRt_2.23.5 > > [3] bitops_1.0-6 DBI_0.3.1 > > [5] GenomicAlignments_1.3.31 RCurl_1.95-4.5 > > [7] RSQLite_1.0.0 tools_3.2.0 > > [9] XML_3.98-1.1 zlibbioc_1.13.2 > > > > _______________________________________________ > > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: voben...@fredhutch.org Phone: (206) 667-3158 [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel