Hi Michael,

Thanks for the report. I'll look into this.

H.

On 04/22/2014 08:29 AM, Michael Stadler wrote:
Dear Herve,

We are hitting a 'memory not mapped' problem when using trimLRpatterns
as detailed below. I did not manage to reproduce it with few sequences,
so I have to refer to a publicly available sequence file with many
reads. Even then, it occasionally runs through without problems.

Also, our use-case may not be typical and be part of the problem - maybe
the solution will be to change our use of trimLRpatterns.

Here is some code to illustrate/reproduce the problem:

library(Biostrings)
library(ShortRead)

Rpat <- "TGGAATTCTCGGGTGCCAAGG"
maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))

fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG",
                       "AAAATGGAATTCTCGGGTGCCAAGGTTTT"))

# the second read is not trimmed because it runs through the adaptor
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4 AAAA
#[2]    28 AAAATGGAATTCTCGGGTGCCAAGGTTT

# as a workaround, we pad the adaptor with Ns and
# increase the mismatch tolerance
numNs <- 90
maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")

# now, also the second read gets trimmed
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4 AAAA
#[2]     4 AAAA

# to trigger the segmentation fault, many reads are needed
download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz";,
"ERR000916_1.fastq.gz")
fq2 <- readFastq("ERR000916_1.fastq.gz")

fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
# *** caught segfault ***
#address 0x7f5109be4fed, cause 'memory not mapped'
#
#Traceback:
# 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
# 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at,
at.type, max.mismatch, min.mismatch, with.indels, fixed,     ans.type,
auto.reduce.pattern, PACKAGE = "Biostrings")
# 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
auto.reduce.pattern)
# 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
  ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
  ending.at = subject_width, max.mismatch = max.Rmismatch,
with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
  Rfixed)
# 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
max.Lmismatch,     max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
Rfixed,     ranges)
# 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
    max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges
= TRUE)
# 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
    max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges
= TRUE)
#10: eval(expr, envir, enclos)
#11: eval(call, sys.frame(sys.parent()))
#12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed,     ranges =
TRUE)
#13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm,     with.Rindels = FALSE)
#14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
maxRmm,     with.Rindels = FALSE)

The problem did not occur in R 3.0.3 with BioC 2.13.
Do you have an idea what's wrong?

Thanks for your help,
Michael



sessionInfo()R version 3.1.0 (2014-04-10)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] parallel  stats     graphics  grDevices utils     datasets  methods
#[8] base
#
#other attached packages:
# [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0

# [4] Rsamtools_1.16.0        GenomicRanges_1.16.0    GenomeInfoDb_1.0.1

# [7] BiocParallel_0.6.0      Biostrings_2.32.0       XVector_0.4.0

#[10] IRanges_1.22.2          BiocGenerics_0.10.0     RColorBrewer_1.0-5

#
#loaded via a namespace (and not attached):
# [1] BBmisc_1.5          BatchJobs_1.2       Biobase_2.24.0
# [4] DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.1
# [7] bitops_1.0-6        brew_1.0-6          codetools_0.2-8
#[10] compiler_3.1.0      digest_0.6.4        fail_1.2
#[13] foreach_1.4.2       grid_3.1.0          hwriter_1.3
#[16] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26
#[19] plyr_1.8.1          sendmailR_1.1-2     stats4_3.1.0
#[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to