Re: [Bioc-sig-seq] trimLRPatterns; matching pattern from pos -1 gives solveUserSEW error

Harris A. Jaffee Wed, 16 Feb 2011 14:47:34 -0800

Sorry, I retract my proposed fix.  It would break the opposite case:
        
        trimLRPatterns(subject="A", Lpattern="AC", max.Lmismatch=1)

I lean toward trusting the comments on .computeTrimStartand .computeTrimEnd,

meaning that the code should adhere to them -- perhaps as follows.

The last line of .computeTrimStart becomes

        pmin(pattern_length + 2L - ii, width(subject) + 1L)

and the last line of .computeTrimEnd becomes

        pmax(subject_width - pattern_length - 1L + ii, 0L)

No doubt I've still missed something.

On Feb 16, 2011, at 3:22 PM, Harris A. Jaffee wrote:

This will be a whole lot of detail in lieu of a final fix (proposalbelow).
I don't believe the authors, certainly not I, envisioned yoursituation, which I think I can
abstract correctly here:

        library(Biostrings)
trimLRPatterns(Rpattern="CA", subject="A", max.Rmismatch=1) # sameas max.Rmismatch=c(-1,1)
To have loaded ShortRead was irrelevant and perhaps confusing.

Now, what should happen with the call above, from your point of view?
The first test is for the whole Rpattern "ending at the right endof the subject", thus extendingoff the left end of the subject by 1 letter as you say. For amatch, we need to allow for at least1 error, and that is sufficient because the rest of the Rpattern isthe same as the subject. Notethat this 1 goes in the *last* element of the max.Rmismatchvector. So, this test should succeed,
the entire subject should get trimmed, and your result should be "".

To gain insight, you have

        showMethods("trimLRPatterns", includeDefs=TRUE)

That might lead you to make your trimLRPatterns call under

        debug(Biostrings:::.XStringSet.trimLRPatterns)

The abort
Error in solveUserSEW(width(x), start = start, end = end, width =width) :solving row 1: 'allow.nonnarrowing' is FALSE and the suppliedstart (0) is < 1
clearly occurs in the call

        narrow(subject, start=0, end=-1)

at the end of .XStringSet.trimLRPatterns, but who is at fault?
Well, Biostrings:::.computeTrimEnd() returned the -1, violatingthis comment:
        ### 'subject' must be an XStringSet object of length != 0.
### Returns an integer vector of the same length as 'subject'where the i-th
        ### value is guaranteed to be >= 0 and <= width(subject)[i].
I think -1 is correct, and the comment is false [see my claim aboutthe authors].
Then, just before the call to narrow(), we have thisin .XStringSet.trimLRPatterns:
## For those invalid ranges where 'start > end + 1L', wearbitrarily## decide to set the 'start' to 'end + 1' (another reasonablechoice
    ## would have been to set the 'end' to 'start - 1').
With end=-1, start=1 (from .computeTrimStart, correctly) getschanged to end+1 = 0.
In this situation, the other choice would have fixed your problem.I don't think it
would create any other problems.

-Harris

On Feb 16, 2011, at 8:12 AM, Ludo Pagie wrote:
Hi all,
I'm trimming reads using an Rpattern which in some cases extendson the left side. If the pattern starts at exactly position -1trimLRPattern throws an error. If the pattern starts further'upstream' it returns a zero length sequence, as expected:
library(ShortRead)
# create a pattern
fragment <- paste(sample(c('A','C','G','T'),10,replace=TRUE),collapse='')
# create some reads based on the pattern; for different reads the
# pattern extends either on the left, the right, or both sides
reads <- substring(fragment,1:4,7:10)

# trim all reads; all reads should match the pattern fully and be
# trimmed from start to end
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads),max.Rmismatch=0.5, ranges=TRUE, Rfixed=FALSE)
# IRanges of length 4
#     start end width
# [1]     1   0     0
# [2]     0  -1     0
# [3]    -1  -2     0
# [4]    -2  -3     0

# if I want the reads to be trimmed right away an error is thrown for
# the second read
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[c(1,3,4)]), max.Rmismatch=0.5, ranges=FALSE)
#   A DNAStringSet instance of length 3
#     width seq
# [1]     0
# [2]     0
# [3]     0
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[2]),max.Rmismatch=0.5, ranges=FALSE)
# Error in solveUserSEW(width(x), start = start, end = end, width =
# width) :
# solving row 1: 'allow.nonnarrowing' is FALSE and the suppliedstart
# (0) is < 1


> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.8.2     Rsamtools_1.2.1     lattice_0.19-13
[4] Biostrings_2.18.2   GenomicRanges_1.2.2 IRanges_1.8.7
[7] multtest_2.6.0      Biobase_2.10.0

loaded via a namespace (and not attached):
[1] grid_2.12.0     hwriter_1.3     MASS_7.3-9      splines_2.12.0
[5] survival_2.36-2 tools_2.12.0



Where is this error coming from?

Ludo

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] trimLRPatterns; matching pattern from pos -1 gives solveUserSEW error

Reply via email to