Sorry, I retract my proposed fix.  It would break the opposite case:
        
        trimLRPatterns(subject="A", Lpattern="AC", max.Lmismatch=1)

I lean toward trusting the comments on .computeTrimStart and .computeTrimEnd,
meaning that the code should adhere to them -- perhaps as follows.

The last line of .computeTrimStart becomes

        pmin(pattern_length + 2L - ii, width(subject) + 1L)

and the last line of .computeTrimEnd becomes

        pmax(subject_width - pattern_length - 1L + ii, 0L)

No doubt I've still missed something.

On Feb 16, 2011, at 3:22 PM, Harris A. Jaffee wrote:
This will be a whole lot of detail in lieu of a final fix (proposal below).

I don't believe the authors, certainly not I, envisioned your situation, which I think I can
abstract correctly here:

        library(Biostrings)
trimLRPatterns(Rpattern="CA", subject="A", max.Rmismatch=1) # same as max.Rmismatch=c(-1,1)

To have loaded ShortRead was irrelevant and perhaps confusing.

Now, what should happen with the call above, from your point of view?

The first test is for the whole Rpattern "ending at the right end of the subject", thus extending off the left end of the subject by 1 letter as you say. For a match, we need to allow for at least 1 error, and that is sufficient because the rest of the Rpattern is the same as the subject. Note that this 1 goes in the *last* element of the max.Rmismatch vector. So, this test should succeed,
the entire subject should get trimmed, and your result should be "".

To gain insight, you have

        showMethods("trimLRPatterns", includeDefs=TRUE)

That might lead you to make your trimLRPatterns call under

        debug(Biostrings:::.XStringSet.trimLRPatterns)

The abort

Error in solveUserSEW(width(x), start = start, end = end, width = width) : solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is < 1

clearly occurs in the call

        narrow(subject, start=0, end=-1)

at the end of .XStringSet.trimLRPatterns, but who is at fault?

Well, Biostrings:::.computeTrimEnd() returned the -1, violating this comment:

        ### 'subject' must be an XStringSet object of length != 0.
### Returns an integer vector of the same length as 'subject' where the i-th
        ### value is guaranteed to be >= 0 and <= width(subject)[i].

I think -1 is correct, and the comment is false [see my claim about the authors].

Then, just before the call to narrow(), we have this in .XStringSet.trimLRPatterns:

## For those invalid ranges where 'start > end + 1L', we arbitrarily ## decide to set the 'start' to 'end + 1' (another reasonable choice
    ## would have been to set the 'end' to 'start - 1').

With end=-1, start=1 (from .computeTrimStart, correctly) gets changed to end+1 = 0.

In this situation, the other choice would have fixed your problem. I don't think it
would create any other problems.

-Harris

On Feb 16, 2011, at 8:12 AM, Ludo Pagie wrote:

Hi all,

I'm trimming reads using an Rpattern which in some cases extends on the left side. If the pattern starts at exactly position -1 trimLRPattern throws an error. If the pattern starts further 'upstream' it returns a zero length sequence, as expected:

library(ShortRead)
# create a pattern
fragment <- paste(sample(c('A','C','G','T'), 10,replace=TRUE),collapse='')
# create some reads based on the pattern; for different reads the
# pattern extends either on the left, the right, or both sides
reads <- substring(fragment,1:4,7:10)

# trim all reads; all reads should match the pattern fully and be
# trimmed from start to end
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads), max.Rmismatch=0.5, ranges=TRUE, Rfixed=FALSE)

# IRanges of length 4
#     start end width
# [1]     1   0     0
# [2]     0  -1     0
# [3]    -1  -2     0
# [4]    -2  -3     0

# if I want the reads to be trimmed right away an error is thrown for
# the second read
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[c (1,3,4)]), max.Rmismatch=0.5, ranges=FALSE)
#   A DNAStringSet instance of length 3
#     width seq
# [1]     0
# [2]     0
# [3]     0

trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[2]), max.Rmismatch=0.5, ranges=FALSE)

# Error in solveUserSEW(width(x), start = start, end = end, width =
# width) :
# solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start
# (0) is < 1


> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.8.2     Rsamtools_1.2.1     lattice_0.19-13
[4] Biostrings_2.18.2   GenomicRanges_1.2.2 IRanges_1.8.7
[7] multtest_2.6.0      Biobase_2.10.0

loaded via a namespace (and not attached):
[1] grid_2.12.0     hwriter_1.3     MASS_7.3-9      splines_2.12.0
[5] survival_2.36-2 tools_2.12.0



Where is this error coming from?

Ludo

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to