Sorry, I retract my proposed fix. It would break the opposite case:
trimLRPatterns(subject="A", Lpattern="AC", max.Lmismatch=1)
I lean toward trusting the comments on .computeTrimStart
and .computeTrimEnd,
meaning that the code should adhere to them -- perhaps as follows.
The last line of .computeTrimStart becomes
pmin(pattern_length + 2L - ii, width(subject) + 1L)
and the last line of .computeTrimEnd becomes
pmax(subject_width - pattern_length - 1L + ii, 0L)
No doubt I've still missed something.
On Feb 16, 2011, at 3:22 PM, Harris A. Jaffee wrote:
This will be a whole lot of detail in lieu of a final fix (proposal
below).
I don't believe the authors, certainly not I, envisioned your
situation, which I think I can
abstract correctly here:
library(Biostrings)
trimLRPatterns(Rpattern="CA", subject="A", max.Rmismatch=1) # same
as max.Rmismatch=c(-1,1)
To have loaded ShortRead was irrelevant and perhaps confusing.
Now, what should happen with the call above, from your point of view?
The first test is for the whole Rpattern "ending at the right end
of the subject", thus extending
off the left end of the subject by 1 letter as you say. For a
match, we need to allow for at least
1 error, and that is sufficient because the rest of the Rpattern is
the same as the subject. Note
that this 1 goes in the *last* element of the max.Rmismatch
vector. So, this test should succeed,
the entire subject should get trimmed, and your result should be "".
To gain insight, you have
showMethods("trimLRPatterns", includeDefs=TRUE)
That might lead you to make your trimLRPatterns call under
debug(Biostrings:::.XStringSet.trimLRPatterns)
The abort
Error in solveUserSEW(width(x), start = start, end = end, width =
width) :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied
start (0) is < 1
clearly occurs in the call
narrow(subject, start=0, end=-1)
at the end of .XStringSet.trimLRPatterns, but who is at fault?
Well, Biostrings:::.computeTrimEnd() returned the -1, violating
this comment:
### 'subject' must be an XStringSet object of length != 0.
### Returns an integer vector of the same length as 'subject'
where the i-th
### value is guaranteed to be >= 0 and <= width(subject)[i].
I think -1 is correct, and the comment is false [see my claim about
the authors].
Then, just before the call to narrow(), we have this
in .XStringSet.trimLRPatterns:
## For those invalid ranges where 'start > end + 1L', we
arbitrarily
## decide to set the 'start' to 'end + 1' (another reasonable
choice
## would have been to set the 'end' to 'start - 1').
With end=-1, start=1 (from .computeTrimStart, correctly) gets
changed to end+1 = 0.
In this situation, the other choice would have fixed your problem.
I don't think it
would create any other problems.
-Harris
On Feb 16, 2011, at 8:12 AM, Ludo Pagie wrote:
Hi all,
I'm trimming reads using an Rpattern which in some cases extends
on the left side. If the pattern starts at exactly position -1
trimLRPattern throws an error. If the pattern starts further
'upstream' it returns a zero length sequence, as expected:
library(ShortRead)
# create a pattern
fragment <- paste(sample(c('A','C','G','T'),
10,replace=TRUE),collapse='')
# create some reads based on the pattern; for different reads the
# pattern extends either on the left, the right, or both sides
reads <- substring(fragment,1:4,7:10)
# trim all reads; all reads should match the pattern fully and be
# trimmed from start to end
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads),
max.Rmismatch=0.5, ranges=TRUE, Rfixed=FALSE)
# IRanges of length 4
# start end width
# [1] 1 0 0
# [2] 0 -1 0
# [3] -1 -2 0
# [4] -2 -3 0
# if I want the reads to be trimmed right away an error is thrown for
# the second read
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[c
(1,3,4)]), max.Rmismatch=0.5, ranges=FALSE)
# A DNAStringSet instance of length 3
# width seq
# [1] 0
# [2] 0
# [3] 0
trimLRPatterns(Rpattern=fragment, subject=DNAStringSet(reads[2]),
max.Rmismatch=0.5, ranges=FALSE)
# Error in solveUserSEW(width(x), start = start, end = end, width =
# width) :
# solving row 1: 'allow.nonnarrowing' is FALSE and the supplied
start
# (0) is < 1
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.8.2 Rsamtools_1.2.1 lattice_0.19-13
[4] Biostrings_2.18.2 GenomicRanges_1.2.2 IRanges_1.8.7
[7] multtest_2.6.0 Biobase_2.10.0
loaded via a namespace (and not attached):
[1] grid_2.12.0 hwriter_1.3 MASS_7.3-9 splines_2.12.0
[5] survival_2.36-2 tools_2.12.0
Where is this error coming from?
Ludo
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing