Hi Ludo,
Yes, matchPDict() used to support fixed=FALSE. It still does, but only
when the PDict object is made using the old implementation of the
Aho-Corasick algo ('algo="ACtree"'):
> pdict <- PDict(c("ACCT", "GACC", "CCCT", "CCCA"), algo="ACtree")
> matchPDict(pdict, DNAString("GNCCT"), fixed="pattern")[[3]]
IRanges of length 1
    start end width
[1]     2   5     4
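For illustration only, here is a naive base-R sketch of what fixed="pattern" means: the letters of the pattern are taken literally, while an N in the subject may match any base. The helper naive_match_nonfixed_subject() is hypothetical and not part of Biostrings. It ignores the IUPAC codes other than N and simply slides each pattern along the subject counting mismatches, so it is far slower than the Aho-Corasick based matchPDict().

```r
# Hypothetical helper, NOT part of Biostrings: brute-force matching in which
# the pattern is taken literally and an 'N' in the subject is a wildcard,
# roughly what fixed="pattern" means. Other IUPAC codes are not handled.
naive_match_nonfixed_subject <- function(pattern, subject, max.mismatch = 0L) {
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(subject, "", fixed = TRUE)[[1]]
  n <- length(p)
  starts <- integer(0)
  if (n == 0L || length(s) < n) return(starts)
  for (i in seq_len(length(s) - n + 1L)) {
    window <- s[i:(i + n - 1L)]
    # a position is a mismatch only if the letters differ and the
    # subject letter is not the wildcard 'N'
    nmis <- sum(window != p & window != "N")
    if (nmis <= max.mismatch) starts <- c(starts, i)
  }
  starts  # 1-based start positions of the matches
}

# Same dictionary and subject as in the matchPDict() example above
dict <- c("ACCT", "GACC", "CCCT", "CCCA")
hits <- lapply(dict, naive_match_nonfixed_subject, subject = "GNCCT")
hits[[3]]  # 2, i.e. "CCCT" matches "NCCT" at [2, 5]
```

The max.mismatch argument mirrors the one in Ludo's call below; with max.mismatch = 1L, for instance, "CCCA" would also match "NCCT" at position 2.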
The "ACtree" algo has been superseded by the "ACtree2" algo, a faster
and more memory-efficient implementation of the same algo that uses a
different internal representation from "ACtree" for the Aho-Corasick
tree.
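For readers unfamiliar with the algorithm, a textbook Aho-Corasick automaton builds a trie of the dictionary words, adds failure links in a breadth-first pass, and then scans the subject once. The base-R sketch below is a generic illustration with hypothetical function names build_ac() and ac_search(); it does not reflect the internal representation of either "ACtree" or "ACtree2". Note that it treats N as an ordinary letter, so, as with fixed=TRUE, an N in the subject only matches an N in a pattern.

```r
# Textbook Aho-Corasick automaton in base R, for illustration only. This is
# NOT the internal representation used by Biostrings' "ACtree" or "ACtree2".
# Node 1 is the root; children[[v]] is a named integer vector (letter -> node).
build_ac <- function(dict) {
  children <- list(integer(0))
  out <- list(integer(0))  # out[[v]]: indices of dictionary words ending at v
  for (k in seq_along(dict)) {
    v <- 1L
    for (ch in strsplit(dict[[k]], "", fixed = TRUE)[[1]]) {
      nxt <- unname(children[[v]][ch])
      if (is.na(nxt)) {  # no edge for this letter yet: add a new node
        nxt <- length(children) + 1L
        children[[nxt]] <- integer(0)
        out[[nxt]] <- integer(0)
        children[[v]][ch] <- nxt
      }
      v <- nxt
    }
    out[[v]] <- c(out[[v]], k)
  }
  # Breadth-first pass: failure link = longest proper suffix present in the trie
  fail <- integer(length(children))
  fail[1L] <- 1L
  queue <- integer(0)
  for (u in children[[1L]]) {
    fail[u] <- 1L
    queue <- c(queue, u)
  }
  while (length(queue) > 0L) {
    v <- queue[1L]
    queue <- queue[-1L]
    for (ch in names(children[[v]])) {
      u <- children[[v]][[ch]]
      f <- fail[v]
      while (f != 1L && is.na(children[[f]][ch])) f <- fail[f]
      g <- unname(children[[f]][ch])
      fail[u] <- if (is.na(g)) 1L else g
      out[[u]] <- c(out[[u]], out[[fail[u]]])  # inherit matches via failure link
      queue <- c(queue, u)
    }
  }
  list(children = children, fail = fail, out = out)
}

# Walk the subject once, following failure links on mismatches
ac_search <- function(ac, dict, subject) {
  pat_id <- integer(0)
  end_pos <- integer(0)
  v <- 1L
  s <- strsplit(subject, "", fixed = TRUE)[[1]]
  for (i in seq_along(s)) {
    while (v != 1L && is.na(ac$children[[v]][s[i]])) v <- ac$fail[v]
    nxt <- unname(ac$children[[v]][s[i]])
    v <- if (is.na(nxt)) 1L else nxt
    for (k in ac$out[[v]]) {
      pat_id <- c(pat_id, k)
      end_pos <- c(end_pos, i)
    }
  }
  data.frame(pattern = pat_id, start = end_pos - nchar(dict)[pat_id] + 1L,
             end = end_pos)
}

dict <- c("ACCT", "GACC", "CCCT", "CCCA")
ac <- build_ac(dict)
ac_search(ac, dict, "GACCCT")  # pattern 2 at [1, 4], pattern 3 at [3, 6]
ac_search(ac, dict, "GNCCT")   # no hits: 'N' is just another letter here
```

The key property is that the subject is walked only once regardless of the dictionary size; supporting a non-fixed subject means an N can no longer follow a single edge, which is why it needs extra work during the walk.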
The 'fixed=FALSE' (or 'fixed="pattern"') option is not yet supported
for PDict objects built with the new algo. I'll add this ASAP. Thanks
for the reminder!
Cheers,
H.
On 06/25/2010 03:46 AM, Ludo Pagie wrote:
hi all,
I'm trying to match 80bp reads to a construct, a sequence of about
550bp. The construct contains a stretch of N's representing 20 random
nucleotides.
I constructed a PDict from the reads, and a DNAString from the
construct. When I run matchPDict with fixed=TRUE (the default), all
goes fine and I get 1.2M matches:
construct_mindex <- matchPDict(pdict, DNAString(construct), max.mismatch=3)
sum(countIndex(construct_mindex))
[1] 1280283
With fixed=FALSE I get the following error:
construct_mindex <- matchPDict(pdict, DNAString(construct), max.mismatch=3,
                               fixed=FALSE)
Error in .match.PDict3Parts.XString(pd...@threeparts, subject, max.mismatch, :
walk_tb_nonfixed_subject(): implement me
Is there a way around this non-implemented function? Or any
chance it will be implemented soon? Or am I missing something?
If you need more background let me know.
Ludo
sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-17 r52313)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.7.7 Rsamtools_1.1.7 lattice_0.18-8
[4] GenomicRanges_1.1.12 Biostrings_2.17.7 IRanges_1.7.7
loaded via a namespace (and not attached):
[1] Biobase_2.9.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing