Hi Ludo,

Yes, matchPDict() used to support fixed=FALSE. It still does, but only
when the PDict object is made using the old implementation of the
Aho-Corasick algo ('algo="ACtree"'):

  > pdict <- PDict(c("ACCT", "GACC", "CCCT", "CCCA"), algo="ACtree")
  > matchPDict(pdict, DNAString("GNCCT"), fixed="pattern")[[3]]
  IRanges of length 1
      start end width
  [1]     2   5     4
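
To make it clearer why pattern 3 ("CCCT") matches at positions 2-5 here, below is a rough base-R sketch of what fixed="pattern" means: the pattern letters are taken literally, while an IUPAC ambiguity code in the subject (the N in "GNCCT") can stand for any of the bases it represents. The helper match_subject_wildcards() and the iupac table are hypothetical names introduced only for this illustration. They are not part of Biostrings, and the real matchPDict() uses an Aho-Corasick tree rather than this naive scan.

```r
# IUPAC nucleotide codes mapped to the bases each one stands for.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
  S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
  V = c("A", "C", "G"), H = c("A", "C", "T"),
  D = c("A", "G", "T"), B = c("C", "G", "T"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper (NOT a Biostrings function): returns the start
# positions where 'pattern' (plain A/C/G/T letters) matches 'subject',
# treating every ambiguity code in the subject as a wildcard.
match_subject_wildcards <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  n_starts <- length(s) - length(p) + 1
  if (n_starts < 1) return(integer(0))
  hits <- integer(0)
  for (start in seq_len(n_starts)) {
    window <- s[start:(start + length(p) - 1)]
    # A pattern letter matches if it is one of the bases the subject
    # letter can stand for.
    ok <- all(mapply(function(pl, sl) pl %in% iupac[[sl]], p, window))
    if (ok) hits <- c(hits, start)
  }
  hits
}

patterns <- c("ACCT", "GACC", "CCCT", "CCCA")
starts <- lapply(patterns, match_subject_wildcards, subject = "GNCCT")
```

With these inputs starts[[3]] is 2, which agrees with the IRanges start shown above, and starts[[4]] is empty because "CCCA" cannot end on the T at position 5.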

The "ACtree" algo has been superseded by the "ACtree2" algo, a faster
and more memory efficient implementation of the same algo that uses a
different internal representation than "ACtree" for the Aho-Corasick
tree.

The 'fixed=FALSE' (or 'fixed="pattern"') option is not yet supported
for PDict objects built with the new algo. I'll add this ASAP. Thanks
for the reminder!

Cheers,
H.

On 06/25/2010 03:46 AM, Ludo Pagie wrote:

hi all,

I'm trying to match 80bp reads to a construct, a sequence of +/-
550bp. The construct contains a stretch of N's, representing a
stretch of 20 random nucleotides.

I constructed a pdict from the reads, and a DNAString from the
construct. When I run matchPDict with fixed=TRUE, all goes fine
and I get 1.2M matches.

construct_mindex<- matchPDict(pdict, DNAString(construct), max.mismatch=3)
sum(countIndex(construct_mindex))
[1] 1280283


With fixed=FALSE I get the following error:

construct_mindex<- matchPDict(pdict, DNAString(construct), max.mismatch=3, fixed=FALSE)
Error in .match.PDict3Parts.XString(pd...@threeparts, subject, max.mismatch,  :
   walk_tb_nonfixed_subject(): implement me

Is there a way around this non-implemented function? Or any
chance it will be implemented soon? Or am I missing something?

If you need more background let me know.

Ludo

sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-17 r52313)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.7.7      Rsamtools_1.1.7      lattice_0.18-8
[4] GenomicRanges_1.1.12 Biostrings_2.17.7    IRanges_1.7.7

loaded via a namespace (and not attached):
[1] Biobase_2.9.0 grid_2.12.0   hwriter_1.2   tools_2.12.0

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
