[Bioc-sig-seq] matchPWM() without PWMscoreStartingAt()

Ivan Gregoretti Thu, 24 Feb 2011 13:49:48 -0800

Hi everybody,

The function matchPWM(), invoked like this

matchPWM(pwm, subject, min.score="80%", ...)

provides a very fast way of identifying the positions in 'subject' that
match a Position Weight Matrix with a minimum score of 80 percent (relative
to the maximum score).
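
To make the "80%" concrete: as far as I understand it, the percentage is taken relative to the highest score the matrix can produce, that is, the sum of the column maxima. Here is a minimal base-R sketch of that arithmetic. The matrix toy_pwm is made up for illustration and is not my real PWM.

```r
# Hypothetical toy PWM: rows are bases, columns are motif positions.
# matrix() fills column by column, so each line below is one motif position.
toy_pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                    0.1, 0.1, 0.7, 0.1,
                    0.2, 0.2, 0.2, 0.4),
                  nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL))

# Highest attainable score: take the best base in every column.
max_score <- sum(apply(toy_pwm, 2, max))

# Absolute cutoff that corresponds to min.score = "80%".
cutoff <- 0.80 * max_score
```

Any window whose score reaches cutoff would be reported as a match under this reading.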

The result looks like this

matchPWM(pwmMa,Mmusculus[['chr19']],min.score="90%")
  Views on a 61342430-letter DNAString subject
subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNN...TATCTCCGAGATGCACCCGCCTGCTTGC
views:
         start      end width
  [1]  3499390  3499407    18 [TTTTGGTCATGGTGCTTC]
  [2]  3847991  3848008    18 [TTGAATTCCTGATCCTTC]
  [3]  4433615  4433632    18 [TTTGATTCCTGATTCTTC]
  [4]  5200625  5200642    18 [TCTTAGTCATGGTGCTTT]
  [5]  6475713  6475730    18 [TCTTGTTTCTGCTGCTTC]
  [6]  8899734  8899751    18 [TTTCATTCCTGTTTCTTT]
  [7]  9435046  9435063    18 [TTGTGTTCCTGCTGCTTC]
  [8]  9490297  9490314    18 [TTTTATTCATGCTGTTTC]
  [9] 10148754 10148771    18 [TCATGGTCATGGTGCTTC]
  ...      ...      ...   ... ...
[103] 53870063 53870080    18 [TCAAAGTCATGCTCCTTC]
[104] 55886021 55886038    18 [TCTTTTTCCTGGTACTTC]
[105] 56158096 56158113    18 [GCTGATTCTTGCTGCTTC]
[106] 56585172 56585189    18 [TTTCATTCCAGTTGCTTA]
[107] 56905278 56905295    18 [TTCTGTTCCTGGTACTTC]
[108] 57831404 57831421    18 [TTTTATTTATGTTACTTC]
[109] 58794481 58794498    18 [TCGAATTCCTGCTGCTTT]
[110] 59452136 59452153    18 [TTGCATTTGTGTTACTTC]
[111] 59597303 59597320    18 [TTTTAGTCATGGTGTTTC]


My question:

How do you get matchPWM to report the score?


Why that is important:

The function PWMscoreStartingAt() was created to answer that question. However,
for some PWMs, running matchPWM() with a permissive min.score like "80%"
results in half a million matches. The problem is that PWMscoreStartingAt()
can only compute the scores one match at a time. In other words, it takes a
single DNAString but not a DNAStringSet.

Rather than hoping for a vectorised version of PWMscoreStartingAt(), I guess
that it is more practical to try to retrieve the score already computed by
matchPWM().
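
For what it is worth, the score of one match is just the sum of the PWM entries selected by the bases of the matched window, so scoring many start positions in one call is easy to sketch in base R. The snippet below only illustrates the kind of result I am after. The helper score_windows() and the toy data are hypothetical, and this is not how Biostrings computes scores internally.

```r
# Hypothetical toy PWM: rows are bases, columns are motif positions.
toy_pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                    0.1, 0.1, 0.7, 0.1,
                    0.2, 0.2, 0.2, 0.4),
                  nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: score every window of 'sequence' that begins at 'starts'.
score_windows <- function(pwm, sequence, starts) {
  width <- ncol(pwm)
  bases <- strsplit(sequence, "")[[1]]
  vapply(starts, function(s) {
    window <- bases[s:(s + width - 1L)]
    # Matrix indexing: pick pwm[base, position] for each motif position.
    # Letters absent from rownames(pwm), such as N, yield NA.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }, numeric(1))
}

# Scores for the windows starting at positions 1 ("AGT") and 5 ("AGA").
scores <- score_windows(toy_pwm, "AGTCAGA", c(1L, 5L))
```

It may also be worth checking the man page of the installed Biostrings version: if the starting.at argument of PWMscoreStartingAt() accepts an integer vector (an assumption on my part for this version), then passing start() of the matchPWM() result would give all the scores in one call.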

Any suggestions?

Thank you,

Ivan

> sessionInfo()
R version 2.13.0 Under development (unstable) (2010-12-02 r53747)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C              LC_TIME=en_US.utf8
 [4] LC_COLLATE=en_US.utf8     LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                 LC_ADDRESS=C
[10] LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] multicore_0.1-4                    lattice_0.19-17
[3] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.19.3
[5] cosmo_1.17.0                       seqLogo_1.17.0
[7] GenomicRanges_1.3.18               Biostrings_2.19.9
[9] IRanges_1.9.22

loaded via a namespace (and not attached):
[1] Biobase_2.11.8 tools_2.13.0


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
