Re: [Bioc-sig-seq] Applying grep to a large number of tags. (looking for speed)

Patrick Aboyoun Fri, 23 Jul 2010 07:12:15 -0700

Actually Biostrings::vcountPattern would be better. Then you can justfind the non-zero elements.


Patrick


On 7/23/10 7:06 AM, Patrick Aboyoun wrote:

Ivan,
Have you tried Biostrings::vmatchPattern?


Patrick


On 7/23/10 7:02 AM, Ivan Gregoretti wrote:

Hello listers,

How do you count how many of your Illumina tags match a particularsequence?




Well, actually, I know at least two ways of doing this but they are
both slow. This is perhaps the fastest one:

library(ShortRead)

# my sequence of interest (in my real case it is several kbs)

mySeq<-'AGTGAAGAAGCAAAGAGGCCTCTGCAAGTCACTCAAGAAGTCTACGATTTACATAGCTGACATA'


# load my tags keeping only the ones that pass the filter
A<- readAligned("/my/path", "s_5_export.txt.gz", "SolexaExport",
filter=compose(alignDataFilter(expression(filtering=="Y"))))

# keep only the tags that do not contain Ns
A<- clean(A)

# count how many tags match exactly some portion of mySeq
sum_A<- sum(unlist(sapply(sread(A), function(x)grep(x, mySeq))))


This has been running since yesterday and a similar strategy using
srdistance has been running for 4 days. Can anybody suggest a more
time-efficient calculation?

Thank you,

Ivan



Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Applying grep to a large number of tags. (looking for speed)

Reply via email to