Dave,
Before you look for an external solution, I recommend you try the vcountPattern function in Biostrings. In a training course in November 2008, we showed how efficiently vcountPattern finds adapter-like reads.

http://bioconductor.org/workshops/2008/SeattleNov08/MatchAlign/MatchAlign.pdf

See pages 7-10.

As a first approximation, I would guess your code would look something like:

adapter <- DNAString("ACGGATTGTTCAGT")
# First and last nchar(adapter) bases of each read
prefix <- substring(myReads, 1, nchar(adapter))
suffix <- substring(myReads, nchar(myReads) - nchar(adapter) + 1, nchar(myReads))
# TRUE where the adapter matches either end with at most 1 mismatch
isAdapter <- (vcountPattern(adapter, prefix, max.mismatch = 1) +
              vcountPattern(adapter, suffix, max.mismatch = 1)) > 0
# Logical indexing stays correct even when no read matches
nonAdapterReads <- myReads[!isAdapter]
adapterReads <- myReads[isAdapter]
adapterReads
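
To try the idea on a small scale before running it on the full lane, a toy version along these lines might help. This is only a sketch: the three reads are made up for illustration, and subseq() and width() are used here on the assumption that they are the usual way to slice a DNAStringSet.

```r
library(Biostrings)

adapter <- DNAString("ACGGATTGTTCAGT")

# Toy reads, invented for this example (20 bases each)
toyReads <- DNAStringSet(c(
  "ACGGATTGTTCAGTAAAAAA",  # adapter at the start
  "TTTTTTACGGATTGTTCAGT",  # adapter at the end
  "GGGGGGGGGGGGGGGGGGGG"   # no adapter
))

k <- length(adapter)  # number of bases in the adapter

# First and last k bases of every read
prefix <- subseq(toyReads, start = 1, width = k)
suffix <- subseq(toyReads, end = width(toyReads), width = k)

# Count adapter occurrences at either end, allowing one mismatch
hits <- vcountPattern(adapter, prefix, max.mismatch = 1) +
        vcountPattern(adapter, suffix, max.mismatch = 1)
isAdapter <- hits > 0

# Reads 1 and 2 should be flagged; read 3 should be kept
adapterReads    <- toyReads[isAdapter]
nonAdapterReads <- toyReads[!isAdapter]
```

Because only the two ends of each read are compared against the adapter, the work per read is small, which is why this should scale to millions of reads far better than a full alignment of every read.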


Patrick


Dan Bolser wrote:
2009/1/8 David A.G <[email protected]>:
Dear list,

I have some experience with Bioconductor but am a newbie to this list and to NGS. 
I am trying to remove some adapters from my Solexa s_N_sequence.txt file using 
the Biostrings and ShortRead packages and their vignettes.  I managed to read in the 
text file and save the reads as follows:

fqpattern <- "s_4_sequence.txt"
f4 <- file.path(analysisPath(sp), fqpattern)
fq4 <- readFastq(sp, fqpattern)
reads <- sread(fq4)  #"reads" contains more than 4 million 34-length fragments

Having the following adapter sequence:

adapter <- DNAString("ACGGATTGTTCAGT")

I tried to mimic the example in the Biostrings vignette as follows:


myAdapterAligns <- pairwiseAlignment(reads, adapter, type = "overlap")

but after more than two hours the process is still running.

I am running R 2.8.0 on a 64-bit Linux machine (Kubuntu 2.6.24) with 4 GB RAM, 
and I only have some 30 MB of free RAM left. I found a thread on adapter removal, 
but it does not clear things up much for me, since as far as I understood the option 
mentioned in the thread is not appropriate (quote: "though apparently this is 
not entirely satisfactory, see the second entry!").

Is this just a memory issue or am I doing something wrong? Shall I leave the 
process to run for longer?

TIA for your help,

Dave

Hi Dave

I think a stand-alone C program may be more appropriate for the task
you are trying to perform. I'm new to NGS myself, but I believe there
are many programs available to do this. I think the convenience of
using R naturally results in a performance hit on some intensive
algorithms.

Try asking your question over here:

http://seqanswers.com/


or is there a better mailing list?

Cheers,

Dan.


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

