2009/8/3 Ryan Golhar <[email protected]>:
> I'm trying to perform a large number of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a large
> amount of memory...about 100 GB of memory. This obviously won't work.
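
That figure checks out as a rough estimate: a full score matrix for two 163,000 bp sequences has about 2.7e10 cells, which at an assumed 4 bytes per cell is a little over 100 GB. A minimal Python sketch of the arithmetic follows, together with the usual two-row trick that computes the global alignment score (the score only, not the alignment itself) in linear memory. The function names and the +1/-1/-1 scoring are illustrative assumptions, not taken from any of the tools mentioned in this thread.

```python
def nw_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Memory for a full (len_a+1) x (len_b+1) Needleman-Wunsch score matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score, keeping only two rows (O(len(b)) memory)."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr  # the older row is no longer needed
    return prev[-1]


# Full-matrix memory in GB for two 163,000 bp sequences (4-byte cells assumed).
print(nw_matrix_bytes(163_000, 163_000) / 1e9)
print(nw_score("ACGT", "AGT"))
```

Note that this only removes the memory problem; the running time is still proportional to the product of the two lengths.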

For two sequences in the region of > 85% similarity, MUMmer [1] works very
well. For example, aligning two strains of E. coli on my desktop, both in
the region of 4.6 Mb:

* U00096 (Escherichia coli str. K-12 substr. MG1655)
* CP000948 (Escherichia coli str. K-12 substr. DH10B)

time nucmer U00096.fasta CP000948.fasta

real 0m14.035s
user 0m11.370s
sys 0m0.400s

It uses exact-match (k-mer style) seeding heuristics instead of filling a
full dynamic-programming matrix, so it is very quick and memory-efficient.

HTH,
Dan.

[1] http://mummer.sourceforge.net/

> I tried using stretcher from the EMBOSS package, but it takes way too long
> to align each pair of sequences. I'm looking for something that can perform
> alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it to
> run on the sequence set I have.
>
> Before I go and try to develop a new solution to this, does anyone have or
> recommend a program to perform a large number of global pairwise alignments
> for long sequences?
>
> Ideally, something with speed similar to BLAST.
>
> Ryan
>
> _______________________________________________
> BBB mailing list
> [email protected]
> http://www.bioinformatics.org/mailman/listinfo/bbb
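
P.S. On the seeding heuristics mentioned in the reply: the general idea can be sketched as indexing every k-mer of one sequence and looking up the k-mers of the other, so that only the regions around shared exact matches need any further alignment work. The Python below is a toy illustration under that assumption. It is not MUMmer's actual algorithm (which finds maximal exact matches rather than fixed-length k-mers), and the function names are made up for this example.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_seeds(ref, qry, k):
    """Return (ref_pos, qry_pos) pairs where ref and qry share an exact k-mer."""
    index = kmer_index(ref, k)
    seeds = []
    for j in range(len(qry) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for i in index.get(qry[j:j + k], ()):
            seeds.append((i, j))
    return seeds


ref = "ACGTTGCA"
qry = "GGACGTTG"
for i, j in shared_seeds(ref, qry, k=4):
    print(i, j, ref[i:i + 4])
```

Seeds that fall on the same diagonal (constant ref_pos minus qry_pos) can then be chained into longer anchors, which is where the speed comes from: the expensive alignment step is confined to the short gaps between anchors.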
