I don't know how big your sequence database is, but it should be possible
to load your sequences into memory, avoid most of the I/O overhead, and
focus on compute efficiency.

Searching one sequence against a database parallelizes nicely. If you
have a bunch of CPUs or even a few computers, there are some nice libraries
that let you parallelize things massively (or just write something
multi-threaded if you have only one computer with multiple CPUs).
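
As a rough illustration of the in-memory, multi-threaded idea, here is a
minimal sketch. It assumes the database already sits in memory as a
List<String> and uses plain mismatch counting over a sliding window as a
stand-in for a real Smith-Waterman scoring step (so it finds substitutions
only, not gaps). The class ParallelMismatchSearch and all of its methods are
hypothetical names for this example, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper (not BioJava API): scans an in-memory sequence list in parallel. */
final class ParallelMismatchSearch {

    /** One match: index of the database sequence, offset within it, mismatch count. */
    static final class Hit {
        final int dbIndex;
        final int offset;
        final int mismatches;

        Hit(int dbIndex, int offset, int mismatches) {
            this.dbIndex = dbIndex;
            this.offset = offset;
            this.mismatches = mismatches;
        }

        @Override
        public String toString() {
            return "seq=" + dbIndex + " offset=" + offset + " mismatches=" + mismatches;
        }
    }

    /** Counts mismatches of query against target at offset; stops early once limit is exceeded. */
    static int mismatchesAt(String query, String target, int offset, int limit) {
        int mismatches = 0;
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) != target.charAt(offset + i) && ++mismatches > limit) {
                return mismatches;
            }
        }
        return mismatches;
    }

    /** Slides the query along one target and collects every window within the mismatch limit. */
    static List<Hit> scanOne(String query, String target, int dbIndex, int maxMismatches) {
        List<Hit> hits = new ArrayList<>();
        for (int offset = 0; offset + query.length() <= target.length(); offset++) {
            int mismatches = mismatchesAt(query, target, offset, maxMismatches);
            if (mismatches <= maxMismatches) {
                hits.add(new Hit(dbIndex, offset, mismatches));
            }
        }
        return hits;
    }

    /** Searches all database sequences in parallel; each sequence is an independent task. */
    static List<Hit> search(String query, List<String> db, int maxMismatches) {
        return IntStream.range(0, db.size())
                .parallel()
                .mapToObj(i -> scanOne(query, db.get(i), i, maxMismatches))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy in-memory "database"; a real one would be read once from a FASTA file.
        List<String> db = Arrays.asList("AAACGTACGTTT", "CCCCCCCCCC", "ACGAACGT");
        for (Hit hit : search("ACGTACGT", db, 1)) {
            System.out.println(hit);
        }
    }
}
```

Because each database sequence is scanned independently, the same split
works with an ExecutorService or across several machines; the parallel
stream is just the shortest way to show it on one multi-core computer.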

Andreas






On Fri, Jan 17, 2014 at 10:15 AM, Peter S <[email protected]> wrote:

> Thanks Andreas,
>
> I am switching from Python/Perl, so my Java is not great. With the
> implementation you mention, would I need to pass in the sequences and
> run them one by one? SSEARCH is also 'slow' (SW) but has a lot of
> optimization in place, so in the end it does not take that long to run.
> It's in C++ though.
>
> Peter
>
>
>   On Friday, 17 January 2014, 18:09, Andreas Prlic <[email protected]>
> wrote:
> We do have a Smith-Waterman implementation in BioJava. However, the
> algorithm is based on dynamic programming, which is inherently "slow"
> (its running time grows with the product of the two sequence lengths)
> but gives you the optimal alignment...
>
> http://biojava.org/wiki/BioJava:CookBook3:PSA#Local_alignment
>
> Andreas
>
>
>
>
> On Fri, Jan 17, 2014 at 9:50 AM, Peter S <[email protected]> wrote:
>
> Thanks, I will give it a try.
>
> Does it mean there is no fast implementation of SW in java that I can use?
>
> Best,
> Peter
>
>
>
> On Friday, 17 January 2014, 17:45, Khalil El Mazouari <
> [email protected]> wrote:
>
> Hi Peter,
>
> Give it a try with Levenshtein distance. You can use StringUtils from
> Apache Commons Lang; it has a getLevenshteinDistance method.
>
> best,
>
> Khalil
>
>
>
> On 17 Jan 2014, at 18:37, Peter S <[email protected]> wrote:
>
> Hi Khalil,
> >
> >
> >By short sequences I mean 12-18 nt long. I need to align them against
> >the entire transcriptome and detect matches with up to 3 mismatches. This
> >is the reason I need something quite fast but sensitive at the same time.
> >
> >
> >Many thanks,
> >Peter
> >
> >
> >
> >On Friday, 17 January 2014, 17:26, Khalil El Mazouari <
> [email protected]> wrote:
> >
> >Hi,
> >
> >what do you mean by short sequences? NT or AA?
> >
> >Best
> >
> >Khalil
> >
> >On 17 Jan 2014, at 18:00, [email protected] wrote:
> >
> >>
> >> Today's Topics:
> >>
> >>   1. Database search with Smith and Waterman (Peter S)
> >>
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Fri, 17 Jan 2014 13:27:17 +0000 (GMT)
> >> From: Peter S <[email protected]>
> >> Subject: [Biojava-l] Database search with Smith and Waterman
> >> To: "[email protected]" <[email protected]>
> >> Message-ID:
> >>     <[email protected]>
> >> Content-Type: text/plain; charset=iso-8859-1
> >>
> >> Dear All,
> >>
> >> I'm looking for an implementation of the Smith and Waterman algorithm
> >> to use in the Java desktop application I want to develop.
> >>
> >> I did find some information on pairwise aligners, but what I would
> >> ideally want is something similar to the SSEARCH package that can
> >> perform alignments against very big databases saved locally in FASTA
> >> format. Speed is quite important, and ideally I would need an output
> >> that I can easily parse, identifying mismatch/gap positions etc.
> >>
> >> Any suggestions if there is any Java implementation that would fit the
> >> description? I will be working on short sequences, so sensitivity is
> >> crucial.
> >>
> >> Thanks very much for your help,
> >> Peter
> >>
> >>
> >> ------------------------------
> >>
> >> _______________________________________________
> >> Biojava-l mailing list  -  [email protected]
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> >> End of Biojava-l Digest, Vol 131, Issue 3
> >> *****************************************
> >
> >
> >
> >
> >
> >-----
> >
> >Confidentiality Notice: This e-mail and any files transmitted with it are
> private and confidential and are solely for the use of the addressee. It
> may contain material which is legally privileged. If you are not the
> addressee or the person responsible for delivering to the addressee, please
> notify that you have received this e-mail in error and that any use of it
> is strictly prohibited. It would be helpful if you could notify the author
> by replying to it.
> >
> >
> >
> >
> >
>
>
>
>
>
>
>
>
>
>
>
>
>
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------
