Ola Spjuth wrote:

Hi,

I am investigating the usefulness of BioJava as a backend for sequence
management in Bioclipse (www.bioclipse.net). As a total newbie to
Biojava, I have read the tutorial, BIA examples, glanced at the API,
read my first FASTA-sequence and have come up with a few questions:

1) Is it possible to search the Biojava-l archives without having to
manually browse by month?

2) Is there a wrapper for SequenceIO.fileToBiojava(..) that
automatically detects file formats or is it necessary to distinguish
sequence formats externally, i.e. with different file-extensions? If so,
does anyone know of a complete list of file-extensions that could be
mapped to a format?

There is a deprecated piece of code available that quite a lot of people
still use in their code. Even though auto-guessing the file format might
not be the greatest idea, it's the desirable thing to do in many cases.
If I just look at the people in my lab, they want to open the file; they
don't want to keep track of what format that particular sequence was in,
and so on.

So, even if file format guessing is bad, people are going to write it,
and IMHO it's better to have one centralised, good, known-to-work file
guesser than several different implementations, one in each person's own
application.

So my suggestion is to start by using the deprecated version that's in
BioJava. If it gets removed, you can easily copy that small part of the
code into your own application, or ship it as a small external jar file.
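
If you do end up carrying your own guesser, a rough sketch of the idea
could look like the following. This is not the deprecated BioJava code;
the class name FormatGuesser, the Format enum and the extension list are
all made up for illustration, and the extension list is certainly not
complete. It only relies on the well-known first-line markers of each
format.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: guesses a sequence file format.
final class FormatGuesser {

    /** Formats this sketch knows about; UNKNOWN means "ask the user". */
    enum Format { FASTA, GENBANK, EMBL, UNKNOWN }

    /** Guess from the file extension alone (illustrative, incomplete mapping). */
    static Format guessFromName(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        switch (ext) {
            case "fa": case "fas": case "fasta": return Format.FASTA;
            case "gb": case "gbk": case "genbank": return Format.GENBANK;
            case "embl": return Format.EMBL;
            default: return Format.UNKNOWN;
        }
    }

    /** Guess from the first non-blank line of the file content. */
    static Format guessFromFirstLine(String firstLine) {
        if (firstLine == null) return Format.UNKNOWN;
        if (firstLine.startsWith(">")) return Format.FASTA;
        if (firstLine.startsWith("LOCUS")) return Format.GENBANK;
        // Note: Swiss-Prot entries also start with "ID   ", so this
        // check alone cannot tell EMBL and Swiss-Prot apart.
        if (firstLine.startsWith("ID   ")) return Format.EMBL;
        return Format.UNKNOWN;
    }

    /** Trust the content first, fall back to the extension. */
    static Format guess(String fileName, String firstLine) {
        Format byContent = guessFromFirstLine(firstLine);
        return byContent != Format.UNKNOWN ? byContent : guessFromName(fileName);
    }

    public static void main(String[] args) {
        System.out.println(guess("example.txt", ">seq1 some description"));
        System.out.println(guess("example.gb", "garbage"));
    }
}
```

Checking the content before the extension is a design choice here: people
rename files far more often than they change what is inside them.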

3) How robust are the I/O-classes for different formats? The
test-library provided is rather short in my opinion and my first test
broke since there was a space in the wrong position...

4) What are the capabilities for multiple sequence alignment in Biojava?
Is it limited to parse results into Biojava objects (as in BIA) or does
it contain any stable MSA-implementations? Due to BioJava's size it is
not easy to get an overview of the current capabilities and the standard
of different parts.

There is some support for multiple alignments in BioJava. The Alignment
interface and its implementations happily handle multiple alignments, and
you can choose how to interpret one: either as a SymbolList over a
cross-product alphabet, or as individual sequences accessible by some
label.

There is a basic framework for handling multiple alignment formats in the
org.biojava.bio.seq.io package. It currently only implements two formats,
FASTA and MSF. Most programs seem to be able to write their multiple
alignment output in either FASTA or MSF format, so you should be able to
get the results into BioJava.
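
To make the two views of an alignment a bit more concrete, here is a
small stand-alone sketch. It does not use the BioJava Alignment interface
at all; SimpleAlignment and its methods are hypothetical names, positions
are 0-based in this sketch, and the parser only handles plain aligned
FASTA text. It just shows the same data read either row-wise by label or
column-wise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an alignment; not the BioJava Alignment API.
final class SimpleAlignment {
    // Label -> gapped sequence, kept in input order.
    private final Map<String, String> rows = new LinkedHashMap<>();
    private int length = -1;

    void addRow(String label, String gapped) {
        if (length >= 0 && gapped.length() != length) {
            throw new IllegalArgumentException("Row " + label + " has a different length");
        }
        length = gapped.length();
        rows.put(label, gapped);
    }

    /** Row view: one gapped sequence, looked up by its label. */
    String row(String label) {
        return rows.get(label);
    }

    /** Column view: the characters of every row at one 0-based position. */
    String column(int index) {
        StringBuilder col = new StringBuilder();
        for (String seq : rows.values()) {
            col.append(seq.charAt(index));
        }
        return col.toString();
    }

    int length() {
        return Math.max(length, 0);
    }

    /** Parse aligned FASTA text; the label is the first word after '>'. */
    static SimpleAlignment fromFasta(String text) {
        SimpleAlignment aln = new SimpleAlignment();
        String label = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (label != null) aln.addRow(label, seq.toString());
                label = line.substring(1).trim().split("\\s+")[0];
                seq.setLength(0);
            } else {
                seq.append(line);
            }
        }
        if (label != null) aln.addRow(label, seq.toString());
        return aln;
    }

    public static void main(String[] args) {
        SimpleAlignment aln = fromFasta(">a\nAC-T\n>b second row\nA-GT\n");
        System.out.println(aln.row("a"));
        System.out.println(aln.column(1));
    }
}
```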

5) As a novice, has anyone implemented BLAST or CLUSTALW in Java? Any
public web-services running for this?

I have been told by greater deities that implementing BLAST in Java is
hard, because the BLAST algorithm makes heavy use of low-level data
structures, pointers and similar things that are very hard to implement
and control in Java. So the resulting implementation would most likely
run pretty darn slow, and not do what you want.

Depending on what you want to do with BLAST, the BioJava SSAHA
implementation might be something you can use instead (it works pretty
well on quite conserved sequences, but it's not really suited to more
divergent sequences).

When it comes to web services I only know of a few. I have not used any
of these to a large extent, so I can't comment on how well they work for
large sequences, big jobs and so on.

http://www.ebi.ac.uk/Tools/webservices/services.html
http://xml.ddbj.nig.ac.jp/wsdl/index.jsp

Sadly, they all use their own data encoding and service invocation setup,
so they are pretty darn annoying to use.


6) Is there some example-code on how to use DAS (as a client)?

7) How can I submit an RFE?

Sorry for so many questions in one post; I have a lot of catching up to
do and was hoping for some guidance. Some questions have probably already
been answered in earlier posts but I have not been able to search the
archives.

Cheers,

  .../Ola




_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
