>>>>> "Dan" == Dan Bolser <[EMAIL PROTECTED]> writes:

    Dan> Hello, I am a new user to biojava (and almost new to java).

    Dan> The following code works fine reading a 'FASTA' format file,
    Dan> but causes an error reading 'MSF' format...

[...]

    Dan> --- Exception in thread "main"
    Dan> java.lang.IllegalArgumentException: No alphabet was set in the identifier
    Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:801)
    Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:787)
    Dan>   at ReadAlignMakeDistribution.main(ReadAlignMakeDistribution.java:60)

The exception message refers to the integer identifier that biojava
assigns to each known combination of file format (fasta, genbank,
embl) and alphabet type (dna, rna, protein). How these identifiers
are built and interpreted is documented in SeqIOConstants (for the
sequence formats) and AlignIOConstants (for the alignment formats).
All the common combinations exist as static int fields, so you can
compare them using == or use them in switch statements.
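
As a rough illustration of the idea (not the actual biojava source:
the constant values and the bit layout below are made up for the
example, so check SeqIOConstants and AlignIOConstants for the real
ones), a format part and an alphabet part can be combined into one
int and then compared or switched on:

```java
// Hypothetical sketch: names mirror the biojava style, but the values
// and the bit layout are assumptions made for this example only.
final class FormatIdSketch {
    // Format part, assumed to live in the low 16 bits.
    static final int FASTA = 1;
    static final int MSF = 2;

    // Alphabet part, assumed to live above the low 16 bits.
    static final int DNA = 1 << 16;
    static final int AA = 1 << 17;

    static final int FORMAT_MASK = 0xFFFF;
    static final int ALPHABET_MASK = ~FORMAT_MASK;

    // Precomputed combinations are compile-time constants, so they
    // can be used as case labels.
    static final int FASTA_DNA = FASTA | DNA;
    static final int MSF_AA = MSF | AA;

    static String describe(int id) {
        switch (id) {
            case FASTA_DNA: return "fasta/dna";
            case MSF_AA:    return "msf/protein";
            default:        return "unknown";
        }
    }

    // True if any alphabet bit is set in the identifier.
    static boolean hasAlphabet(int id) {
        return (id & ALPHABET_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(describe(MSF | AA));  // prints "msf/protein"
        System.out.println(hasAlphabet(MSF));    // prints "false"
    }
}
```

A format constant on its own carries no alphabet information, which
is why hasAlphabet(MSF) is false in this sketch.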

The format guessing code (in SeqIOTools.identifyFormat) appears to be
missing "msf" and "clustal". This is a bug - I'll fix it today. The
result is that it returns SeqIOConstants.UNKNOWN as the format
identifier, which has no alphabet set (hence the message).
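
To make the failure mode concrete, here is a minimal model of it. This
is a sketch under assumptions, not the SeqIOTools code: the class, the
lookup tables, the identify and read methods, and all constant values
are hypothetical. A name lookup that does not know "msf" falls through
to UNKNOWN, and the reader then rejects the identifier because no
alphabet bit is set:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the bug; nothing here is the real biojava API.
final class IdentifySketch {
    static final int UNKNOWN = 0;              // assumed: no bits set
    static final int FASTA = 1;
    static final int MSF = 2;
    static final int DNA = 1 << 16;
    static final int AA = 1 << 17;
    static final int ALPHABET_MASK = ~0xFFFF;

    static final Map<String, Integer> FORMATS = new HashMap<>();
    static final Map<String, Integer> ALPHABETS = new HashMap<>();
    static {
        FORMATS.put("fasta", FASTA);           // "msf" deliberately missing
        ALPHABETS.put("dna", DNA);
        ALPHABETS.put("protein", AA);
    }

    // Maps names to a combined identifier; unknown names give UNKNOWN.
    static int identify(String format, String alphabet) {
        Integer f = FORMATS.get(format.toLowerCase(Locale.ROOT));
        Integer a = ALPHABETS.get(alphabet.toLowerCase(Locale.ROOT));
        return (f == null || a == null) ? UNKNOWN : (f | a);
    }

    // Stands in for the reader: refuses identifiers without an alphabet.
    static String read(int id) {
        if ((id & ALPHABET_MASK) == 0) {
            throw new IllegalArgumentException(
                "No alphabet was set in the identifier");
        }
        return "ok";
    }

    public static void main(String[] args) {
        try {
            read(identify("msf", "protein"));  // lookup fails -> UNKNOWN
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(read(MSF | AA));    // explicit identifier works
    }
}
```

Passing the combined identifier explicitly sidesteps the name lookup
entirely, so the missing table entry no longer matters.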

In the meantime, the public method fileToBiojava(int fileType,
BufferedReader br) should work if you pass it AlignIOConstants.MSF_AA
as the file type.

Keith

-- 

- Keith James <[EMAIL PROTECTED]> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
