Hi Alexandre,

To parse the ClustalW results I use a SequenceAlignmentSAXParser and a
custom implementation of DefaultHandler which I call
'SequenceCollectionContentHandler'.

The code for the custom DefaultHandler class is:


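import java.util.Map;

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.ProteinTools;
import org.biojava.bio.seq.RNATools;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class SequenceCollectionContentHandler extends DefaultHandler {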

    private final Map sequenceMap;
    private final Alphabet alphabet;

    private String currentSeqName;
    private String currentSeq;

    /**
     * Creates a new <code>SequenceCollectionContentHandler</code> instance.
     *
     * @param map
     *            The map to be filled with sequences
     * @param alphabet
     *            The alphabet to be used
     */
    public SequenceCollectionContentHandler(Map map, Alphabet alphabet) {
        this.sequenceMap = map;
        this.alphabet = alphabet;
    }

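    // Called at the start of every element
    public final void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) {

        if (localName.equals("Sequence")) {
            // Start with an empty buffer for the new sequence
            this.currentSeq = "";
            startCurrentSequence(atts);
        }
    }

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#characters(char[], int, int)
     */
    public final void characters(char[] ch, int start, int length)
            throws SAXException {
        // A SAX parser may deliver the text of one element in several
        // chunks, so append each chunk instead of overwriting the field.
        // Text seen before the first Sequence element is ignored.
        if (this.currentSeq != null) {
            this.currentSeq += new String(ch, start, length);
        }
    }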

    /*
     * (non-Javadoc)
     *
     * @see org.xml.sax.ContentHandler#endElement(java.lang.String,
     *      java.lang.String, java.lang.String)
     */
    public final void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equals("Sequence")) {
            endCurrentSequence();
        }

    }

    private void startCurrentSequence(Attributes atts) {
        String attName = atts.getLocalName(0);
        if (attName.equals("sequenceName")) {
            this.currentSeqName = atts.getValue(0);
        }
    }

    private void endCurrentSequence() {
        if (this.alphabet.equals(DNATools.getDNA())) {
            try {
                Sequence seq = DNATools.createDNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }

        } else if (this.alphabet.equals(RNATools.getRNA())) {
            try {
                Sequence seq = RNATools.createRNASequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        } else if (this.alphabet.equals(ProteinTools.getAlphabet())) {
            try {
                Sequence seq = ProteinTools.createProteinSequence(currentSeq,
                        currentSeqName);
                this.sequenceMap.put(currentSeqName, seq);
            } catch (IllegalSymbolException e) {
                System.err.println(this.getClass()
                        + " - IllegalSymbolException: " + e.getMessage());
            }
        }
    }

}
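
A note on characters(): a SAX parser is allowed to hand over the text of a
single element in more than one characters() call, so a handler is safer
when it collects the chunks and only uses the text in endElement(). Here is
a minimal sketch of that pattern using only the JDK, with no BioJava
involved. The class name SeqTextCollector and the sample XML are made up
for illustration, and the sketch matches on qName because the JDK parser
factory is not namespace-aware by default.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-alone handler: collects the text of each <Sequence> element.
final class SeqTextCollector extends DefaultHandler {

    private final Map<String, String> sequences = new LinkedHashMap<String, String>();
    private final StringBuilder buffer = new StringBuilder();
    private String currentName;
    private boolean inSequence;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Sequence".equals(qName)) {
            currentName = atts.getValue("sequenceName");
            buffer.setLength(0); // forget text left over from earlier elements
            inSequence = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append every chunk.
        if (inSequence) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Sequence".equals(qName)) {
            sequences.put(currentName, buffer.toString().trim());
            inSequence = false;
        }
    }

    // Parses an XML string and returns name -> sequence text, in document order.
    static Map<String, String> parse(String xml) {
        SeqTextCollector handler = new SeqTextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.sequences;
    }

    public static void main(String[] args) {
        // The character reference &#71; (the letter G) is a typical spot
        // where a parser may split the text into several chunks.
        String xml = "<SequenceCollection>"
                + "<Sequence sequenceName=\"seq1\">AC&#71;T-A</Sequence>"
                + "<Sequence sequenceName=\"seq2\">AC-TTA</Sequence>"
                + "</SequenceCollection>";
        System.out.println(parse(xml));
    }
}
```

The same idea carries over to the BioJava handler: collect in characters(),
build the Sequence in endElement().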


Then, the code to use the SequenceAlignmentSAXParser and the handler could
be:

                // copy and paste from here

                // Path to the .aln output file produced by ClustalW
                File alnFile = new File("/your/aln/file");
                // The alphabet to use (e.g. DNATools.getDNA())
                Alphabet alphabet = ...;

                // This map will be filled with the sequences from the alignment
                Map seqMap = new HashMap();

                SequenceAlignmentSAXParser parser = new SequenceAlignmentSAXParser();

                ContentHandler handler = new SequenceCollectionContentHandler(
                                seqMap, alphabet);
                try {
                        // Read the alignment from alnFile
                        BufferedReader contents = new BufferedReader(
                                        new InputStreamReader(new FileInputStream(alnFile)));

                        parser.setContentHandler(handler);
                        parser.parse(new InputSource(contents));

                } catch (FileNotFoundException fnfe) {
                        System.out.println(fnfe.getMessage());
                        System.out.println("Couldn't open file");
                } catch (IOException ioe) {
                        ioe.printStackTrace();
                } catch (SAXException se) {
                        System.err.println(se.getMessage());
                        se.printStackTrace();
                }

                // Finally I create the alignment object using the Map
                Alignment alignment = new SimpleAlignment(seqMap);


                // end of copy
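
Two small things to watch when building the alignment from the Map. As far
as I remember, SimpleAlignment throws an IllegalArgumentException when the
rows do not all have the same length, so it can help to check the raw
strings first. Also, a HashMap does not keep insertion order; a
LinkedHashMap keeps the rows in the order they were added. Below is a
minimal, JDK-only sketch of such a length check. AlignmentRowCheck and
commonLength are hypothetical names, not part of BioJava, and the map of
plain strings stands in for the parsed sequences.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava: verifies that all aligned rows
// (gaps included) have the same length before an alignment is built.
final class AlignmentRowCheck {

    // Returns the common row length, or -1 for an empty map.
    // Throws IllegalArgumentException on the first row that differs.
    static int commonLength(Map<String, String> rows) {
        int expected = -1;
        for (Map.Entry<String, String> row : rows.entrySet()) {
            int length = row.getValue().length();
            if (expected < 0) {
                expected = length;
            } else if (length != expected) {
                throw new IllegalArgumentException("Row '" + row.getKey()
                        + "' has length " + length + ", expected " + expected);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in insertion order.
        Map<String, String> rows = new LinkedHashMap<String, String>();
        rows.put("seq1", "ACGT-A");
        rows.put("seq2", "AC-TTA");
        System.out.println("Common length: " + commonLength(rows));
    }
}
```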


So you end up with an Alignment instance that contains all the sequences in
the alignment. I know there are better approaches, but this one works for
me... If you have any questions, don't hesitate to ask again!

Cheers,

Bruno

_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
