Hi there,

I have a protein dataset in FASTA format.  Each sequence has a header line with an 
ID followed by a description, as shown below:

>AAP00006; Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl

If I run the code snippet at the end of this email, the output contains only the 
ID, without the description:

AAP00006;
GGLFHLCLIISCSCPTVQASKLCLGWL

If I delete the space between ";" and "Sequence", like this:

>AAP00006;Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl

I will get this:

AAP00006;Sequence
GGLFHLCLIISCSCPTVQASKLCLGWL

So SeqIOTools.readFastaProtein() apparently splits the header line at the first 
space (probably at any whitespace) and stores only the first token in the 
sequence's name property.  My question is: how can I specify my own delimiter so 
that the whole header line is used as the sequence's name?
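
To make the behaviour concrete, here is a minimal stand-alone sketch in plain Java 
(no BioJava calls).  It only mimics what I observe; I have not checked how BioJava 
actually parses the header.  The class HeaderSplitSketch and its methods 
firstToken and wholeLine are hypothetical names made up for this illustration.

```java
public class HeaderSplitSketch {

    // Hypothetical helper: strips the leading '>' and surrounding whitespace
    // from a FASTA header line.
    static String wholeLine(String headerLine) {
        String body = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        return body.trim();
    }

    // Hypothetical helper: returns only the first whitespace-delimited token
    // of the header, which is what I currently get back from getName().
    static String firstToken(String headerLine) {
        String[] parts = wholeLine(headerLine).split("\\s+", 2);
        return parts[0];
    }

    public static void main(String[] args) {
        String withSpace = ">AAP00006; Sequence encoded by leader sequence of core antigen.";
        String withoutSpace = ">AAP00006;Sequence encoded by leader sequence of core antigen.";

        System.out.println(firstToken(withSpace));     // AAP00006;
        System.out.println(firstToken(withoutSpace));  // AAP00006;Sequence
        // What I would like to get as the name instead:
        System.out.println(wholeLine(withSpace));
    }
}
```

The first two lines printed match the names I get from BioJava for the two 
headers above; the third is the name I am hoping for.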

Please help.  Thanks a lot.

Zhen

Code snippet:

import java.io.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class TestSeqIOTools {

    public static void main(String[] args) {

        if (args.length != 1) {
            System.out.println("Usage: java TestSeqIOTools filename.fasta");
            System.exit(1);
        }

        try {
            // Read the FASTA file and print each sequence's name and residues.
            BufferedReader fin = new BufferedReader(new FileReader(args[0]));
            SequenceIterator stream = SeqIOTools.readFastaProtein(fin);
            while (stream.hasNext()) {
                Sequence seq = stream.nextSequence();
                System.out.println(seq.getName());
                System.out.println(seq.seqString());
            }
            fin.close();
        } catch (BioException e) {
            // Parsing failed: report it and exit with a non-zero status.
            System.err.println("BioException: " + e.getMessage());
            e.printStackTrace();
            System.exit(1);
        } catch (IOException ex) {
            // The file could not be opened or read.
            System.err.println("IOException: " + ex.getMessage());
            System.exit(1);
        }
    }
}
