Hi there, I have a protein dataset in FASTA format. Each sequence has an ID followed by a description, as shown below:

    >AAP00006; Sequence encoded by leader sequence of core antigen.
    gglfhlcliiscscptvqasklclgwl

If I use the snippet attached at the end of this email, I get only the ID, without the description:

    AAP00006;
    GGLFHLCLIISCSCPTVQASKLCLGWL

If I delete the space between ";" and "Sequence", like this:

    >AAP00006;Sequence encoded by leader sequence of core antigen.
    gglfhlcliiscscptvqasklclgwl

I get this:

    AAP00006;Sequence
    GGLFHLCLIISCSCPTVQASKLCLGWL

So it appears that SeqIOTools.readFastaProtein() uses a space (probably any kind of whitespace) as the delimiter and puts only the first token of the header line into the name property of a sequence. My question is: how can I specify my own delimiter, so that the whole header line is displayed as the sequence's name?

Please help. Thanks a lot.

Zhen

Code snippet:

    import java.io.*;
    import org.biojava.bio.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.seq.io.*;

    public class TestSeqIOTools {
        public static void main(String[] args) {
            if (args.length != 1) {
                System.out.println("Usage: java TestSeqIOTools filename.fasta");
                System.exit(1);
            }
            try {
                BufferedReader fin = new BufferedReader(new FileReader(args[0]));
                // Read every protein sequence in the FASTA file
                SequenceIterator stream = SeqIOTools.readFastaProtein(fin);
                while (stream.hasNext()) {
                    Sequence seq = stream.nextSequence();
                    // Prints only the first whitespace-delimited token of the header
                    System.out.println(seq.getName());
                    System.out.println(seq.seqString());
                }
                fin.close();
            } catch (BioException e) {
                System.err.println("BioException: " + e.getMessage());
                e.printStackTrace();
                System.exit(1); // non-zero exit status on failure
            } catch (IOException ex) {
                System.err.println("IOException: " + ex.getMessage());
                System.exit(1);
            }
        }
    }
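
For comparison, here is a minimal sketch of the behavior I am after, written in plain Java without BioJava. It is not BioJava's API: the class FastaHeaderReader, its Entry holder, and the read() and parse() methods are hypothetical names made up for illustration. It assumes a simple FASTA file in which each record starts with a ">" line, and it upper-cases the residues only to match the output shown above. The whole header line (everything after ">") is kept as the name, with no splitting on whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, NOT part of BioJava: reads FASTA records and keeps
// the complete header line (everything after '>') as the record name.
class FastaHeaderReader {

    // Simple holder for one record: full header line plus residues.
    static final class Entry {
        final String header;
        final String residues;

        Entry(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    // Reads all records from the reader; the header is never split on whitespace.
    static List<Entry> read(BufferedReader in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String header = null;
        StringBuilder residues = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    // A new header closes the previous record
                    entries.add(new Entry(header, residues.toString()));
                }
                header = line.substring(1).trim();
                residues.setLength(0);
            } else if (header != null) {
                // Sequence lines may be wrapped; concatenate them
                residues.append(line.toUpperCase(Locale.ROOT));
            }
        }
        if (header != null) {
            entries.add(new Entry(header, residues.toString()));
        }
        return entries;
    }

    // Convenience wrapper for in-memory FASTA text.
    static List<Entry> parse(String fastaText) {
        try {
            return read(new BufferedReader(new StringReader(fastaText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String fasta = ">AAP00006; Sequence encoded by leader sequence of core antigen.\n"
                + "gglfhlcliiscscptvqasklclgwl\n";
        for (Entry e : parse(fasta)) {
            System.out.println(e.header);
            System.out.println(e.residues);
        }
    }
}
```

With the example record above, this prints the full header line followed by the upper-cased sequence, which is the output I would like to get from the BioJava reader.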