It should parse up to the first space and use that as the unique id. A lot of
extra information gets added to the header. You should find a getOriginalHeader
method that preserves the full contents of the header. I use this when writing
the sequences back to disk to restore the original header.
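
As a rough illustration of that split, here is a plain-Java sketch. It is not
BioJava code: the ParsedHeader class and its field names are made up for this
example, and the header is assumed to be passed in without the leading ">".
The id is everything before the first whitespace character, and the untouched
header is kept next to it.

```java
// Hypothetical helper for illustration; this is not a BioJava class.
final class ParsedHeader {
    final String id;        // text before the first whitespace character
    final String original;  // the header exactly as read, for writing back to disk

    private ParsedHeader(String id, String original) {
        this.id = id;
        this.original = original;
    }

    /** Splits at the first whitespace character; the whole header is the id if there is none. */
    static ParsedHeader parse(String header) {
        for (int i = 0; i < header.length(); i++) {
            if (Character.isWhitespace(header.charAt(i))) {
                return new ParsedHeader(header.substring(0, i), header);
            }
        }
        return new ParsedHeader(header, header);
    }
}
```

With this sketch, ParsedHeader.parse("read1 length=238 note=x") gives the id
"read1", while the original field still holds the complete line.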

You can also write your own custom header parser, which is how we support the
different known FASTA header formats. If you have extra information in the
header, you can formally associate it with the sequence at parse time. We can
also add support for your header if it is standard output from a device.
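
To make the idea of a pluggable parser concrete, here is a minimal sketch in
plain Java. Everything in it is hypothetical (HeaderKeyParser, FirstTokenParser,
VerbatimParser and FastaIndex are names invented for this example), and
BioJava's real header parser interface has a different shape. It only shows how
swapping the parser changes the key used in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface for illustration; BioJava's real header parser interface is different.
interface HeaderKeyParser {
    String keyFor(String header);
}

/** Uses only the text before the first space as the key. */
final class FirstTokenParser implements HeaderKeyParser {
    public String keyFor(String header) {
        int space = header.indexOf(' ');
        return space < 0 ? header : header.substring(0, space);
    }
}

/** Leaves the header untouched so the whole line becomes the key. */
final class VerbatimParser implements HeaderKeyParser {
    public String keyFor(String header) {
        return header;
    }
}

final class FastaIndex {
    private FastaIndex() {}

    /**
     * Maps key -> sequence in input order. Each record is {header, sequence}.
     * A later record with the same key overwrites an earlier one.
     */
    static Map<String, String> index(String[][] records, HeaderKeyParser parser) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] record : records) {
            out.put(parser.keyFor(record[0]), record[1]);
        }
        return out;
    }
}
```

Passing a VerbatimParser keeps the complete header line as the map key, which
is one way to avoid losing anything after the first space.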

Thanks

Scooter

----- Reply message -----
From: "Hannes Brandstätter-Müller" <[email protected]>
To: "biojava-l" <[email protected]>
Subject: [Biojava-l] FASTA Header Parser
Date: Wed, Jan 11, 2012 9:30 am



Hi there -

I just came across a puzzling "feature" of the GenericFastaHeaderParser.
It seems to throw away everything in the header after (and including) "length="
(see GenericFastaHeaderParser.java lines 71-76)

... Why?

Also, is there a Fasta Header Parser I can use that does not mess
about with the header?

I really would like to have that as key (still working on my
FASTA/QUAL parsing) and not having that (only in the originalHeader,
not in the Hashmap key) really breaks stuff.

Hannes
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
