Hi,
I have a file containing GenBank records, and I want to process them thus:
    RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
    while (seqs.hasNext()) {
        RichSequence seq = seqs.nextRichSequence();
        // processing code
    }
However, some records cannot be parsed by BioJava. This is to be expected,
as I'm processing half a million records and some are bound to be wonky. So I
use a try-catch to skip over troublesome records:
    RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
    while (seqs.hasNext()) {
        try {
            RichSequence seq = seqs.nextRichSequence();
            // processing code
        } catch (BioException e) {
            System.out.println("record could not be parsed!");
        }
    }
However, it seems that the position in the input file is not changed if an
exception is thrown during parsing. If I run the above code on a file
containing a single un-parseable record, it gets stuck in a non-terminating
loop - i.e. each time seqs.nextRichSequence() is called, an exception is
thrown, but seqs.hasNext() still returns true. Is there a correct way to
deal with this? I could split up my input file into multiple records and do
something like:
    ArrayList<String> records = splitGenBankFileIntoRecords();
    for (String singleRecord : records) {
        BufferedReader singleRecordReader =
            new BufferedReader(new StringReader(singleRecord));
        RichSequenceIterator seqs =
            RichSequence.IOTools.readGenbankDNA(singleRecordReader, null);
        try {
            RichSequence seq = seqs.nextRichSequence();
            // processing code
        } catch (BioException e) {
            System.out.println("record could not be parsed!");
        }
    }
but this seems inefficient, as I have to instantiate a new StringReader,
BufferedReader and RichSequenceIterator for every record (half a million
cycles of object creation/destruction!).
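For what it's worth, the splitting step at least would not need to hold all
the records in memory at once. Here is a rough sketch of reading one record at
a time, assuming every record ends with a line starting with "//" (the usual
GenBank terminator). GenBankRecordReader and nextRecord() are names I have made
up for illustration; they are not part of BioJava, and I have not measured
whether this is any faster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper (not BioJava API): returns one GenBank record per call.
final class GenBankRecordReader {
    private final BufferedReader in;

    GenBankRecordReader(BufferedReader in) {
        this.in = in;
    }

    // Returns the next record, including its "//" terminator line,
    // or null once the input is exhausted.
    String nextRecord() throws IOException {
        StringBuilder record = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // Ignore blank lines between records.
            if (record.length() == 0 && line.trim().isEmpty()) {
                continue;
            }
            record.append(line).append('\n');
            // Assumption: a line starting with "//" closes the record.
            if (line.startsWith("//")) {
                return record.toString();
            }
        }
        // Trailing text with no terminator is still handed back,
        // so the caller can decide what to do with it.
        return record.length() > 0 ? record.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        String input = "LOCUS       A\nORIGIN\n//\nLOCUS       B\nORIGIN\n//\n";
        GenBankRecordReader reader =
            new GenBankRecordReader(new BufferedReader(new StringReader(input)));
        int count = 0;
        String record;
        while ((record = reader.nextRecord()) != null) {
            // Each record string would be wrapped in a StringReader and passed
            // to RichSequence.IOTools.readGenbankDNA(...) inside a try/catch.
            count++;
        }
        System.out.println(count + " records read");
    }
}
```

A record that fails to parse then only affects its own short-lived reader, so
the main loop always moves on to the next record.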
Any ideas?
--
------------------------
Martin Jones
School of Biological Sciences,
Ashworth Laboratories, King's Buildings
Edinburgh, EH9 3JT, UK