Hi Roy,

In principle, it should be possible to plug together a chain of parsers that grab just the fields you want and ignore the rest. In practice, the parsers for formats like EMBL and GenBank have been written in a more monolithic manner. If you just want the field info and don't necessarily need a blessed BioJava sequence object out the end, you should be able to knock something up in a few minutes using org.biojava.bio.program.tagvalue, but I don't have a small demo script with me right now to show you how.
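For illustration only, the idea of grabbing named fields and skipping everything else can be sketched in plain Java. This is not the org.biojava.bio.program.tagvalue API. The class name FieldGrabber, the method readFields, and the assumption that every line starts with a two-letter EMBL-style tag and that records end with "//" are all hypothetical choices made for the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BioJava: a tolerant field extractor
// for EMBL-style flat files (two-letter tag at the start of each line).
public class FieldGrabber {

    /**
     * Reads records from the stream and keeps only the requested tags.
     * Lines carrying any other tag are skipped without being interpreted,
     * so a malformed field that was not asked for cannot cause a failure.
     */
    public static List<Map<String, List<String>>> readFields(
            BufferedReader in, Set<String> wanted) throws IOException {
        List<Map<String, List<String>>> records = new ArrayList<>();
        Map<String, List<String>> current = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {            // end-of-record marker
                records.add(current);
                current = new LinkedHashMap<>();
                continue;
            }
            if (line.length() < 2) {
                continue;                           // blank or too short to carry a tag
            }
            String tag = line.substring(0, 2);
            if (!wanted.contains(tag)) {
                continue;                           // ignore every field not requested
            }
            String value = line.substring(2).trim();
            current.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }
        if (!current.isEmpty()) {
            records.add(current);                   // last record had no trailing "//"
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String text = "ID   X1\nDE   line one\nFT   broken ((\nDE   line two\n//\n";
        BufferedReader in = new BufferedReader(new StringReader(text));
        // Only ID and DE are collected; the FT line is never examined.
        System.out.println(readFields(in, Set.of("ID", "DE")));
    }
}
```

Passing the wanted tags as a Set keeps the lookup cheap and sidesteps the ArrayList-versus-String[] question; a caller holding an array can wrap it with Set.of(...).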
In the future, it would be nice to see whether the sequence IO interfaces could be implemented using tagvalue parsers (perhaps via a small adapter object). I should really write a tagvalue tutorial.

Matthew

--- Roy Park <[EMAIL PROTECTED]> wrote:
> Hello everyone.
>
> I deal with a number of pseudo EMBL/GenBank formatted sequences, and it
> would be extremely nice (?) to have methods that only attempt to parse out
> specified fields.
>
> The primary reason for this is that, right now, format.readSequence()
> throws BioException way too frequently for my purpose - i.e. although I only
> need the fields X, Y and Z from each sequence definition, readSequence()
> throws an exception when it finds the field W to be malformed, etc.
>
> I see that modified versions of the StreamReader class, the SequenceFormat
> implementing classes, etc. have to be written, which I can do. I'm wondering
> if anyone could suggest a preferred way of passing the desired fields to be
> read.
>
> readXXXXFields(BufferedReader _br, ArrayList(of String) _fieldsToBeParsed)..
> or
> readXXXXFields(BufferedReader _br, String[] _fieldsToBeParsed)..etc.
>
> (I think readXXXXX(BufferedReader) should be called if the second
> argument is null.)
>
> Any input would be greatly appreciated. (What about the naming of the
> methods - readXXXXPartial()??)
>
> Roy K. Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
