On Mon, Nov 01, 2004 at 03:49:39PM +0100, Can Gencer wrote:
> Hello everyone,
>
> We are trying to parse a quite large multiple BLAST results file (around
> 4GB), and the computer available has 1GB of RAM. However, when the code
> in the cookbook is used (
> "http://www.biojava.org/docs/bj_in_anger/BlastParser.htm"), using the
> BlastLikeSAXParser it will give out an OutOfMemory exception after a
> short while, and when I monitor the system during the parsing, I don't
> see the memory usage going up significantly. It is the
> parse(InputSource) method that throws the exception. Is there a way to
> solve this problem ?

Hi,

When you use the BioJava blast parser as described in the BJIA article,
it builds a fairly comprehensive set of objects which reflect the
contents of the blast output. If those objects turn out to be bigger
than your available memory, then you'll either have to split up the
output or process it in a "streaming" fashion.

The BioJava blast parsers actually work by converting the blast output
to XML, which is then presented to a SAX ContentHandler. The normal
strategy is to use a ContentHandler which builds objects, and this is
what the BioJava BlastLikeSearchBuilder class is doing. However, there's
nothing to stop you writing a custom ContentHandler which extracts the
information you want directly from the XML representation. This strategy
should let you process arbitrarily large amounts of blast output without
running into memory problems, but does involve a certain amount of work.

If you want to see what the XML representation looks like, try the
demos/nativeapps/BlastLike2XML.java program, included in the BioJava
source distribution.

However, since you say "I don't see the memory usage going up
significantly", I'm wondering if your program is *really* exhausting
system memory, or if you're just hitting the default limit on the Java
heap size. On many platforms, the default heap size can be pretty low.
You can control it using the -Xmx (maximum heap size) and -Xms (initial
heap size) options (try typing java -X for proper descriptions). On a
1GB machine, I'd suggest trying something like:

    java -Xmx850M YourProgram

This allows Java to use the bulk of system memory, while still leaving
a bit for the operating system, etc.

Hope this helps,

Thomas.
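
For illustration, the custom ContentHandler approach described above might look roughly like the sketch below. It only counts elements by name as the SAX events stream past, so nothing is kept in memory beyond the counters. The class name StreamingElementCounter, its helper methods, and the element names in the sample XML are all made up for this example; the real element names should be checked against the output of BlastLike2XML. The demo uses the JDK's plain XML parser so that it runs without BioJava. With BioJava, the same handler would presumably be handed to a BlastLikeSAXParser through setContentHandler, in place of the object-building adapter used in the cookbook code.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming handler: tallies element names instead of building objects.
class StreamingElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Called once per opening tag; only a counter is updated.
        counts.merge(qName, 1, Integer::sum);
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }

    // Runs any SAX XMLReader over the input with this handler attached.
    public static Map<String, Integer> count(XMLReader reader, InputSource in) {
        StreamingElementCounter handler = new StreamingElementCounter();
        reader.setContentHandler(handler);
        try {
            reader.parse(in);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.getCounts();
    }

    // Stand-in reader for the demo: the JDK's ordinary XML parser.
    public static XMLReader plainXmlReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Sample XML with made-up element names, not real BioJava output.
        String xml = "<Results><Hit id='a'/><Hit id='b'/></Results>";
        Map<String, Integer> counts =
            count(plainXmlReader(), new InputSource(new StringReader(xml)));
        System.out.println(counts);
    }
}
```

A real handler would pull out whichever attributes or text it needs in startElement and characters, and write results out as it goes rather than accumulating them.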