Hello Lu Qiang - We get this question a lot. I have posted below a recent response (by Thomas Down) to the same question:
Hi,

When you use the BioJava blast parser as described in the BJIA article, it builds a fairly comprehensive set of objects which reflect the contents of the blast output. If those objects turn out to be bigger than your available memory, then you'll either have to split up the output or process it in a "streaming" fashion.

The BioJava blast parsers actually work by converting the blast output to XML, which is then presented to a SAX ContentHandler. The normal strategy is to use a ContentHandler which builds objects, and this is what the BioJava BlastLikeSearchBuilder class is doing. However, there's nothing to stop you writing a custom ContentHandler which extracts the information you want directly from the XML representation. This strategy should let you process unlimited amounts of blast output without running into memory problems, but does involve a certain amount of work. If you want to see what the XML representation looks like, try the demos/nativeapps/BlastLike2XML.java program, included in the BioJava source distribution.

However, since you say "I don't see the memory usage going up significantly", I'm wondering if your program is *really* exhausting system memory, or if you're just hitting the default limit on the Java heap size. On many platforms, the default heap size can be pretty low. You can control it using the -Xmx and -Xms options (try typing java -X for proper descriptions). On a 1 GB machine, I'd suggest trying something like:

    java -Xmx850M YourProgram

This allows Java to use the bulk of system memory, while still leaving a bit for the operating system, etc.

Hope this helps,

Thomas.
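To make the streaming idea in Thomas's reply concrete, here is a minimal sketch of a custom SAX handler that counts hits and HSPs as the events go by, without building any objects per hit. It uses only the JDK's own SAX parser on a small hand-written XML sample. The class name StreamingHitCounter, the element names "Hit" and "HSP", and the namespace URI are assumptions for illustration; run the BlastLike2XML demo on your own output to see the element names your BioJava version really emits.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: counts elements as they stream past, keeps nothing else. */
public class StreamingHitCounter extends DefaultHandler {
    private int hitCount = 0;
    private int hspCount = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Prefer the namespace-local name; fall back to qName with any prefix stripped.
        String name = (localName != null && !localName.isEmpty())
                ? localName
                : qName.substring(qName.indexOf(':') + 1);
        if (name.equals("Hit")) {          // assumed element name for one hit
            hitCount++;
        } else if (name.equals("HSP")) {   // assumed element name for one HSP
            hspCount++;
        }
    }

    public int getHitCount() { return hitCount; }
    public int getHspCount() { return hspCount; }

    /** Parses an XML stream with the JDK SAX parser and returns the filled-in handler. */
    public static StreamingHitCounter scan(InputStream xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        StreamingHitCounter handler = new StreamingHitCounter();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(xml));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample, NOT real BioJava output.
        String sample =
              "<biojava:Detail xmlns:biojava=\"http://www.biojava.org\">"
            + "<biojava:Hit><biojava:HSP/><biojava:HSP/></biojava:Hit>"
            + "<biojava:Hit><biojava:HSP/></biojava:Hit>"
            + "</biojava:Detail>";
        StreamingHitCounter c =
            scan(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println("hits=" + c.getHitCount() + " hsps=" + c.getHspCount());
        // Also report the heap ceiling the JVM is actually running with (see -Xmx).
        System.out.println("max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

With BioJava, the same handler would be handed to the blast SAX parser through setContentHandler in place of the object-building one; the exact parser class to use depends on your BioJava version, so check its documentation. Memory use then stays roughly constant however many hits the file contains, because nothing is retained beyond the two counters.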
Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

"Lu Qiang" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
11/05/2004 02:42 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Parsing blast result with a lot of hit

Hi, Guys,

If we try to parse a blast result with a lot of hits, the machine crashes, for example when 5000 sequences are blasted against themselves. This must be caused by an ArrayList storing all the results. How can we solve this problem?

regards,
Lu

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l