Hi all,
I am using Lucene to query Medline abstracts, and as a result I get around 3
million hits. Each hit is processed, and information from a certain
field is used.
After a certain number of hits, somewhere around 1 million (not always the same
number), I get an OutOfMemoryError that looks like this:
Exception in thread "main" java.lang.OutOfMemoryError
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:221)
at java.util.zip.Inflater.inflate(Inflater.java:238)
at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:108)
at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:609)
at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:385)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:231)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:1013)
at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:520)
at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:149)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:152)
at org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:156)
at org.apache.lucene.search.Hits.doc(Hits.java:180)
at de.fhg.scai.bio.tamara.corpusBuilding.LuceneCmdLineInterface.queryMedline(LuceneCmdLineInterface.java:178)
at de.fhg.scai.bio.tamara.corpusBuilding.LuceneCmdLineInterface.main(LuceneCmdLineInterface.java:152)
The line that causes the problem is:
String docText = hits.doc(j).getField("DOCUMENT").stringValue() ;
I am using Java 1.6 and I tried solving this issue with different garbage
collectors (-XX:+UseParallelGC and -XX:+UseParallelOldGC), but it didn't help.
Does anyone have any idea how to solve this problem?
There is also an official bug report:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6293787
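To make the failing step concrete: the top frames of the trace are in
java.util.zip.Inflater, which does its work in native zlib code. Below is a
minimal standalone sketch of that kind of decompression step, written against
the plain JDK only. It is not Lucene's actual CompressionTools code; the class
InflateSketch and its methods are made-up names for illustration. The one
deliberate choice is calling end() in a finally block, so the native buffers
are released immediately instead of waiting for finalization.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: illustrates the inflate step
// that appears at the top of the stack trace.
public class InflateSketch {

    // Compress bytes with zlib so there is something to decompress.
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory right away
        }
    }

    // Decompress zlib data, always releasing the Inflater's native memory.
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end(); // release native zlib memory right away
        }
    }

    public static void main(String[] args) {
        byte[] original = "abstract text".getBytes(Charset.forName("UTF-8"));
        byte[] roundTrip = decompress(compress(original));
        System.out.println(Arrays.equals(original, roundTrip));
    }
}
```

Whether explicit end() calls would change anything in my case I cannot say,
since the Inflater here is created inside Lucene and not in my own code.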
Help is much appreciated. :)
Best regards,
Tamara Bobic