Re: Loading 5gb index to RAMDirectory

Yonik Seeley Tue, 23 May 2006 07:29:23 -0700

Hi Michael,

The java-commits mailing list is not for posting to.
Bug reports or fixes normally get put in a JIRA.


I do think this is a good limitation to fix.
Going from int to long only costs a single cycle, and that's only on
buffer refills (i.e. negligible).

There are other places in RAMInputStream & RAMOutputStream that need
fixing too.  I'll handle that.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote:

Hi,

I have a 5 GB index stored on disk. I tried creating a
RAMDirectory from it, and it crashes every time at around the 2 GB
mark. I create it using:

RAMDirectory ramDir = new RAMDirectory("index");

where "index" is the path. The error messages are as follows:

"bash-2.03$ Exception in thread "main" java.lang.ExceptionInInitializerError
       at TaxonomyFinder.RelatedCatsFinder.<init>(RelatedCatsFinder.java:46)
       at wikipedia.WikipediaAnalyser$ExtractAbstractHandler.endElement(WikipediaAnalyser.java:295)
       at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
       at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
       at wikipedia.WikipediaAnalyser.parseAbstracts(WikipediaAnalyser.java:184)
       at wikipedia.WikipediaAnalyser.getRelatedCategories(WikipediaAnalyser.java:127)
       at TaxonomyFinder.TaxonomyTreeMaker.main(TaxonomyTreeMaker.java:492)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -2097152
       at java.util.Vector.elementAt(Unknown Source)
       at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java:82)
       at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:84)
       at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:52)
       at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:68)
       at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:95)
       at word_coocurrence.WordCooccurrenceFinder.<clinit>(WordCooccurrenceFinder.java:50)
       ... 13 more"

I fixed it by changing RAMOutputStream.pointer to long, and
lines 72 and 73 of RAMOutputStream.java to:

int bufferNumber = (int) (pointer/BUFFER_SIZE);
int bufferOffset = (int) (pointer%BUFFER_SIZE);

Now, it all works fine. Maybe this is worth fixing.
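
For reference, the negative index in the trace is what an overflowing
int pointer would produce once 2 GB have been written. A minimal,
standalone sketch, assuming BUFFER_SIZE is 1024 (an assumption, chosen
because it matches the -2097152 in the trace). The class and method
names are hypothetical and not part of Lucene:

```java
public class PointerOverflowDemo {
    // Assumed buffer size; 1024 is consistent with the index in the trace.
    static final int BUFFER_SIZE = 1024;

    // Buffer index computed from a 32-bit pointer. Casting the byte count
    // to int wraps modulo 2^32, just as an int field would after overflow.
    static int bufferNumberWithIntPointer(long bytesWritten) {
        int pointer = (int) bytesWritten;
        return pointer / BUFFER_SIZE;
    }

    // Buffer index computed from a 64-bit pointer: divide first, then
    // narrow the (much smaller) quotient to int.
    static int bufferNumberWithLongPointer(long bytesWritten) {
        long pointer = bytesWritten;
        return (int) (pointer / BUFFER_SIZE);
    }

    public static void main(String[] args) {
        long twoGb = 1L << 31; // 2,147,483,648 bytes
        System.out.println(bufferNumberWithIntPointer(twoGb));  // -2097152
        System.out.println(bufferNumberWithLongPointer(twoGb)); // 2097152
    }
}
```

With a long pointer, the int cast on the buffer number would not itself
overflow until roughly 2^31 buffers, i.e. about 2 TB under the same
assumed buffer size.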

Michael


