Martin Braun wrote:
Hi Breck,

i have tried your tutorial and built (hopefully) a successful
SpellCheck.model File with
49M.
My Lucene Index directory is 2,4G. When I try to read the Model with the
readmodel function,
i get an "Exception in thread "main" java.lang.OutOfMemoryError: Java
heap space", though I started java with -Xms1024m -Xmx1024m.

How many RAM will I need for the Model (I only have 2 GB of physical
RAM, and lucene's also using some memory).

You need to increase the memory for java. I think 32-bit jave is limited to a 1.3 gig heap but could be wrong. No heuristics at the tip of my fingers.

To make the spell checker smaller you can prune the tokens using the
pruneLM method in the TrainSpellChecker. Pruning the 1 counts should make a big difference and not hurt spelling too much (depends on how things are paramterized). Probably up to 5 counts won't matter.

Also look at my tuning tutorial that is in very rough shape but will
get you going on tuning at:

cvs -d:pserver:[EMAIL PROTECTED]:/usr/local/sandbox co querySpellCheckTuner

I will try to get another pass at it over the weekend.

b reck



Is there a "rule of thumb" to calculate the needed amount of memory of
the model?

thanks in advance,

martin



Tuning params dominate the performance space. A small beam (16 active
hypotheses) will be quite snappy (I have 200 queries/sec with a 32 beam.
over a 80 gig text collection that with some pruning was 5 gig in memory
running an 8 gram model)




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to