Re: experiences with lingpipe

Breck Baldwin Fri, 03 Nov 2006 07:00:13 -0800


Martin Braun wrote:

Hi Breck,

I have tried your tutorial and built a (hopefully) successful
SpellCheck.model file of 49M.
My Lucene index directory is 2.4G. When I try to read the model with the
readmodel function,
I get an "Exception in thread "main" java.lang.OutOfMemoryError: Java
heap space", though I started java with -Xms1024m -Xmx1024m.

How much RAM will I need for the model? (I only have 2 GB of physical
RAM, and Lucene is also using some memory.)

You need to increase the memory for Java. I think 32-bit Java is limited to a 1.3 gig heap but could be wrong. No heuristics at my fingertips.
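Before raising -Xmx it can help to confirm how much heap the JVM actually received. Here is a minimal sketch using only the standard Runtime API; the class name HeapCheck and its method names are hypothetical and not part of LingPipe or Lucene.

```java
// Minimal sketch: report the heap limits of the running JVM.
// HeapCheck is a hypothetical helper, not a LingPipe or Lucene class.
final class HeapCheck {

    private static final long MB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (roughly -Xmx).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently in use: reserved from the OS minus what is still free.
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMegabytes());
        System.out.println("used heap (MB): " + usedHeapMegabytes());
    }
}
```

Running it with the same flags as the failing program (for example -Xms1024m -Xmx1024m) shows whether the requested limit was applied. Printing the used figure just before the model is read gives a rough idea of how much headroom is left for it.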


To make the spell checker smaller you can prune the tokens using the
pruneLM method in the TrainSpellChecker. Pruning the 1 counts should
make a big difference and not hurt spelling too much (depends on how
things are parameterized). Probably up to 5 counts won't matter.
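To illustrate what count pruning does in general, here is a small self-contained sketch over a plain map of counts. It does not call LingPipe and is not how pruneLM is implemented; the class CountPruning, its methods, and the sample counts are hypothetical. It assumes "pruning the 1 counts" means keeping only entries seen at least twice.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: drop rare entries from a table of observed counts.
// CountPruning is a hypothetical class, not part of LingPipe.
final class CountPruning {

    // Returns a new map holding only the entries whose count is >= minCount.
    // Assumption: minCount = 2 corresponds to "pruning the 1 counts".
    static Map<String, Integer> prune(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Made-up sample counts for the demonstration.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 120);
        counts.put("lucene", 7);
        counts.put("teh", 1);    // likely typo, seen once
        counts.put("lucnee", 1); // likely typo, seen once
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleCounts();
        System.out.println("before: " + counts.size() + " entries");
        System.out.println("after:  " + prune(counts, 2).size() + " entries");
    }
}
```

Natural-language data tends to have a long tail of strings seen only once, which is why dropping the singletons can shrink a model a lot while removing little that helps correction.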


Also look at my tuning tutorial that is in very rough shape but will
get you going on tuning at:

cvs -d:pserver:[EMAIL PROTECTED]:/usr/local/sandbox co querySpellCheckTuner


I will try to get another pass at it over the weekend.

breck


Is there a "rule of thumb" to calculate the needed amount of memory of
the model?

thanks in advance,

martin

Tuning params dominate the performance space. A small beam (16 active
hypotheses) will be quite snappy. I get 200 queries/sec with a beam of 32
over an 80 gig text collection that, with some pruning, was 5 gig in memory
running an 8-gram model.




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

