On 2014-06-12 09:03, Dawid Weiss wrote:

Hi Dawid,
thanks for your fast response.

> What's your data and why do you need to cram everything in RAM?
> Perhaps there's some other options I could recommend?

I'm playing with the Google ngram index. It could be used to improve LanguageTool's suggestions by preferring the suggestion that is more common. There's berkeleylm (https://code.google.com/p/berkeleylm/) for very fast ngram lookups, but it's also RAM-based. As the ngram index is so huge, that still means large amounts of RAM (10 GB when using the Web1T corpus, according to the berkeleylm paper). A frequency lookup that needs less RAM but is at least not slow would be nice. The next thing I'd try is a Lucene index.

Regards
Daniel
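P.S.: To make the Lucene idea concrete, here's a rough, untested sketch. The layout (one Lucene document per ngram, with "ngram" and "count" fields) and the class name are just one possible schema I made up, not anything that exists yet:

  import java.io.File;
  import java.io.IOException;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.FSDirectory;

  // One Lucene document per ngram: the ngram itself in an indexed
  // "ngram" field, its occurrence count in a stored "count" field.
  public class NgramLookup {

    private final IndexSearcher searcher;

    public NgramLookup(File indexDir) throws IOException {
      IndexReader reader = DirectoryReader.open(FSDirectory.open(indexDir));
      searcher = new IndexSearcher(reader);
    }

    // Returns the stored occurrence count of the ngram, or 0 if it
    // doesn't occur in the corpus.
    public long getCount(String ngram) throws IOException {
      TopDocs docs = searcher.search(new TermQuery(new Term("ngram", ngram)), 1);
      if (docs.totalHits == 0) {
        return 0;
      }
      Document doc = searcher.doc(docs.scoreDocs[0].doc);
      return Long.parseLong(doc.get("count"));
    }
  }

A rule could then compare, say, getCount("their house") vs. getCount("there house") and prefer the more frequent variant. Lookups would hit the disk (or rather the OS page cache), so it shouldn't need the whole model in RAM; whether it's fast enough remains to be measured.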