You don't need to index the data. Just run the analyzer over the text and keep your own per-token counters. The job will be disk-bound, so it will run about as fast as you can read the files from disk.
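
A rough sketch of the counting idea in plain Java, with no index involved. To keep it self-contained, a trivial lowercase-and-split tokenizer stands in for the real analyzer; that tokenizer, the class name TokenCounter, and the helper names are all made up for illustration. With Lucene you would presumably pull the tokens from Analyzer.tokenStream(...) and read each one through CharTermAttribute instead of calling split().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TokenCounter {

    // Stand-in for the analyzer (an assumption, not Lucene code):
    // lowercase each line and split on anything that is not a letter or digit.
    static void countTokens(Reader input, Map<String, Integer> counts) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum); // bump this token's counter
                }
            }
        }
    }

    // Return up to n entries, highest count first; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topTokens(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    // Usage: java TokenCounter file1.txt file2.txt ...
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        for (String path : args) {
            try (Reader in = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
                countTokens(in, counts);
            }
        }
        for (Map.Entry<String, Integer> e : topTokens(counts, 20)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Each file is read once, front to back, which is why the disk should be the limit. The in-memory map only works if the set of distinct tokens fits in the heap. Once you have the top tokens you can feed them to a stop filter or strip them from the documents yourself.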

On Sun, Aug 19, 2012 at 5:17 PM, Shaya Potter <spot...@gmail.com> wrote:
> On 08/19/2012 08:07 PM, Shaya Potter wrote:
>>
>> On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
>>>>
>>>> Is there an easy way to figure out
>>>> the most common tokens and then remove those tokens from the
>>>> documents.
>>>
>>> Probably this :
>>>
>>> http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
>>>
>>
>> unsure how to use this
>>
>> as far as I can tell org.apache.lucene.misc.TermStats doesn't exist in
>> lucene 3.6.1 (there seems to be some class like that in 4.x, but that
>> doesn't help me).
>
> I'm wrong, its there, but eclipse isn't seeing it (haven't tried javac by
> itself), even though it sees HighFreqTerms just fine.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Lance Norskog
goks...@gmail.com