Re: easy way to figure out most common tokens?

Lance Norskog Sun, 19 Aug 2012 19:04:55 -0700

You don't need to index the data. Just run the analyzer over the text and
maintain your own counters. The job is disk-bound, so it will run at about
your disk read speed.
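
A minimal sketch of that approach, under stated assumptions. To keep it
runnable without Lucene on the classpath, the analyze() method below is a
hypothetical stand-in for your analyzer (it only lowercases and splits on
non-alphanumerics). In a real Lucene 3.6 setup you would probably replace it
with a loop over the TokenStream returned by Analyzer.tokenStream(), reading
each term through CharTermAttribute. The class and method names here
(TokenCounter, countTokens, topN) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TokenCounter {

    // ASSUMPTION: stand-in for a Lucene Analyzer. Lowercases the line and
    // splits on runs of characters that are neither letters nor digits.
    static List<String> analyze(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Streams the input once and keeps one counter per token; no index is built.
    static Map<String, Integer> countTokens(Reader in) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : analyze(line)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n entries, highest count first; ties are broken
    // alphabetically so the result is deterministic.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        // Sample text only; in practice pass a FileReader over each document.
        Reader sample = new StringReader("the cat and the dog\nThe end");
        for (Map.Entry<String, Integer> e : topN(countTokens(sample), 3)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The tokens that come out on top can then be fed into a stop-word list for
the analyzer you index with, which removes them from the documents at index
time. All counters are held in memory, so this assumes the vocabulary fits
in the heap.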


On Sun, Aug 19, 2012 at 5:17 PM, Shaya Potter <spot...@gmail.com> wrote:
> On 08/19/2012 08:07 PM, Shaya Potter wrote:
>>
>> On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
>>>>
>>>> Is there an easy way to figure out
>>>> the most common tokens and then remove those tokens from the
>>>> documents.
>>>
>>>
>>> Probably this:
>>>
>>> http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
>>>
>>
>> unsure how to use this
>>
>> As far as I can tell, org.apache.lucene.misc.TermStats doesn't exist in
>> Lucene 3.6.1 (there seems to be some class like that in 4.x, but that
>> doesn't help me).
>
>
> I'm wrong, it's there, but Eclipse isn't seeing it (haven't tried javac by
> itself), even though it sees HighFreqTerms just fine.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>



-- 
Lance Norskog
goks...@gmail.com

