Re: [Ferret-talk] indexing large tokens

David Balmain Fri, 16 Jun 2006 17:07:20 -0700

On 6/17/06, Justin Kan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm using the StandardAnalyzer to build an index, and passing in Documents
> that have Fields that contain large tokens (22+ characters) interspersed with
> normal English words. This seems to cause the IndexWriter to slow to a
> crawl. Is this a known issue, or am I doing something wrong?


Hi Justin,

I haven't come across this problem. Are you on Windows by any chance?
Ferret is currently slow on Windows in general because it runs as pure
Ruby code there. One thing large tokens can do is increase the number
of distinct terms in the index, which can slow indexing down a little,
but I'd be surprised if it made a huge difference unless there were a
particularly large number of them.

> If this is a known issue I don't have any problem just not indexing tokens
> longer than a certain length, but what's the best way to eliminate them?
> Using a TokenFilter on my own Analyzer? Sorry for the newbish questions, I'm
> new to ferret having never used lucene. Thanks in advance,

Yes, using a token filter will do the job. Have a look in the analysis
module of Ferret for some examples. I'd be interested to hear if it
makes any difference.
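
As a rough starting point, a length-limiting filter could look something
like the sketch below. It is deliberately self-contained: Token and
ArrayTokenStream are stand-ins I made up so it runs without Ferret, and
MaxLengthFilter and its default limit of 20 are hypothetical names and
values, not part of Ferret. The only assumption about the wrapped stream
is that it has a `next` method returning a token with a `text` method, or
nil when it runs out. Check the analysis module for what a real filter
has to implement before plugging anything like this into an Analyzer.

```ruby
# Stand-in token for this sketch (assumption: only #text is needed here).
Token = Struct.new(:text)

# Stand-in token stream: hands out tokens one at a time, then nil.
class ArrayTokenStream
  def initialize(words)
    @tokens = words.map { |word| Token.new(word) }
  end

  def next
    @tokens.shift
  end
end

# Hypothetical filter: skips any token longer than max_length characters.
class MaxLengthFilter
  def initialize(input, max_length = 20)
    @input = input
    @max_length = max_length
  end

  # Return the next token that is short enough, or nil at end of stream.
  def next
    while (token = @input.next)
      return token if token.text.length <= @max_length
    end
    nil
  end
end

words = %w[the quick a3f9c0d2e4b17788aa00ff brown fox]
stream = MaxLengthFilter.new(ArrayTokenStream.new(words), 20)

kept = []
while (token = stream.next)
  kept << token.text
end
puts kept.join(" ")
```

The filter simply drops the long tokens and passes everything else
through untouched, so the 22-character token in the example never
reaches the index.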

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk