> Is 256 also some inner maximum in some Lucene internal that causes
> this? What is happening is that the long word is split into smaller
> words of up to 256 characters, and then the min and max limits are
> applied. Is that correct? I have removed LengthFilter and still see
> the splitting at 256 happen. I would like not to have this, and
> instead to remove altogether any word longer than max, without
> decomposing it into smaller ones. Is there a way to achieve this?
>
> Using Lucene 3.0.1
Assuming your Tokenizer extends CharTokenizer: CharTokenizer.java has
this field:

    private static final int MAX_WORD_LEN = 255;

You can modify CharTokenizer.java according to your needs.
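
For reference, below is a minimal sketch of a standalone tokenizer that
drops over-long words entirely instead of splitting them. Because
MAX_WORD_LEN is private static final, a subclass of CharTokenizer cannot
change it, so the sketch extends Tokenizer directly. It is written
against the Lucene 3.0.1 API but untested; the class name
DroppingWhitespaceTokenizer and the maxTokenLength parameter are made up
for illustration.

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

/**
 * Sketch: splits on whitespace, but silently discards any word longer
 * than maxTokenLength instead of breaking it into smaller tokens.
 */
public final class DroppingWhitespaceTokenizer extends Tokenizer {

  private final int maxTokenLength;
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int offset = 0; // absolute character position in the input

  public DroppingWhitespaceTokenizer(Reader input, int maxTokenLength) {
    super(input);
    this.maxTokenLength = maxTokenLength;
  }

  // Token characters: anything that is not whitespace.
  private boolean isTokenChar(int c) {
    return !Character.isWhitespace(c);
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    StringBuilder word = new StringBuilder();
    int start = -1;
    int c;
    // Unbuffered reads keep the sketch short; CharTokenizer uses an
    // ioBuffer here, and production code should too.
    while ((c = input.read()) != -1) {
      offset++;
      if (isTokenChar(c)) {
        if (word.length() == 0) {
          start = offset - 1; // remember where this word began
        }
        word.append((char) c);
      } else if (word.length() > 0) {
        if (word.length() <= maxTokenLength) {
          emit(word, start);
          return true;
        }
        word.setLength(0); // too long: drop the whole word, keep scanning
      }
    }
    // Handle a word that runs up to end-of-input.
    if (word.length() > 0 && word.length() <= maxTokenLength) {
      emit(word, start);
      return true;
    }
    return false;
  }

  private void emit(StringBuilder word, int start) {
    termAtt.setTermBuffer(word.toString());
    offsetAtt.setOffset(correctOffset(start), correctOffset(start + word.length()));
  }

  @Override
  public void reset(Reader input) throws IOException {
    super.reset(input);
    offset = 0;
  }
}

You would use it in your Analyzer wherever you currently create your
tokenizer; words longer than maxTokenLength are then never emitted, so
no 255-character fragments reach the rest of the filter chain.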