Sure, that should be a configurable option.

Oh, and I neglected to mention a workaround: use the pattern tokenizer, which doesn't have a limit (yet). It might be slower, though.
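
To illustrate the idea behind that workaround, here is a minimal sketch in plain Java. It is not Lucene's pattern tokenizer; the class name RegexWhitespaceSplitter and its tokenize method are hypothetical. It only shows that splitting on a whitespace regex puts no cap on token length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Lucene code: split on runs of whitespace
// with a regex, so a token can be arbitrarily long.
final class RegexWhitespaceSplitter {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : WHITESPACE.split(text)) {
            // split() can yield an empty leading piece when the input
            // starts with whitespace; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

A 300-character word passed through tokenize comes back as a single 300-character token. The regex engine does more work per character than a simple character loop, which is consistent with the "might be slower" caveat.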

-- Jack Krupansky

-----Original Message-----
From: Sheng
Sent: Friday, August 15, 2014 8:13 AM
To: java-user@lucene.apache.org
Subject: Re: WhiteSpaceTokenizer

Thanks, Jack. I haven't added myself to the contributor list yet; I will do
that and then log in and comment on that ticket. One quick comment:
wouldn't it be more reasonable to throw an exception if a token's length is more
than 255, if relaxing that limit is still debatable? That way the user would
know immediately that something is wrong.
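
The fail-fast idea above could look roughly like the following. This is a minimal sketch in plain Java, not a patch against Lucene; the class TokenLengthGuard, the method requireShortTokens, and the choice of IllegalArgumentException are all hypothetical.

```java
// Hypothetical illustration, not Lucene code: reject input up front
// instead of letting an over-long token pass unnoticed.
final class TokenLengthGuard {
    // Assumed to match the 255 limit discussed in this thread.
    static final int MAX_TOKEN_LEN = 255;

    static void requireShortTokens(String text) {
        int runLength = 0; // length of the current non-whitespace run
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                runLength = 0;
            } else if (++runLength > MAX_TOKEN_LEN) {
                int start = i - runLength + 1;
                throw new IllegalArgumentException(
                    "Token starting at offset " + start
                        + " is longer than " + MAX_TOKEN_LEN + " characters");
            }
        }
    }
}
```

A token of exactly 255 characters passes; one more character triggers the exception, and the message reports where the offending token starts.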

On Friday, August 15, 2014, Jack Krupansky <j...@basetechnology.com> wrote:

Yeah, it should be documented better, and configurable.

Some discussion of related issues here:
https://issues.apache.org/jira/browse/LUCENE-1118
https://issues.apache.org/jira/browse/SOLR-4148

I actually filed a Jira for this already. No action so far, but PLEASE
feel free to comment on it:
https://issues.apache.org/jira/browse/LUCENE-5785

-- Jack Krupansky

-----Original Message-----
From: Sheng
Sent: Thursday, August 14, 2014 11:38 PM
To: java-user@lucene.apache.org
Subject: WhiteSpaceTokenizer

A token has to be shorter than 255 characters; otherwise this tokenizer
behaves unpredictably. I see 255 is set as a private final in the source
code, but there is no documentation that explicitly addresses that. Can we
either make that number configurable (if that is not an option, I'd like to
know why), or put a note in its Javadoc? I had a hard time figuring that
out...
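
To make the symptom concrete, here is a minimal sketch in plain Java of a whitespace splitter with a hard cap on token length. It assumes the over-long run is silently cut into chunks at the limit; that is an assumption about the behavior, not something confirmed in this thread. The class name CappedWhitespaceSplitter and the constant name are hypothetical, and this is not the Lucene source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: a whitespace splitter that
// emits the current token as soon as it reaches a fixed maximum length.
final class CappedWhitespaceSplitter {
    // Assumed to match the private final 255 mentioned above.
    static final int MAX_TOKEN_LEN = 255;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                flush(tokens, current);
            } else {
                current.append(c);
                if (current.length() >= MAX_TOKEN_LEN) {
                    // Cap reached: the token is cut here without any warning.
                    flush(tokens, current);
                }
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Emit the buffered token, if any, and reset the buffer.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        StringBuilder longWord = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longWord.append('x');
        }
        for (String token : tokenize("short " + longWord)) {
            System.out.println(token.length()); // prints 5, 255, 45
        }
    }
}
```

Under that assumption, a single 300-character word shows up as two tokens of 255 and 45 characters, which is easy to mistake for a bug elsewhere in the analysis chain.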

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



