What do people think about making this configurable? At the moment it's a constant that can't be altered. I see at least one situation in the field where very long payloads are being added (look, it's special) with a custom tokenizer that subclasses CharTokenizer, and CharTokenizer silently truncates the incoming "word" at that hard-coded limit.
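For anyone without the code in front of them, here is a self-contained sketch of the behavior in question — not Lucene's actual implementation, just an illustration of how a hard-coded maximum (MAX_WORD_LEN, 255 in CharTokenizer as I read it) cuts a long run of token characters off with no way to raise the limit:

```java
// Illustrative sketch only -- not Lucene's CharTokenizer source.
// The point: the cap is a constant, so a subclass can't widen it.
public class TruncationSketch {
    static final int MAX_WORD_LEN = 255; // the hard-coded constant in question

    // Roughly what CharTokenizer does for one token: accumulate token
    // characters, but stop growing the term once it hits MAX_WORD_LEN.
    static String tokenize(String input) {
        StringBuilder term = new StringBuilder();
        for (int i = 0; i < input.length() && term.length() < MAX_WORD_LEN; i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                break; // end of this "word"
            }
            term.append(c);
        }
        return term.toString();
    }

    public static void main(String[] args) {
        String longPayload = "x".repeat(300); // one very long "word"
        System.out.println(tokenize(longPayload).length()); // prints 255
    }
}
```

So a 300-character payload comes out as a 255-character term, and there's no constructor argument or protected hook to change that.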
Using KeywordTokenizer can get around this, since it has a constructor that takes a buffer length — but KeywordTokenizer obviously doesn't let you, well, parse tokens. Should I raise a JIRA, or are there good reasons this is hard-coded?

Erick