Itamar Syn-Hershko created LUCENE-6103:
------------------------------------------
Summary: StandardTokenizer doesn't tokenize word:word
Key: LUCENE-6103
URL: https://issues.apache.org/jira/browse/LUCENE-6103
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko
StandardTokenizer (and, as a result, most default analyzers) does not split
word:word; it keeps it as a single token. This is easy to see with
Elasticsearch's analyze API:
localhost:9200/_analyze?tokenizer=standard&text=word%20word:word
If this is the intended behavior, then why? I can't really see the logic behind
it.
If not, I'll be happy to join in the effort of fixing this.
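For anyone who wants to script the reproduction, here is a minimal Python
sketch. It assumes a local Elasticsearch node listening over plain HTTP on
localhost:9200 (the http:// scheme is my addition) and assumes the response
is JSON with a "tokens" list whose entries carry a "token" field. The helper
names build_analyze_url and fetch_tokens are hypothetical, for illustration
only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Elasticsearch node reachable over plain HTTP.
BASE_URL = "http://localhost:9200/_analyze"


def build_analyze_url(text="word word:word", tokenizer="standard"):
    """Build the analyze request URL used in the report above."""
    # quote_via=quote encodes the space as %20 (not '+'); safe=":" keeps
    # the colon literal, so the URL matches the one quoted in the report.
    query = urllib.parse.urlencode(
        {"tokenizer": tokenizer, "text": text},
        quote_via=urllib.parse.quote,
        safe=":",
    )
    return BASE_URL + "?" + query


def fetch_tokens(url):
    """Call the analyze API and return the token strings.

    Assumption: the response body is JSON shaped like
    {"tokens": [{"token": "..."}, ...]}.
    """
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return [entry["token"] for entry in payload.get("tokens", [])]


if __name__ == "__main__":
    url = build_analyze_url()
    print(url)
    # Requires a running node; per the report, word:word should come
    # back as a single token.
    print(fetch_tokens(url))
```

Only the URL construction is deterministic here; the fetch step depends on a
running node and on the Elasticsearch version in use.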
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)