[
https://issues.apache.org/jira/browse/LUCENE-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370266#comment-16370266
]
Joanita Dsouza commented on LUCENE-8179:
----------------------------------------
Actually, we use a custom analyzer which uses a stop filter with a list of stop
words. This list contains 'system'.
When I run the program in the microservice, it doesn't go into the
while(ts.incrementToken()) loop. But when the text has the plural word
'systems', it goes into the loop and creates the terms just fine.
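
That behaviour is what a stop filter matching whole tokens would produce: the
tokenizer emits 'system', the stop filter drops it, and the consumer loop has
nothing left to iterate over, while 'systems' is not an exact match and passes
through. A minimal plain-Java sketch of that effect follows. It has no Lucene
dependency; the class name StopFilterSketch and the method analyze are
hypothetical, and the split-and-lowercase step is only an assumed stand-in for
the real tokenizer and filter chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopFilterSketch {

    // Hypothetical stand-in for the analyzer chain (not the Lucene API):
    // split on non-alphanumeric characters, lowercase, then drop every
    // token that exactly matches an entry in the stop-word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            if (stopWords.contains(token)) {
                continue; // whole-token match: 'system' is removed here
            }
            terms.add(token); // 'systems' is not in the set, so it survives
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("system"));
        System.out.println(analyze("system", stopWords));   // prints []
        System.out.println(analyze("systems", stopWords));  // prints [systems]
    }
}
```

With 'system' in the stop list, an input consisting only of that word yields
no terms at all, which matches the loop body never executing.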
> StandardTokenizer doesn't tokenize the word "system" but it works for the
> plural "systems"
> ------------------------------------------------------------------------------------------
>
> Key: LUCENE-8179
> URL: https://issues.apache.org/jira/browse/LUCENE-8179
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Affects Versions: 4.10.4
> Reporter: Joanita Dsouza
> Priority: Major
> Attachments: TokenizerBug.java, TokenizerBugRevised.java
>
>
> Hi,
> We use the Standard tokenizer to tokenize text. The Standard Tokenizer
> tokenizes 'systems' correctly, but it fails to tokenize 'system'. Attached is a
> small program to demo this.
> Is this a known issue? Is there a way to fix it? I have tried a few different
> text examples with different stop words and only this word seems to show this
> issue.