Joanita Dsouza created LUCENE-8179:
--------------------------------------

Summary: StandardTokenizer doesn't tokenize the word "system" but it works for the plural "systems"
Key: LUCENE-8179
URL: https://issues.apache.org/jira/browse/LUCENE-8179
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.10.4
Reporter: Joanita Dsouza
Attachments: TokenizerBug.java

Hi,

We use the StandardTokenizer to find stop words in text by matching its tokens against a predefined list of stop words. This list contains 'system' as one of the words. When tokenizing a text, the StandardTokenizer tokenizes 'systems' correctly, but it fails to tokenize 'system'.

I have attached a small program (TokenizerBug.java) that demonstrates this. Is this a known issue? Is there a way to fix it? I have tried a few different text examples with different stop words, and only this word seems to show the issue.
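
For reference, the approach described above (tokenize the text, then look each token up in the stop-word list) can be sketched without Lucene. This is only an illustration under stated assumptions, not the attached TokenizerBug.java: the class and method names are hypothetical, and the JDK's java.text.BreakIterator stands in for StandardTokenizer, so its word boundaries may differ from Lucene's. It may still help to check whether a missing word is lost during tokenization or during the lookup.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the reported setup; it does NOT use Lucene.
public class StopWordCheck {

    // Split text into lower-cased word tokens using the JDK word BreakIterator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            // Keep only segments that begin with a letter or digit;
            // this skips whitespace and punctuation segments.
            if (Character.isLetterOrDigit(candidate.codePointAt(0))) {
                tokens.add(candidate.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Return, in order of appearance, every token that is in the stop-word set.
    static List<String> findStopWords(String text, Set<String> stopWords) {
        List<String> hits = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("system", "systems"));
        String text = "The system failed, but the other systems kept running.";
        System.out.println(tokenize(text));                 // every token, for inspection
        System.out.println(findStopWords(text, stopWords)); // prints [system, systems]
    }
}
```

Printing the full token list before the lookup shows whether 'system' ever comes out of the tokenizer. If it does, the stop-word comparison (case, whitespace, or the contents of the list) is the more likely place to look.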