Khindikaynen Aleksey created LUCENENET-596:
----------------------------------------------
Summary: QueryParser produces a wrong query if KeywordRepeatFilter
is used in analyzer
Key: LUCENENET-596
URL: https://issues.apache.org/jira/browse/LUCENENET-596
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net.Analysis.Common
Affects Versions: Lucene.Net 4.8.0
Reporter: Khindikaynen Aleksey
Below is a code sample illustrating how to reproduce the issue:
{code:java}
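// Namespaces below assume the Lucene.Net 4.8 package layout.
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Util;

// "Value_0" is split into "Value" and "0" because '_' is not a letter or digit.
var query = "+FieldName:Value_0";
var parser = new QueryParser(LuceneVersion.LUCENE_48, "FieldName", new CustomAnalyzer());
var res = parser.Parse(query);

class CustomAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        var tokenizer = new LetterOrDigitTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream stream = new StandardFilter(LuceneVersion.LUCENE_48, tokenizer);
        // Emits each token twice at the same position: once flagged as a keyword, once not.
        stream = new KeywordRepeatFilter(stream);
        return new TokenStreamComponents(tokenizer, stream);
    }
}

// Tokenizer that keeps runs of letters and digits and splits on everything else.
class LetterOrDigitTokenizer : CharTokenizer
{
    public LetterOrDigitTokenizer(LuceneVersion matchVersion, TextReader input)
        : base(matchVersion, input)
    {
    }

    protected override bool IsTokenChar(int c)
    {
        return char.IsLetterOrDigit((char)c);
    }
}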
{code}
The resulting query differs between versions 3.0.3 and 4.8:

Lucene 3.0.3:
+FieldName:"(value value) 0"

Lucene 4.8 beta 4:
+((FieldName:value FieldName:valu) FieldName:0)
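
The 3.0.3 query is a phrase query that requires "value" followed by "0", while the
4.8 query is a disjunction of optional term clauses. As a result, Lucene 4.8 also
matches a document whose FieldName is "0" (without the word "value").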