Re: ANN: Luke v. 0.5 released

Andrzej Bialecki Thu, 24 Jun 2004 03:35:18 -0700

Vladimir Yuryev wrote:

> Hi Andrzej!
> I am sorry for my English :-( I will gladly describe the test, and I will try to state its conditions in detail.
>
>> I don't quite understand what you are saying... Do you suspect there is a bug in Luke somewhere on the Search tab? If that's the case, please provide an example.
>
> 1. The search was run on an index encoded in Cp1251.
> 2. Search conditions:
>    Analyzer to use for query parsing: org.apache.lucene.analysis.ru.RussianAnalyzer
>    Default field is: contents
>
> 2.1. Enter search expression here: высказался (windows-1251 encoding)
>      Result: No Results
> 2.2. Enter search expression here: высказал* (windows-1251 encoding)
>      Result: 1 doc(s), url: http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news

Time to refresh my Russian... :-) OK, the problem seems to be in the RussianAnalyzer. It uses RussianLetterTokenizer, which filters out anything that is not a letter, and I'm afraid that includes the wildcard at the end. Not only that, it then passes the tokens through a RussianStemmer, which mangles them even further.
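To illustrate the tokenizer behaviour described above, here is a minimal, hypothetical Java sketch. It is not Lucene's RussianLetterTokenizer (the real class may classify characters differently, e.g. digits); it only assumes, as stated above, that every non-letter character ends a token and is discarded. The stemming step is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a letter-only tokenizer. NOT the Lucene class:
 * it only shows that any non-letter character (including the '*'
 * wildcard) acts as a separator and never appears in a token.
 */
public class LetterTokenizerSketch {

    /** Splits text into maximal runs of letters; all other characters are dropped. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);               // letters extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // a non-letter closes the token...
                current.setLength(0);            // ...and is itself thrown away
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());      // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The trailing '*' does not survive tokenization.
        System.out.println(tokenize("высказал*"));
        System.out.println(tokenize("высказался"));
    }
}
```

With this kind of tokenizer, "высказал*" comes out as the single token "высказал": the wildcard marker is gone, and in the real analyzer the stemmer would then shorten the token further.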

Please try the "Parsed query view" on the "Search" tab to see how your query was actually parsed, or paste your query into the text area of the AnalyzerTool plugin (on the "Plugins" tab) and see which tokens RussianAnalyzer produces.

I just did that, and the result for "высказал*" was "высказа", which is clearly not what you wanted.

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)

