On Thu, 24 Jun 2004 12:34:35 +0200 Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Vladimir Yuryev wrote:
Hi Andrzej!
I am sorry for my English :-(
I will gladly describe the test, and I will try to state its conditions in detail.
I don't quite understand what you are saying... Do you suspect there is a bug in Luke somewhere on the Search tab? If that's the case, please provide an example.
1. The search was run on an index in Cp1251 encoding.
2. Search conditions:
Analyzer to use for query parsing: org.apache.lucene.analysis.ru.RussianAnalyzer
Default field is: contents
2.1. Enter search expression here: высказался (in windows-1251 encoding)
Result: No Results
2.2. Enter search expression here: высказал* (in windows-1251 encoding)
Result: 1 doc(s), url: http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news
Time to refresh my Russian... :-) OK, the problem seems to be in RussianAnalyzer: it uses RussianLetterTokenizer, which filters out anything that is not a letter, so I'm afraid it also filters out the wildcard at the end. Not only that, it then passes the tokens through a RussianStemmer, which mutilates the tokens further.
Please try the "Parsed query view" on the "Search" tab to see what is the result of your query, or paste your query into the text area on the AnalyzerTool plugin ("Plugins"), and see what tokens you get using RussianAnalyzer.
I just did it, and the result for "высказал*" was "высказа" - clearly not what you wanted.
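To illustrate the first half of that explanation, here is a minimal sketch in plain Java of a letter-only tokenizer. It is not Lucene's RussianLetterTokenizer; the class and method names (LetterTokenizerSketch, letterTokens) are hypothetical, and it assumes only that the tokenizer keeps runs of letters and lowercases them. The stemming step that shortens the token further to "высказа" is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a letter-only tokenizer; NOT the Lucene class.
class LetterTokenizerSketch {

    // Split text into lowercase runs of letters; every non-letter
    // character (including the '*' wildcard) ends the current token
    // and is itself discarded.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The trailing '*' is not a letter, so it never reaches the token.
        System.out.println(letterTokens("высказал*"));
    }
}
```

Under these assumptions the wildcard is already gone before any stemmer sees the token, which is consistent with the AnalyzerTool output containing no "*".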
--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Hi Andrzej!
Well.
At the address "http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news" there is the full text in which I searched for the phrase "...Pontiff has expressed importance...", in Russian "Понтифик высказался о важности".
Please try the "Parsed query view" on the "Search" tab to see what is the result of your query
On the "Search" tab the phrase was not found. For some reason (?!) the problem seems to be in the second and third words: searching for the separate words (simple terms) showed a problem with these last two words. So, with "Analyzer to use for query parsing:" set to org.apache.lucene.analysis.ru.RussianAnalyzer,
"Enter search expression here": [text in Cp1251 encoding] -
1. "Enter search expression here": "Понтифик высказался о важности".
"Parsed query view": contents:"понтифик высказа важност".
- No Results
2. "Enter search expression here": Понтифик
"Parsed query view": contents:понтифик
- 2 doc(s)
URLs:
"http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v01.txt&print=news"
"http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news"
3. "Enter search expression here": высказался
"Parsed query view": contents:высказа
- No Results
4. "Enter search expression here": важности
"Parsed query view": contents:важност
- No Results
5. "Enter search expression here": Понтифик высказался о важности.
"Parsed query view": contents:понтифик contents:высказа contents:важност.
- 2 doc(s) -> the same documents as in point 2.
... or paste your query into the text area on the AnalyzerTool plugin ("Plugins"), and see what tokens you get using RussianAnalyzer.
On the "Plugins" tab, in the "Text to be analyzed" field, I tested the same three words as a phrase: "Понтифик высказался о важности". As a result of the analysis, the "Tokens found" field showed three stems: "понтифик", "высказа" and "важност". The "hilite->" action gave positive results for all three words. (So it seems the problem is not in the filters?) :-)
Best regards, Vladimir.
