Hi,
I have a small question about queries. There are two files, one that creates
the index and one that searches it.
I used the Utf8_CaseInsensitive analyzer in both files and tried the same
search with the query parser and with the query API.
I think I built the same query both ways, but I got different results:
// file1: create index:
setlocale(LC_ALL,'de_AT.UTF-8');
...
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
...
$index = Zend_Search_Lucene::create('testindex/myindex');
...
// add doc
$doc = new Zend_Search_Lucene_Document();
...
$doc->addField(Zend_Search_Lucene_Field::Text('afield',
    'test Vienna Wien Βιέννη Беч Вена'));
...
$index->addDocument($doc);
...
$index->commit();
// file2: search
setlocale(LC_ALL,'de_AT.UTF-8');
...
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
...
$index = Zend_Search_Lucene::open('testindex/myindex');
// search with query parser:
$query1 = Zend_Search_Lucene_Search_QueryParser::parse('afield:Vie*'); // <---- hit
$query2 = Zend_Search_Lucene_Search_QueryParser::parse('afield:vie*'); // <---- hit
$query3 = Zend_Search_Lucene_Search_QueryParser::parse('afield:Tes*'); // <---- hit
... $hits = $index->find($queryX); ...
// search with query-api:
$term = new Zend_Search_Lucene_Index_Term('Vie*', 'afield'); // <---- no hit
$query1 = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$term = new Zend_Search_Lucene_Index_Term('vie*', 'afield'); // <---- hit
$query2 = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$term = new Zend_Search_Lucene_Index_Term('Tes*', 'afield'); // <---- no hit
$query3 = new Zend_Search_Lucene_Search_Query_Wildcard($term);
... $hits = $index->find($queryX); ...
I think the QueryParser lowercases the search term (strtolower) and the query
API does not. Of course, if I use
$term = new Zend_Search_Lucene_Index_Term(mb_strtolower('Vie*'), 'afield');
every query gets a hit.
To me it would be logical if I did not have to call strtolower myself and it
were done automatically (as with the QueryParser),
so that, for example, I could switch the analyzer to a case-sensitive one and
it wouldn't make a difference.
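The mb_strtolower workaround could be wrapped in a small helper so the query-API code does not repeat it. This is only a minimal sketch: the function name lowercaseWildcardPattern is hypothetical and not part of Zend_Search_Lucene, and it assumes the mbstring extension is loaded, the pattern is UTF-8, and the index was built with a lowercasing analyzer.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene: lowercases a wildcard
// pattern so it can match terms stored by a lowercasing analyzer.
// Assumes the mbstring extension is loaded and the pattern is UTF-8.
function lowercaseWildcardPattern($pattern, $encoding = 'UTF-8')
{
    // The wildcard characters '*' and '?' have no case and pass through unchanged.
    return mb_strtolower($pattern, $encoding);
}

// Intended use with the classes from the search script:
// $term  = new Zend_Search_Lucene_Index_Term(lowercaseWildcardPattern('Vie*'), 'afield');
// $query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
// $hits  = $index->find($query);
```

If the analyzer were later changed to a case-sensitive one, the helper would have to be bypassed, because lowercasing the pattern would then miss mixed-case terms.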
And my last question:
Did I misunderstand something, or did I use the query API incorrectly? ;)
thx
Stefan
--
View this message in context:
http://www.nabble.com/Lucene-Utf8-Utf8_CaseInsensitive-Question-tp19532087p19532087.html
Sent from the Zend Framework mailing list archive at Nabble.com.