Hi,

I have a short question about queries. I have two files, one that creates
the index and one that searches it.
I use the Utf8_CaseInsensitive analyzer in both files and tried the same
search once with the query parser and once with the query API.
I think I built the same query both ways, but I get different results:




// file1: create index:
setlocale(LC_ALL,'de_AT.UTF-8');
...
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
...
$index = Zend_Search_Lucene::create('testindex/myindex');
...
// add doc
$doc = new Zend_Search_Lucene_Document();
...
$doc->addField(Zend_Search_Lucene_Field::Text('afield',
        'test Vienna Wien Βιέννη Беч Вена'));
...
$index->addDocument($doc);
...
$index->commit();


// file2: search
setlocale(LC_ALL,'de_AT.UTF-8');
...
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
...
$index = Zend_Search_Lucene::open('testindex/myindex');

// search with query parser:
$query1 = Zend_Search_Lucene_Search_QueryParser::parse('afield:Vie*'); // <---- hit
$query2 = Zend_Search_Lucene_Search_QueryParser::parse('afield:vie*'); // <---- hit
$query3 = Zend_Search_Lucene_Search_QueryParser::parse('afield:Tes*'); // <---- hit

... $hits = $index->find($queryX); ...

// search with query-api:
$term   = new Zend_Search_Lucene_Index_Term('Vie*', 'afield'); // <---- no hit
$query1 = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$term   = new Zend_Search_Lucene_Index_Term('vie*', 'afield'); // <---- hit
$query2 = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$term   = new Zend_Search_Lucene_Index_Term('Tes*', 'afield'); // <---- no hit
$query3 = new Zend_Search_Lucene_Search_Query_Wildcard($term);

... $hits = $index->find($queryX); ...




I think the query parser lowercases the search term (strtolower) and the
query API does not. Of course, if I use
        $term  = new Zend_Search_Lucene_Index_Term(mb_strtolower('Vie*'),
                                                   'afield');
every query gets a hit.
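
To make the workaround concrete, here is a minimal sketch of what I do by
hand at the moment. The helper name normalizeWildcardPattern is my own
invention and not part of Zend Framework; it only assumes the mbstring
extension and a UTF-8 encoded pattern. The Zend calls are left as comments
so the snippet runs on its own.

```php
<?php
// Hypothetical helper (not part of Zend Framework): lowercases a wildcard
// pattern by hand before it is passed to the query API. The wildcard
// characters '*' and '?' have no case, so mb_strtolower() leaves them as-is.
function normalizeWildcardPattern($pattern, $encoding = 'UTF-8')
{
    return mb_strtolower($pattern, $encoding);
}

$pattern = normalizeWildcardPattern('Vie*');

// With Zend Framework loaded, the term and query would then be built
// exactly as in my search file:
// $term  = new Zend_Search_Lucene_Index_Term($pattern, 'afield');
// $query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
// $hits  = $index->find($query);
```

The explicit encoding argument matters for the non-ASCII terms in my test
document: without it, mb_strtolower() falls back to the internal encoding,
which may not be UTF-8 on every setup.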

To me it would be logical if I did not have to call strtolower myself and
it were done automatically (as with the query parser), so that, for example,
I could switch to a case-sensitive analyzer without having to change the
search code.

And my last question:
Did I misunderstand something, or am I using the query API incorrectly? ;)

thx
Stefan