RE: [fw-general] Zend_Search_Lucene - how to find performance problem source?

Alexander Veremyev Wed, 22 Oct 2008 11:06:08 -0700

Hi Endijs!

Hm... A large number of fields shouldn't slow down search performance
much.


The Lucene format doesn't store fields and documents as a table.
It stores a list of terms (<field, value> pairs) ordered
lexicographically.
Each term refers to the list of documents containing that term and to
the list of term positions within those documents.

So it doesn't matter whether individual tags are stored in different
fields or not. The overall number of tag terms will be the same.
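
To illustrate the point about term counts, here is a minimal sketch of a toy
inverted index in plain PHP. It is not the real Lucene file format, and
addToIndex() and the "field:value" key layout are assumptions made only for
this example. It shows that per-tag fields and a single 'tags' field produce
the same number of terms.

```php
<?php
// Toy inverted index: an illustration only, not the real Lucene file format.
// Each term is a "field:value" key mapping docId => list of term positions.
function addToIndex(array &$index, int $docId, string $field, array $tokens): void
{
    foreach ($tokens as $position => $token) {
        $index[$field . ':' . $token][$docId][] = $position;
    }
    ksort($index, SORT_STRING); // keep terms ordered lexicographically
}

// Layout 1: one field per tag.
$perTagFields = [];
addToIndex($perTagFields, 1, 'tag_php', ['php']);
addToIndex($perTagFields, 1, 'tag_zend', ['zend']);
addToIndex($perTagFields, 2, 'tag_php', ['php']);

// Layout 2: all tags in a single 'tags' field.
$singleField = [];
addToIndex($singleField, 1, 'tags', ['php', 'zend']);
addToIndex($singleField, 2, 'tags', ['php']);

// Both layouts hold two terms; only the field part of the key differs.
echo count($perTagFields), ' ', count($singleField), PHP_EOL; // prints "2 2"
```

The term count is the same in both layouts. What differs is the number of
distinct fields, which matters for the query transformation described next.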

On the other hand, Zend_Search_Lucene uses all indexed fields for
searching (Java Lucene searches in the contents field by default). The
query is transformed internally to do this, which may produce an
extremely large query (the query may be repeated 2000 times in your case).
It's reduced during the query optimization step, but that takes time.

You can avoid this query transformation by setting the search field
explicitly (it can still be overridden in subqueries).
E.g. if you want to search the 'author', 'title' and 'contents' fields in
addition to using some tags or other supplementary fields:

$userQuery = "author:($userQuery) title:($userQuery) contents:($userQuery)";
....
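
The same idea can be sketched as a small helper in plain PHP.
restrictToFields() is a hypothetical name, not part of Zend Framework, and
the sketch assumes the user's input contains no unbalanced parentheses or
other characters that would need escaping.

```php
<?php
// Hypothetical helper (not a Zend Framework function): wraps the user's
// query once per listed field, so the search is limited to those fields
// instead of being expanded over every indexed field.
function restrictToFields(string $userQuery, array $fields): string
{
    $parts = [];
    foreach ($fields as $field) {
        $parts[] = $field . ':(' . $userQuery . ')';
    }
    return implode(' ', $parts);
}

$userQuery = restrictToFields('lucene performance', ['author', 'title', 'contents']);
echo $userQuery, PHP_EOL;
// author:(lucene performance) title:(lucene performance) contents:(lucene performance)
```

The resulting string can then be passed to
Zend_Search_Lucene_Search_QueryParser::parse($userQuery, 'utf-8'), as in the
quoted message below.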



Your solution is also good and can be used as a common way for such
cases.

One fix. If the analyzer in use skips ';' and spaces (that's true for all
Lucene analyzers), then they should be stripped manually during term
creation.
Zend_Search_Lucene_Index_Term doesn't apply any analyzer
transformation, so the corrected code is:

$term = new Zend_Search_Lucene_Index_Term($tag_id, 'tags');
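
Why the '; ' suffix breaks the lookup can be shown with a small stand-alone
sketch. toyAnalyze() is a stand-in written for this example, not a real
Zend_Search_Lucene analyzer, and it assumes tokens are split only on ';' and
whitespace. Real analyzers may drop more characters (digits or underscores,
for instance), so it is worth checking how yours tokenizes tag values.

```php
<?php
// Stand-in for an analyzer: splits on ';' and whitespace only (an assumption).
function toyAnalyze(string $value): array
{
    return preg_split('/[;\s]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
}

// Terms that end up in the 'tags' field at indexing time.
$indexedTerms = array_flip(toyAnalyze('tag_3; tag_7; tag_12; '));

// A term lookup is an exact match; nothing strips the '; ' suffix for us.
$tag_id = 'tag_7';
var_dump(isset($indexedTerms[$tag_id . '; '])); // bool(false)
var_dump(isset($indexedTerms[$tag_id]));        // bool(true)
```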



With best regards,
   Alexander Veremyev.


> -----Original Message-----
> From: Endijs Lisovskis [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 22, 2008 1:17 AM
> To: [email protected]
> Subject: Re: [fw-general] Zend_Search_Lucene - how to find performance
> problem source?
> 
> 
> I think I have found the problem. Thanks everyone for your time. I really
> appreciate it. ZF community rockz! :)
> 
> I opened the index with Luke (simply amazing program!) and I quickly
> realized that I have a very stupid flaw in index generation. Please don't
> make such stupid mistakes. I will explain what I did wrong - maybe it will
> save someone a lot of debugging time.
> In the earlier example I mentioned how the index was built - which variables
> were added as Keywords, which as Text, which as UnStored etc. But I
> completely overlooked one part of my code. Each article has tags, and I
> wanted to add the tags to the documents too. So I made this stupid addition
> to the code:
> foreach ($tag as $key => $value)
>     $this->addField(Zend_Search_Lucene_Field::Keyword('tag_'.$key, $value, 'utf-8'));
> As a result, the index consisted of about 2000 fields.
> I removed that tag keyword code and now there are only 10 fields in the
> index and search is performed very fast. I don't even know what I wanted to
> achieve by adding tags to the index that way, because it doesn't make any
> sense.
> 
> I have one question - about the best way to add tags to the index. Let's
> say that each article can have multiple tags, and multiple articles can
> share the same tags. What is the smartest way to add tags so that they are
> searchable? I figured out this way:
> $value = '';
> foreach ($tag as $k => $v) $value .= 'tag_'.$k.'; ';
> $this->addField(Zend_Search_Lucene_Field::Keyword('tags', $value, 'utf-8'));
> 
> And in that case a search can be made (to find results that contain the
> search phrase in the article text and a particular tag added to the
> article):
> $index = My_Search_Lucene::open(self::$config->lucene->dir);
> $query = new Zend_Search_Lucene_Search_Query_Boolean();
> $term = new Zend_Search_Lucene_Index_Term($tag_id.'; ', 'tags');
> $tempQuery = new Zend_Search_Lucene_Search_Query_Term($term);
> $query->addSubquery($tempQuery, true);
> $userQuery = Zend_Search_Lucene_Search_QueryParser::parse($searchPhrase, 'utf-8');
> $query->addSubquery($userQuery, true);
> $results = $index->find($query);
> 
> But I doubt that is the smartest way. Maybe you can suggest something?
> 
> Endijs
> --
> View this message in context:
> http://www.nabble.com/Zend_Search_Lucene---how-to-find-performance-problem-source--tp20085562p20099565.html
> Sent from the Zend Framework mailing list archive at Nabble.com.
