Hi Endijs! Hm... A large number of fields shouldn't slow down search performance much.
The Lucene index format doesn't store fields and documents as a table. It stores a list of terms (<field, value> pairs) ordered lexicographically. Each term refers to the corresponding list of documents containing the term and to the list of term positions within those documents. So it doesn't matter whether individual tags are stored in different fields or not: the overall number of tag terms will be the same.

On the other hand, Zend_Search_Lucene uses all indexed fields for searching (Java Lucene searches in the contents field by default). The query is transformed internally to do this, which may produce an extremely large query (in your case the query may be repeated 2000 times, once per field). It's reduced at the query optimization step, but that takes time.

You can avoid such query transformation by setting the search field explicitly (it can still be overridden in subqueries). E.g. if you want to search through the 'author', 'title' and 'contents' fields in addition to using some tags or other supplementary fields:

$userQuery = "author:($userQuery) title:($userQuery) contents:($userQuery)";
....

Your solution is also good and can be used as a common approach for such cases. One fix: if the analyzer in use skips ';' and spaces (that's true for all Lucene analyzers), then they should also be skipped manually during term creation. Zend_Search_Lucene_Index_Term doesn't do any transformation using analyzers, so the corrected code is:

$term = new Zend_Search_Lucene_Index_Term($tag_id, 'tags');

With best regards,
Alexander Veremyev.

> -----Original Message-----
> From: Endijs Lisovskis [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 22, 2008 1:17 AM
> To: [email protected]
> Subject: Re: [fw-general] Zend_Search_Lucene - how to find performance
> problem source?
>
> I think I have found the problem. Thanks everyone for your time. I really
> appreciate it. ZF community rockz! :)
>
> I opened the index with Luke (simply amazing program!) and I quickly realized
> that I had a very stupid flaw in index generation. Please don't make such
> stupid mistakes.
> I will explain what I did wrong; maybe it will save someone a lot of
> debugging time.
> In the example before, I mentioned how the index was built: which variables
> were added as Keyword, which as Text, which as UnStored etc. But I completely
> overlooked one part of my code. Each article has tags, and I wanted to add
> the tags to the documents too. So I made this stupid addition to the code:
>
>   foreach($tag as $key => $value)
>       $this->addField(Zend_Search_Lucene_Field::Keyword('tag_'.$key, $value, 'utf-8'));
>
> As a result, the index consisted of about 2000 fields.
> I took away that tag keyword thing, and now there are only 10 fields in the
> index and search is performed very fast. I don't even know what I wanted to
> achieve by adding tags to the index that way, because it doesn't make any
> sense.
>
> I have one question: what is the best way to add tags to the index? Let's
> say that each article can have multiple tags, and multiple articles can
> share the same tags. What is the smartest way to add tags so that they are
> searchable? I figured out this way:
>
>   $value = '';
>   foreach($tag as $k => $v) $value .= 'tag_'.$k.'; ';
>   $this->addField(Zend_Search_Lucene_Field::Keyword('tags', $value, 'utf-8'));
>
> In that case a search can be made (to find results that contain the search
> phrase in the article text and have a particular tag added to the article):
>
>   $index = My_Search_Lucene::open(self::$config->lucene->dir);
>   $query = new Zend_Search_Lucene_Search_Query_Boolean();
>   $term = new Zend_Search_Lucene_Index_Term($tag_id.'; ', 'tags');
>   $tempQuery = new Zend_Search_Lucene_Search_Query_Term($term);
>   $query->addSubquery($tempQuery, true);
>   $userQuery = Zend_Search_Lucene_Search_QueryParser::parse($searchPhrase, 'utf-8');
>   $query->addSubquery($userQuery, true);
>   $results = $index->find($query);
>
> But I doubt that is the smartest way. Maybe you can suggest something?
>
> Endijs
> --
> View this message in context: http://www.nabble.com/Zend_Search_Lucene---how-to-find-performance-problem-source--tp20085562p20099565.html
> Sent from the Zend Framework mailing list archive at Nabble.com.
