>> I have some questions related to the Zend Framework Search Lucene. I
>> want to implement a fast and scalable directory system and I need to
>> understand some things.
>
>Sounds very interesting!
>
>> 1. I read that Zend_Search_Lucene::optimize() merges all index
>> segments into a new one. Won't this single segment become too big in time?
>
>A segment can't be "too big". A merged segment always takes less memory
>and can be scanned faster than several "source" segments.
>Segment size is limited to 2 GB on 32-bit platforms.
>(http://framework.zend.com/manual/en/zend.search.index-creation.html#zend.search.index-creation.limitations)

Well, let's say the segment reaches the 2 GB limit. What will the
optimizer do then? Say we have one 2 GB segment and two small 50 MB
segments. Will the optimizer start a second segment by merging those two
50 MB segments (leaving two segments after it finishes)?

>> Or will the optimize process create more segments, each with a maximum
>> number of documents (with MaxMergeDocs used as that maximum)?
>
>"MaxMergeDocs is a largest number of documents ever merged by addDocument()"
>
>Automatic index optimization (invoked by addDocument()) is an
>incremental process. It merges several small segments into a new one,
>which is larger.
>When it has enough "larger" segments, it merges them.
>And so on.
>
>MaxMergeDocs guarantees that addDocument() will never execute longer
>than we want. It's a limitation for auto-optimization.

Let me see if I understand. addDocument() adds documents to a single
segment. When the number of those documents reaches MaxMergeDocs, the
optimizer merges this segment with the other ones. The optimizer will not
merge segments that contain fewer than MaxMergeDocs documents. Is that
right, or have I misunderstood the MaxMergeDocs variable?

>> 2. Is the optimize process automatic?
>
>Yes. But you may call Zend_Search_Lucene::optimize() to perform full
>index optimization. It doesn't use MaxMergeDocs and merges all segments
>into one.
>
>> 3. When must I use commit()? Only after a delete operation? Or must I
>> use it after add operations as well? Is it an automatic process?
>
>It's not necessary to use commit() now.
>But you may use it if you want to be sure that all changes are written
>down at the point of the commit() call.
>unset($index) (where $index is a Zend_Search_Lucene object) has the same
>effect.

When does the automatic commit() happen? Only at the end of the script?
Or maybe after MaxBufferedDocs is reached? Where are the documents stored
before I call commit()? In memory?

>> What happens after MaxBufferedDocs is reached?
>>1. All added documents are written down into a new segment.
>>2. The automatic optimization process may start.

>> 4. I didn't find any details about Zend Search performance anywhere. Any
>> benchmarks? Can you estimate how it will behave with 2 million
>> documents? Each document will be at most 400 characters long (this
>> text is indexed).
>
>There aren't any official benchmarks.
>I can only say that performance was one of the first goals for
>Zend_Search_Lucene. I can also say that it's comparable with Java Lucene.
>
>The behavior strongly depends on the index contents and query types.
>Do you have any idea about term selectivity?
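The MaxBufferedDocs / MaxMergeDocs discussion is easier to follow with a concrete model. What follows is a toy Python model of a Lucene-style incremental merge policy. It is a simplified sketch under stated assumptions, not the Zend_Search_Lucene implementation. Segment sizes are counted in documents rather than bytes, so the model says nothing about the 2 GB file-size limit. `merge_factor` is an assumed extra parameter, modeled loosely on Lucene's MergeFactor, that decides how many same-sized segments are merged at once. `ToyIndex` and its methods are hypothetical names. The model mirrors the behavior described in the thread. Documents are buffered until MaxBufferedDocs is reached, and the buffer is then written out as a new segment. Small segments are merged into progressively larger ones. Automatic merging never builds a segment larger than MaxMergeDocs, while a full optimize() merges everything into one segment regardless of that cap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical model of segment flushing and merging (not Zend code).

    Segment sizes are counted in documents, not bytes.
    """
    max_buffered_docs: int = 10   # like MaxBufferedDocs: flush threshold
    merge_factor: int = 10        # assumed knob: segments merged at once
    max_merge_docs: int = 1_000   # like MaxMergeDocs: cap for auto-merges
    buffered: int = 0             # documents not yet written to a segment
    segments: list = field(default_factory=list)  # doc count per segment

    def add_document(self) -> None:
        self.buffered += 1
        if self.buffered >= self.max_buffered_docs:
            self.commit()          # buffer full: write a new segment
            self._maybe_merge()    # then auto-optimization may start

    def commit(self) -> None:
        # Write any buffered documents into a brand-new segment.
        if self.buffered:
            self.segments.append(self.buffered)
            self.buffered = 0

    def optimize(self) -> None:
        # Full optimization: merge everything into one segment and
        # deliberately ignore max_merge_docs.
        self.commit()
        if len(self.segments) > 1:
            self.segments = [sum(self.segments)]

    def _level(self, size: int) -> int:
        # Level 0 holds segments smaller than max_buffered_docs *
        # merge_factor; each further level is merge_factor times larger.
        level, bound = 0, self.max_buffered_docs * self.merge_factor
        while size >= bound:
            level += 1
            bound *= self.merge_factor
        return level

    def _maybe_merge(self) -> None:
        # Incremental merging: when the newest merge_factor segments share
        # a level, merge them, unless the result would exceed
        # max_merge_docs. Repeat, because the merged segment may complete
        # a group on the next level up.
        while len(self.segments) >= self.merge_factor:
            tail = self.segments[-self.merge_factor:]
            level = self._level(tail[-1])
            if any(self._level(s) != level for s in tail):
                return
            merged = sum(tail)
            if merged > self.max_merge_docs:
                return  # auto-merge refuses to exceed the cap
            self.segments[-self.merge_factor:] = [merged]


if __name__ == "__main__":
    index = ToyIndex(max_buffered_docs=10, merge_factor=10, max_merge_docs=1_000)
    for _ in range(3_500):
        index.add_document()
    print(index.segments)  # [1000, 1000, 1000, 100, 100, 100, 100, 100]
    index.optimize()
    print(index.segments)  # [3500]
```

In this toy model, MaxMergeDocs does not stop small segments from being merged. It only prevents automatic merging from producing a segment above the cap, so three 1000-document segments stay separate until optimize() is called explicitly.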

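The last answer ties performance to term selectivity. One common way to quantify it is document frequency divided by the number of documents. A term found in almost every document (like "in" below) has a long posting list that any query on it must walk. A rare term narrows the result set quickly, so a query on it is roughly cheaper. The sketch computes that ratio over a tiny, made-up set of directory listings. The function name `term_selectivity` and the sample data are hypothetical, and naive lowercase whitespace splitting stands in for a real analyzer. It is only an illustration, not how Zend_Search_Lucene tokenizes text.

```python
from collections import Counter


def term_selectivity(docs: list[str]) -> dict[str, float]:
    """Share of documents containing each term (lower = more selective)."""
    doc_freq = Counter()
    for text in docs:
        # Count each term at most once per document.
        doc_freq.update(set(text.lower().split()))
    return {term: n / len(docs) for term, n in doc_freq.items()}


if __name__ == "__main__":
    listings = [
        "plumber in berlin",
        "electrician in berlin",
        "plumber in paris",
        "baker in rome",
    ]
    sel = term_selectivity(listings)
    print(sel["in"], sel["berlin"], sel["rome"])  # 1.0 0.5 0.25
```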