Hi,

Sebi wrote:
I have some questions related to the Zend Framework Search Lucene. I want to implement a fast and scalable directory system and I need to understand some things.

Sounds very interesting!


1. I read that the Zend_Search_Lucene::optimize() function merges all index segments into a new one. Won't this single segment become too big over time?

A segment can't be "too big". A merged segment always takes less memory and can be scanned faster than the several "source" segments it replaces.

The one hard limit is that segment size cannot exceed 2 GB on 32-bit platforms.
(http://framework.zend.com/manual/en/zend.search.index-creation.html#zend.search.index-creation.limitations)

Or will the optimize process create several segments, each with a maximum number of documents (with the MaxMergeDocs variable used as that maximum)?

"MaxMergeDocs is the largest number of documents ever merged by addDocument()"
Automatic index optimization (invoked by addDocument()) is an incremental process. It merges several small segments into a new, larger one.
When it has enough of these "larger" segments, it merges them in turn.
And so on.

MaxMergeDocs guarantees that addDocument() will never run longer than we want, because it caps the size of any merge that auto-optimization performs.
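
To make the incremental merging described above more concrete, here is a toy simulation in Python. It is only a sketch, not the Zend_Search_Lucene implementation: it tracks nothing but document counts, and the `merge_factor` parameter (how many equal-sized segments trigger a merge), the class name `ToyIndex` and the helper `simulate` are assumptions made up for this illustration.

```python
from collections import Counter


class ToyIndex:
    """Toy model of incremental segment merging (NOT the library's real code)."""

    def __init__(self, max_buffered_docs=10, merge_factor=10,
                 max_merge_docs=float("inf")):
        self.max_buffered_docs = max_buffered_docs
        self.merge_factor = merge_factor      # assumed trigger for a merge
        self.max_merge_docs = max_merge_docs  # cap on the size of a merged segment
        self.buffered = 0                     # documents not yet written out
        self.segments = []                    # document count of each segment

    def add_document(self):
        self.buffered += 1
        if self.buffered >= self.max_buffered_docs:
            # Write the buffered documents out as a new small segment ...
            self.segments.append(self.buffered)
            self.buffered = 0
            # ... and give auto-optimization a chance to run.
            self._auto_optimize()

    def _auto_optimize(self):
        merged = True
        while merged:
            merged = False
            for size, count in Counter(self.segments).items():
                new_size = size * self.merge_factor
                # Merge only when enough equal-sized segments exist and the
                # result would stay within the max_merge_docs cap.
                if count >= self.merge_factor and new_size <= self.max_merge_docs:
                    for _ in range(self.merge_factor):
                        self.segments.remove(size)
                    self.segments.append(new_size)
                    merged = True
                    break  # segment list changed, so recount from scratch


def simulate(num_docs, **kwargs):
    """Add num_docs documents and return the sorted segment sizes."""
    index = ToyIndex(**kwargs)
    for _ in range(num_docs):
        index.add_document()
    return sorted(index.segments)
```

In this model, `simulate(1000)` ends with a single segment of 1000 documents, while `simulate(1000, max_merge_docs=100)` stops at ten segments of 100 documents each, because merging them further would exceed the cap.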


2. Is the optimize process automatic?

Yes. But you may also call Zend_Search_Lucene::optimize() to perform a full index optimization. It ignores MaxMergeDocs and merges all segments into one.

3. When must I use commit? Only after a delete operation? Or must I use it after add operations as well? Is it an automatic process?

It's no longer necessary to call commit() explicitly.
But you may call it if you want to be sure that all changes have been written to the index at that point. unset($index) (where $index is a Zend_Search_Lucene object) has the same effect.

What happens after MaxBufferedDocs is reached?

1. All added documents are written to a new segment.
2. The automatic optimization process may start.
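
The interplay between buffering and commit() described above can be sketched with a small Python toy model. This is only an illustration under simplifying assumptions, not the library's code: segments are plain in-memory lists, and the names `ToyWriter`, `buffer` and `_flush` are hypothetical.

```python
class ToyWriter:
    """Toy model of document buffering (NOT the library's real code)."""

    def __init__(self, max_buffered_docs=10):
        self.max_buffered_docs = max_buffered_docs
        self.buffer = []    # documents added but not yet written out
        self.segments = []  # each entry stands for one written segment

    def add_document(self, doc):
        self.buffer.append(doc)
        # Reaching the limit writes everything buffered into a new segment.
        if len(self.buffer) >= self.max_buffered_docs:
            self._flush()

    def commit(self):
        # An explicit commit writes out whatever is buffered right now,
        # even if the limit has not been reached.
        self._flush()

    def _flush(self):
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []
```

After 25 calls to `add_document()` with the default limit of 10, this model holds two segments of 10 documents plus 5 documents still in the buffer; calling `commit()` writes those 5 out as a third segment.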


4. I didn't find any details about Zend Search performance anywhere. Are there any benchmarks? Can you estimate how it will behave with 2 million documents? Each document will be at most 400 characters long (this text is indexed).

There are no official benchmarks.
I can only say that performance was one of the first goals for Zend_Search_Lucene. I can also say that it's comparable with Java Lucene.

The behavior depends strongly on the index contents and query types.
Do you have any idea about term selectivity?


With best regards,
   Alexander Veremyev.
