Hi Sebi,
I have had these improvements in mind for a long time, so I'll do this.
I think it will be done soon, given the importance of these improvements.
In fact, I am already working on Boolean query optimization.
With best regards,
Alexander Veremyev.
Sebi wrote:
-->
Thank you for your support, Alexander. Now we have to find developers to
implement these optimizations. When do you think they will be implemented?
* Hi Sebi,
* I did a little research on the performance problem.
* Several things make search times with Java Lucene and
* Zend_Search_Lucene so different.
* 1. Luke and the Java Lucene search example don't include index
* opening time in their measurements.
* With this time included, Java Lucene is only two times faster than
* Zend_Search_Lucene.
* 2. Boolean queries are not optimal now.
* A Boolean query should skip non-matching documents, but it currently
* tries to calculate a score for them (and gets the correct result, "0").
* Once this is fixed, search time will be nearly the same as for
* Java Lucene.
* 3. Zend_Search_Lucene doesn't optimize queries yet. It should transform
* a query into its simplest form, and it's designed to do this, but this
* feature is not implemented yet.
* Implementing this feature may give the same result as the Boolean
* query optimization (most queries can be transformed into
* term/multi-term queries).
* Of course, both these optimizations have to be implemented.
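
To make point 2 concrete, here is a minimal, language-neutral sketch in Python. It is not Zend_Search_Lucene code: the postings layout (term -> {doc id: weight}), the function names, and the sample data are all assumptions made for illustration. It compares scoring every document and discarding the zeros with intersecting the per-term document sets first and scoring only the survivors.

```python
def naive_conjunction(postings, terms, max_doc):
    """Score every document id, then keep only those with a non-zero score."""
    hits = {}
    scored = 0
    for doc_id in range(max_doc):
        scored += 1  # work is done even for documents that cannot match
        if all(doc_id in postings.get(t, {}) for t in terms):
            score = sum(postings[t][doc_id] for t in terms)
        else:
            score = 0.0
        if score > 0:
            hits[doc_id] = score
    return hits, scored


def skipping_conjunction(postings, terms):
    """Intersect the per-term document sets first, then score the survivors."""
    if not terms:
        return {}, 0
    # Start from the rarest term so the candidate set stays small.
    ordered = sorted(terms, key=lambda t: len(postings.get(t, {})))
    candidates = set(postings.get(ordered[0], {}))
    for t in ordered[1:]:
        candidates.intersection_update(postings.get(t, {}))
    hits = {d: sum(postings[t][d] for t in terms) for d in candidates}
    return hits, len(candidates)


# Hypothetical sample data: weights are arbitrary positive numbers.
SAMPLE_POSTINGS = {
    "galeria": {1: 0.5, 4: 0.8, 7: 0.3},
    "country1": {4: 1.0, 7: 1.0, 9: 1.0},
}
```

With the sample data and a 10-document index, both functions return the same two hits, but the first one scores all 10 documents while the second scores only the 2 that can match.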
* In addition to this, I/O operations take a lot of time (30-40%).
* As I tested before, moving these operations into a C extension makes
* them several times faster.
* So moving I/O into an optional C extension may make Zend_Search
* faster than Java Lucene :)
* With best regards,
* Alexander Veremyev.
* Sebi wrote:
* > I see. You got good results. I want to have them too. I think
* > there might be one problem: my computer's performance.
* > For about 9000 docs, with the index optimized (using the optimize()
* > function), a search that returned about 70 docs takes 1.5 sec.
* > That is still slow. The interesting point is that Luke executes
* > the same query in only 56 ms.
* >
* > I have the following questions:
* >
* > 1. Why do you think the Luke tool runs the same query in 56 ms?
* > Is PHP execution so slow?
* >
* > 2. I have version 7 of Zend installed. Should I get the
* > latest snapshot?
* >
* > 3. Do you have any advice for improving this search process?
* >
* >
* >
* > Hi Sebi,
* >
* > 1. I've just added the necessary methods.
* >
* > $index->numDocs() can be used to retrieve the number of non-deleted
* > documents.
* > $index->maxDoc() returns one greater than the largest possible
* > document number (synonym for $index->count()).
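
The difference between the two counts can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: ToyIndex and its methods are hypothetical stand-ins, not the Zend_Search_Lucene API, and the only behaviour modelled is the one described in this thread (deleted documents keep their slot until the index is optimized).

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an index (not the real API)."""

    def __init__(self):
        # Slot i holds document i, or None once that document is deleted.
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)
        return len(self._docs) - 1  # id of the new document

    def delete(self, doc_id):
        # The slot is kept, so later document ids do not shift.
        self._docs[doc_id] = None

    def max_doc(self):
        # One greater than the largest possible document id.
        return len(self._docs)

    def num_docs(self):
        # Only documents that have not been deleted.
        return sum(1 for d in self._docs if d is not None)

    def optimize(self):
        # Dropping deleted slots makes both counts agree again.
        self._docs = [d for d in self._docs if d is not None]
```

In this model, after adding three documents and deleting one, max_doc() still reports 3 while num_docs() reports 2; after optimize() both report 2.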
* >
* >
* > 2. I think this is simply the speed of PHP string/object processing
* > itself, plus the large result set.
* >
* > I just ran some tests:
* > PHP v5.2, WinXP
* > AMD Athlon 64 3000+, Seagate ST316082 7AS 160Gb SATA HD
* >
* > a.
* > index size - 11,000 documents
* > optimized index - ~42 MB (document content is also stored)
* > source documents size - 33 MB
* >
* > Results:
* > ---------------------------
* > find() with 11000 docs result set - ~2.0 sec
* > find() with 4000 docs result set - ~0.86 sec
* > find() with 1000 docs result set - ~0.35 sec
* > ---------------------------
* >
* > b.
* > index size - 6,059 documents
* > optimized index - ~40 MB
* > source documents size - 31 MB (document content is also stored)
* >
* > Results:
* > ---------------------------
* > find() with 6059 docs result set - ~0.90 sec
* > find() with 2 docs result set - ~0.17 sec
* > find() with 0 docs result set - ~0.17 sec
* > ---------------------------
* >
* >
* > I think some further optimizations are also possible.
* > Please add an issue to the issue tracker for this (or I can do it).
* >
* >
* > 3. I got one report for a large index some time ago:
* > Source data: 8Gb
* > 2xAMD 64 Opteron 250
* > iSCSI 4x36Gb in RAID 1+0
* > FreeBSD 7.0
* > Search time is 5-10 sec
* >
* > I also have some ideas for search optimization, which will work
* > especially for large indices.
* >
* >
* > With best regards,
* > Alexander Veremyev.
* >
* >
* > Sebi wrote:
* >> Any answer? Alexander?
* >>
* >> Anyway I want to add some more questions.
* >>
* >> 1. $index->count() does not reflect the real content of
* >> the database. I need to optimize the index to retrieve the
* >> correct number of documents. Is there any other way to find the
* >> exact count of documents?
* >>
* >> 2. I want to reopen the search problem. The search time is too long.
* >>
* >>> I have 8737 documents indexed right now. When I search for
* >>> keywords like 'arte', 'galeria', etc., I get a time of
* >>> about 3.15 sec. When I had only 4500 documents, my time was
* >>> about 1.6 sec. The generated query looks like:
* >>> +(((titleSrch:galeria)) ((descriptionSrch:galeria)) ((tagsSrch:galeria)))
* >>> +(countryID:1) .
* >>> I should mention that I measure only the time of the find()
* >>> call, without retrieving the document fields.
* >> I optimized the index using the optimize function and the search
* >> improved. The time was about 1.5 sec (2 times faster). But
* >> that is still too long. I have only 8737 documents and an index
* >> size of about 2.7 MB. Another interesting thing is that if I use
* >> Luke for searching, the time is only 56 ms. So, what is the
* >> problem? PHP file system access?
* >>
* >> I want help with search time because that was my first goal:
* >> to have a fast search by relevance. And this is not what I get
* >> right now.
* >>
* >> 3. How will this engine behave when searching an index of
* >> 1 million documents?
* >>
*