Re: [fw-general] Zend_Search_Lucene questions ...

Alexander Veremyev Wed, 07 Feb 2007 16:03:32 -0800

Hi Sebi,

I have had these improvements in mind for a long time, so I'll do this.
I think it will be done soon, given how important these improvements are.


Actually, I am already working on Boolean query optimization.
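
As a rough illustration of what that optimization means (a minimal sketch in plain PHP, not Zend_Search_Lucene's actual code): for a query whose clauses are all required, the matching documents can be found first by intersecting the per-term document lists, and only those documents are then scored. The function names `matchAllTerms()` and `scoreMatches()`, the input arrays, and the scoring callback are all hypothetical and exist only for this example. It assumes a PHP version with closures and the `callable` type hint.

```php
<?php
// Hypothetical illustration only: none of these names belong to Zend_Search_Lucene.

/**
 * Intersect lists of document ids, so that only documents containing
 * every required term are returned.
 */
function matchAllTerms(array $postingLists)
{
    if (count($postingLists) === 0) {
        return array();
    }
    // Start from the shortest list: it bounds the number of candidates.
    usort($postingLists, function ($a, $b) {
        return count($a) - count($b);
    });
    $candidates = array_shift($postingLists);
    foreach ($postingLists as $list) {
        // array_flip turns ids into keys for fast membership checks.
        $lookup = array_flip($list);
        $kept = array();
        foreach ($candidates as $docId) {
            if (isset($lookup[$docId])) {
                $kept[] = $docId;
            }
        }
        $candidates = $kept;
        if (count($candidates) === 0) {
            break; // no document can match any more
        }
    }
    return $candidates;
}

/**
 * Score only the surviving candidates; non-matching documents are never
 * visited. $scoreFn is a stand-in for a real scoring formula.
 */
function scoreMatches(array $docIds, callable $scoreFn)
{
    $scores = array();
    foreach ($docIds as $docId) {
        $scores[$docId] = $scoreFn($docId);
    }
    arsort($scores); // highest score first, document ids kept as keys
    return $scores;
}
```

With this shape, the cost of scoring depends on the number of matching documents rather than on the size of the index. Optional clauses, such as the title/description/tags group in the query from this thread, would need a union step that this sketch leaves out.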


With best regards,
   Alexander Veremyev.


Sebi wrote:


Thank you for your support, Alexander. Now we have to find developers to implement
these optimizations. When do you think they will be implemented?


    * Hi Sebi,
    * I did some research into the performance problem.
    * Several things make search times with Java Lucene and
    * Zend_Search_Lucene so different.
    * 1. Luke and the Java Lucene search example don't count index
    * opening time.
    * Including this time, Java Lucene is only two times faster than
    * Zend_Search_Lucene.
    * 2. Boolean queries are not optimal now.
    * A Boolean query should skip non-matching documents, but it currently
    * calculates a score for them (and gets the right result, "0").
    * Once that is fixed, search time will be nearly the same as for
    * Java Lucene.
    * 3. Zend_Search_Lucene doesn't optimize queries yet. It should transform
    * a query to its simplest form and it's designed to do this, but this
    * feature is not implemented yet.
    * Implementing this feature may give the same result as the Boolean
    * query optimization (most queries can be transformed to
    * term/multi-term queries).
    * Of course, both of these optimizations have to be implemented.
    * In addition to this, a lot of time (30-40%) is taken by I/O
    * operations.
    * As I tested before, moving these operations into a C extension makes
    * them several times faster.
    * So moving I/O into an optional C extension may make Zend_Search
    * faster than Java Lucene :)
    * With best regards,
    *     Alexander Veremyev.
    * Sebi wrote:
    *  > I see. You got good results. I want to have them too. I think
    *  > there might be one problem: my computer's performance.
    *  > For about 9000 docs, and with the index optimized (using the
    *  > optimize() function), a search that returned about 70 docs took
    *  > 1.5 sec. That is still slow. The interesting point is that Luke
    *  > executes the same query in only 56 ms.
    *  >
    *  > I have the following questions:
    *  >
    *  > 1. Why do you think the Luke tool runs the same query in 56 ms?
    *  > Is PHP execution that slow?
    *  >
    *  > 2. I have version 7 of Zend installed. Should I get the latest
    *  > snapshot?
    *  >
    *  > 3. Do you have any advice for improving this search process?
    *  >
    *  >
    *  >
    *  > Hi Sebi,
    *  >
    *  > 1. I've just added the necessary methods.
    *  >
    *  > $index->numDocs() may be used to retrieve the number of
    *  > non-deleted documents.
    *  > $index->maxDoc() returns one greater than the largest possible
    *  > document number (synonym for $index->count()).
    *  >
    *  >
    *  > 2. I think it's simply the speed of PHP string/object processing
    *  > itself, plus the large result set.
    *  >
    *  > I just made some tests:
    *  > PHP v5.2, WinXP
    *  > AMD Athlon 64 3000+, Seagate ST316082 7AS 160Gb SATA HD
    *  >
    *  > a.
    *  > index size - 11,000 documents
    *  > optimized index - ~42 MB (document content is also stored)
    *  > source documents size - 33 MB
    *  >
    *  > Results:
    *  > ---------------------------
    *  > find() with 11000 docs result set - ~2.0 sec
    *  > find() with 4000 docs result set  - ~0.86 sec
    *  > find() with 1000 docs result set  - ~0.35 sec
    *  > ---------------------------
    *  >
    *  > b.
    *  > index size - 6,059 documents
    *  > optimized index - ~40 MB
    *  > source documents size - 31 MB (document content is also stored)
    *  >
    *  > Results:
    *  > ---------------------------
    *  > find() with 6059 docs result set - ~0.90 sec
    *  > find() with 2 docs result set  - ~0.17 sec
    *  > find() with 0 docs result set  - ~0.17 sec
    *  > ---------------------------
    *  >
    *  >
    *  > I think it's also possible to make some optimizations.
    *  > Please add an issue to the issue tracker for this (or I can do it).
    *  >
    *  >
    *  > 3. I got one report for a large index some time ago:
    *  > Source data: 8 GB
    *  > 2xAMD 64 Opteron 250
    *  > iSCSI 4x36Gb in RAID 1+0
    *  > FreeBSD 7.0
    *  > Search time is 5-10 sec
    *  >
    *  > I also have some ideas for search optimization, which should help
    *  > especially with large indices.
    *  >
    *  >
    *  > With best regards,
    *  >     Alexander Veremyev.
    *  >
    *  >
    *  > Sebi wrote:
    *  >> Any answer? Alexander?
    *  >>
    *  >> Anyway I want to add some more questions.
    *  >>
    *  >> 1. $index->count() does not reflect the real content of the
    *  >> database. I need to optimize the index to get the correct
    *  >> number of documents. Is there any other way to find the exact
    *  >> count of documents?
    *  >>
    *  >> 2. I want to reopen the search problem. The time is too long.
    *  >>
    *  >>> I have 8737 documents indexed right now. When I search for
    *  >>> keywords like 'arte', 'galeria', etc., I get a time of about
    *  >>> 3.15 sec. When I had only 4500 documents, my time was about
    *  >>> 1.6 sec. The generated query looks like:
    *  >>> +(((titleSrch:galeria)) ((descriptionSrch:galeria)) ((tagsSrch:galeria))) +(countryID:1)
    *  >>> Note that I measure only the time of the find() call itself,
    *  >>> without retrieving the document fields.
    *  >> I optimized the index using the optimize() function and the
    *  >> search improved. The time was about 1.5 sec (2 times faster), but
    *  >> that is still too long. I have only 8737 documents and an index
    *  >> size of about 2.7 MB. Another interesting thing is that if I use
    *  >> Luke for searching, the time is only 56 ms. So what is the
    *  >> problem? PHP file system access?
    *  >>
    *  >> I want help with search time because that was my first goal:
    *  >> to have a fast search by relevance. And this is not what I get
    *  >> right now.
    *  >>
    *  >> 3. How will this engine behave when searching 1 million documents?
    *  >>
    *
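
One point in the thread above is that the Luke and Java Lucene figures leave out index opening time, while the PHP figures may include it. A minimal sketch of how the two phases could be timed separately is given below. The helpers `timeCall()` and `profileSearch()` are hypothetical and not part of Zend Framework, and the code assumes a PHP version with closures and the `callable` type hint.

```php
<?php
// Hypothetical helpers for this example; not part of Zend Framework.

/**
 * Run $fn once and return array(result, elapsed seconds).
 */
function timeCall(callable $fn)
{
    $start = microtime(true);
    $result = $fn();
    return array($result, microtime(true) - $start);
}

/**
 * Time the "open" and "search" phases separately.
 * $open returns an index handle; $search receives that handle and
 * returns the hits.
 */
function profileSearch(callable $open, callable $search)
{
    list($index, $openTime) = timeCall($open);
    list($hits, $searchTime) = timeCall(function () use ($index, $search) {
        return $search($index);
    });
    return array(
        'open_sec'   => $openTime,
        'search_sec' => $searchTime,
        'total_sec'  => $openTime + $searchTime,
        'hits'       => $hits,
    );
}
```

With Zend_Search_Lucene, `$open` could wrap `Zend_Search_Lucene::open($indexPath)` and `$search` could call `$index->find($queryString)`, where `$indexPath` and `$queryString` are placeholders for your own values. Comparing `search_sec` alone against the Luke figure gives a fairer picture than comparing `total_sec`.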


