Re: [fw-general] zend_search_lucene: Is this a bug? Error after adding x number of documents? (ver. 0.8.0)

Sebi Petreanu Mon, 05 Mar 2007 11:20:27 -0800

Hello guys,

I read your messages and I'm sorry you couldn't find a solution. I had the
same problems when I tried to index my documents. You can see some of these
errors here:
http://www.nabble.com/Zend_Search_Lucene-errors-tf3205213s16154.html#a8900427


I had this problem only on a WinXP system. I want to make the following
observations:
    - When antivirus software is running, the errors appear more often.
    - When I place a wait time between consecutive function calls that
      access the index (delete(), find(), addDocument()), indexing all
      documents is more likely to succeed. I actually managed to index
      8000 documents this way.
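
The wait-time workaround above could be sketched roughly as follows. This is only a minimal sketch, not the code I actually ran: runWithPause() is a hypothetical helper (it is not part of Zend_Search_Lucene), and the 100 ms default pause is an arbitrary assumption that would need tuning on the affected machine.

```php
<?php
/**
 * Hypothetical helper (not part of Zend_Search_Lucene): run a list of
 * index operations one by one and sleep after each of them.
 *
 * @param callable[] $operations        closures wrapping calls that touch the index
 * @param int        $pauseMicroseconds pause after each call (assumed value)
 * @return array results of the operations, in call order
 */
function runWithPause(array $operations, int $pauseMicroseconds = 100000): array
{
    $results = [];
    foreach ($operations as $operation) {
        $results[] = $operation();
        // Give the OS (and any antivirus scanner) time to release index files.
        usleep($pauseMicroseconds);
    }
    return $results;
}
```

Each closure would wrap one call such as $index->addDocument($doc), where $index is an already opened index and $doc a prepared document.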

Alexander, I have a small question regarding search optimization. Why does
the second query have such bad search times compared with the first one?
1. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte)))
   takes 0.077 sec
2. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte)))
   +(countryID:1) takes 0.50 sec

The difference is huge. What do you think?
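
For anyone who wants to reproduce the comparison, timings like the ones above can be taken with a small wrapper around microtime(). This is a minimal sketch: timeCall() is a hypothetical helper, not something from Zend_Search_Lucene, and it measures wall-clock time for a single call only.

```php
<?php
/**
 * Hypothetical helper: run one callable and report how long it took.
 *
 * @param callable $operation e.g. a closure wrapping a search call
 * @return array ['result' => mixed, 'seconds' => float]
 */
function timeCall(callable $operation): array
{
    $start = microtime(true);          // wall-clock time as a float, in seconds
    $result = $operation();
    $elapsed = microtime(true) - $start;

    return ['result' => $result, 'seconds' => $elapsed];
}
```

Wrapping each of the two queries in a closure that calls $index->find() on an already opened index ($index is assumed here), and running each several times, should show whether the gap is stable or just a warm-up effect.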





Alexander Veremyev wrote:
> 
> Hi,
> 
> I've checked, and you don't need the ...Common_Text class. All you need is 
> Zend_Search_Lucene_Analysis_Analyzer:
> 
> -- myCoolAnalyzer.php ----------------------------------
> <?php
> /** Zend_Search_Lucene_Analysis_Analyzer */
> require_once 'Zend/Search/Lucene/Analysis/Analyzer.php';
> 
> class myCoolAnalyzer extends Zend_Search_Lucene_Analysis_Analyzer
> {
>      /**
>       * Word list
>       *
>       * @var array
>       */
>      private $_wordList;
> 
>      /**
>       * Reset token stream
>       */
>      public function reset()
>      {
>          $this->_wordList = explode(' ', $this->_input);
>          reset($this->_wordList);
>      }
> 
>      /**
>       * Tokenization stream API
>       * Get next token
>       * Returns null at the end of stream
>       *
>       * Tokens are returned in UTF-8 (internal Zend_Search_Lucene encoding)
>       *
>       * @return Zend_Search_Lucene_Analysis_Token|null
>       */
>      public function nextToken()
>      {
>          if (($word = current($this->_wordList)) === false) {
>              return null;
>          }
>          next($this->_wordList);
> 
>          return new Zend_Search_Lucene_Analysis_Token(
>              iconv($this->_encoding, 'UTF-8', $word),
>              -1, -1 /* dummy start/end positions */
>          );
>      }
> }
> --------------------------------------------------------
> 
> -- myCode.php ----------------------------------
> <?php
> /** myCoolAnalyzer (also loads Zend_Search_Lucene_Analysis_Analyzer) */
> require_once 'myCoolAnalyzer.php';
> 
> Zend_Search_Lucene_Analysis_Analyzer::setDefault(new myCoolAnalyzer());
> --------------------------------------------------------
> 
> 
> PS: One of the standard analyzers has to be used while searching, because 
> this approach (explode()) may not work for user-entered queries.
> 
> With best regards,
>     Alexander Veremyev.
> 
> 
> 
> bongobongo wrote:
>> 
>> 
>> Alexander Veremyev wrote:
>>>
>>> Because you have words in the db records already prepared for indexing
>>> :)
>>> The default analyzer walks through the string char by char, checks whether 
>>> each one is a letter/digit or not, and so on. Then it applies a lower-case 
>>> filter.
>>>
>>> You don't need these. Take a look at the 
>>> Zend_Search_Lucene_Analysis_Analyzer_Common_Text class. You may make it 
>>> much more effective with the explode() function for your case.
>>>
>> 
>> Sounds like a good idea.
>> 
>> Could you please show some example code on how to do that?
>> 
>> How to use the ...Common_Text class together with the
>> explode() function....?
>> 
>> Regards
>> 
>> 
>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/zend_search_lucene%3A-Is-this-a-bug--Error-after-adding-x-number-of-documents--%28ver.-0.8.0%29-tf3309996s16154.html#a9318427
Sent from the Zend Framework mailing list archive at Nabble.com.
