Re: [fw-general] zend_search_lucene: Is this a bug? Error after adding x number of documents? (ver. 0.8.0)

Sebi Mon, 05 Mar 2007 12:46:16 -0800

Hi Alex,

I tested another index created in the past (which includes about 13,000 
documents) and got the following times (this index is bigger):

1. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte))) runs in 
0.079 sec
2. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte))) 
+(countryID:1) runs in 0.89 sec

The index is optimized. The first query returns 229 results; the second returns 190.

----- Original Message ----
From: Alexander Veremyev <[EMAIL PROTECTED]>
To: Sebi Petreanu <[EMAIL PROTECTED]>
Cc: [email protected]
Sent: Monday, March 5, 2007 10:21:29 PM
Subject: Re: [fw-general] zend_search_lucene: Is this a bug? Error after adding 
x number of documents? (ver. 0.8.0)

Hi Sebi,

Sebi Petreanu wrote:
> Hello guys,
> 
> I read your messages and I'm sorry you couldn't find a solution. I had the
> same problems when I tried to index my documents. You can see some of these
> errors here:
> http://www.nabble.com/Zend_Search_Lucene-errors-tf3205213s16154.html#a8900427.
> 
> I had this problem only on a WinXP system. I want to make the following
> observations:
>     - When antivirus software is running, the errors appear more often.
>     - When I place a wait time between consecutive function calls that
> access the index (delete(), find(), addDocument()), the probability of
> indexing all documents increases. I actually succeeded in indexing 8000
> documents using a wait time between consecutive function calls.

Thanks for your comments!

One additional question. Is indexing performed in "exclusive" mode (or 
do you have any parallel search/index updating tasks during indexing 
your documents)?

Parallel searching/indexing/updating must work correctly, but it is 
important to find where the bug is.

> Alexander, I have a small question regarding search optimization. Why does
> the second query have such bad search times compared with the first one?
> 1. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte))) runs
> in 0.077 sec
> 2. +(((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte)))
> +(countryID:1) runs in 0.50 sec
> 
> The difference is huge. What do you think? 

The first query is optimized into a single multi-term query. The second 
can't be reduced to a primitive query:

1. => "titleSrch:arte descriptionSrch:arte tagsSrch:arte"
or
"titleSrch:arte OR descriptionSrch:arte OR tagsSrch:arte"

2. => "+(titleSrch:arte descriptionSrch:arte tagsSrch:arte) +countryID:1"
or
"(titleSrch:arte OR descriptionSrch:arte OR tagsSrch:arte) AND countryID:1"
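The difference between the two cases can be illustrated with a toy model in Python. This is only a sketch and not Zend_Search_Lucene's actual optimizer: the tuple representation and the helpers simplify() and is_multi_term() are invented for illustration. A query is either a term string or a list of (sign, subquery) clauses, where the sign is "+" for required and "" for optional.

```python
# Toy model (hypothetical, not the library's code): a query is either a
# term (str) or a list of (sign, subquery) clauses, where sign is
# "+" (required) or "" (optional).

def simplify(query):
    """Collapse groups that wrap a single clause."""
    if isinstance(query, str):
        return query
    clauses = [(sign, simplify(sub)) for sign, sub in query]
    # A group holding exactly one clause (required or optional) matches
    # the same documents as that clause on its own.
    if len(clauses) == 1:
        return clauses[0][1]
    return clauses

def is_multi_term(query):
    """True if the query is a flat list of terms with no nested subquery."""
    return isinstance(query, list) and all(
        isinstance(sub, str) for _, sub in query
    )

# ((titleSrch:arte)) ((descriptionSrch:arte)) ((tagsSrch:arte))
any_field = [
    ("", [("", "titleSrch:arte")]),
    ("", [("", "descriptionSrch:arte")]),
    ("", [("", "tagsSrch:arte")]),
]

query1 = [("+", any_field)]
query2 = [("+", any_field), ("+", "countryID:1")]

print(is_multi_term(simplify(query1)))  # True
print(is_multi_term(simplify(query2)))  # False
```

In this model, query 1 collapses to a flat list of three optional terms, while query 2 keeps a nested group next to the required countryID:1 term, so it has to be evaluated as a boolean query over subqueries.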

Another thing that may affect performance is the selectivity of the 
searched term. Please check the execution time of the "countryID:1" query.
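Why selectivity matters can be sketched with a toy inverted index in Python. All posting lists below are invented; they are sized only so that the result counts match the 229 and 190 hits reported above, and the assumption that countryID:1 matches most documents is a guess, not something measured on the real index. The postings_read() helper is a rough, hypothetical cost proxy.

```python
# Toy inverted index: term -> set of matching document ids.
# Every number here is made up for illustration.
postings = {
    "titleSrch:arte": set(range(0, 150)),
    "descriptionSrch:arte": set(range(100, 200)),
    "tagsSrch:arte": set(range(180, 229)),
    # Assumed to be a low-selectivity term that matches most documents.
    "countryID:1": set(range(39, 10000)),
}

def match_any(terms):
    """Union of postings: documents matching at least one term."""
    result = set()
    for term in terms:
        result |= postings[term]
    return result

def postings_read(terms):
    """Rough cost proxy: number of posting entries that must be read."""
    return sum(len(postings[term]) for term in terms)

text_terms = ["titleSrch:arte", "descriptionSrch:arte", "tagsSrch:arte"]

hits1 = match_any(text_terms)               # query 1
hits2 = hits1 & postings["countryID:1"]     # query 2 adds a required term

print(len(hits1), postings_read(text_terms))
print(len(hits2), postings_read(text_terms + ["countryID:1"]))
```

In this sketch the second query returns fewer documents but reads far more posting entries, because the required term has a very long posting list. Timing "countryID:1" on its own shows whether something similar is happening in the real index.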

PS I have an idea for improving the performance of boolean queries whose 
clauses have large result sets, but I am not sure I'll have time for this 
before 0.9. It would be good if you could send me the index that gives 
these results, or create a Jira issue so we keep it in mind.

With best regards,
    Alexander Veremyev.

