Parallelising a query...

Winton Davies Wed, 28 Nov 2001 21:09:53 -0800

Hi,
 
   Let's say I want to retrieve all relevant listings for a query (just
suppose)...


   I have 4 million documents... I could:
 
   Split these into 4 x 1 million document indexes and then send the
query to 4 Lucene processes? At the end I would have to merge the
results and sort them by relevance.
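A minimal sketch of the split-and-merge idea, written in Python rather than against Lucene's own API. The dict-based shards, the toy term-count score, and the names split_into_shards, search_shard and parallel_search are all hypothetical. A thread pool stands in for the 4 separate Lucene processes (in CPython the GIL means this shows the control flow, not a real CPU speedup). Because each shard returns an already-ranked list, the final step can be a k-way merge rather than a full re-sort.

```python
# Hypothetical sketch, not Lucene's API: shards are plain dicts of
# doc_id -> text, and "relevance" is a toy count of query-term occurrences.
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def split_into_shards(docs, n_shards):
    """Round-robin partition of a {doc_id: text} dict into n_shards dicts."""
    shards = [{} for _ in range(n_shards)]
    for i, (doc_id, text) in enumerate(docs.items()):
        shards[i % n_shards][doc_id] = text
    return shards


def search_shard(shard, terms):
    """Score every doc in one shard; return (score, doc_id) hits, best first."""
    hits = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append((score, doc_id))
    # Highest score first; ties broken by ascending doc_id.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits


def parallel_search(shards, query, top_k=10):
    """Send the query to all shards concurrently, then merge the ranked lists."""
    terms = query.lower().split()
    # Threads stand in for separate search processes or machines.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, terms), shards))
    # Each list is already sorted by the same key, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, key=lambda h: (-h[0], h[1]))
    return list(islice(merged, top_k))


if __name__ == "__main__":
    docs = {1: "lucene index", 2: "index index shard", 3: "query", 4: "lucene"}
    print(parallel_search(split_into_shards(docs, 4), "index"))
    # prints [(2, 2), (1, 1)]
```

One caveat with real relevance scores: if each shard computes idf from its own document counts, scores from different shards are only strictly comparable when the shards have similar term statistics or those statistics are shared.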

   Question for Doug or any other search engine guru: would this
reduce the time to find these results by 75%?
 
   I know it is probably a hard question to answer (e.g. all the
documents that match might be in just one process...), but what I'm
really getting at is that the inverted lists that have to be joined
would be, on average, 75% shorter, so the join should take only 25%
of the time...
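The posting-list argument can be checked with a toy merge-intersection. In this sketch the random corpus, the doc_id % n_shards split and the function names (intersect, shard_postings, compare) are assumptions for illustration only. Each shard's lists come out about a quarter as long, so each shard does roughly 25% of the steps; the total work summed over the four shards stays about the same, so the saving in elapsed time only shows up if the shards actually run concurrently.

```python
# Hypothetical sketch: posting lists are sorted lists of doc ids, an AND
# query is a linear merge-intersection, and "steps" counts loop iterations.
import random


def intersect(a, b):
    """Intersect two sorted posting lists; return (matches, steps)."""
    i = j = steps = 0
    matches = []
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, steps


def shard_postings(postings, n_shards):
    """Split one posting list by doc_id % n_shards (same rule for every term)."""
    return [[d for d in postings if d % n_shards == s] for s in range(n_shards)]


def compare(n_docs=40_000, n_shards=4, seed=0):
    """Intersect two terms on the whole index and on each shard separately."""
    rng = random.Random(seed)
    # Toy corpus: each doc contains term A (and, independently, B) with p = 0.3.
    a = [d for d in range(n_docs) if rng.random() < 0.3]
    b = [d for d in range(n_docs) if rng.random() < 0.3]
    full_hits, full_steps = intersect(a, b)
    runs = [intersect(sa, sb)
            for sa, sb in zip(shard_postings(a, n_shards), shard_postings(b, n_shards))]
    shard_hits = sorted(d for hits, _ in runs for d in hits)
    shard_steps = [steps for _, steps in runs]
    return full_hits, full_steps, shard_hits, shard_steps


if __name__ == "__main__":
    full_hits, full_steps, shard_hits, shard_steps = compare()
    assert shard_hits == full_hits  # sharding does not change the answer
    print("whole index:", full_steps, "steps")
    print("per shard:  ", shard_steps, "total", sum(shard_steps))
```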

  Any thoughts on this idiocy? Why do I ask? Well, let's say I
can't fit a 4 million document RamDir index into a 1GB heap, but
I could if I split it up :)

   Cheers,
    Winton
 

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/
