Re: Per-document Payloads

Michael Busch Sun, 21 Oct 2007 12:39:11 -0700

John Wang wrote:

> 
> Since all three methods load docids into an int[], the lookup time is the
> same for all three. What differs is the load time:
> 
> 1) 16.5 seconds       43 MB
> 2) 590 milliseconds   32.5 MB
> 3) 186 milliseconds   26 MB


Good analysis! Thanks for sharing the results...

> 
> I think the payload method is good enough so we don't need to diverge from
> the lucene code base. 

Actually, I noticed that in my program in getCachedIDs() you can remove
the check
  if (!reader.isDeleted(tp.doc())) {

This should improve the performance further (not sure how much though),
because the synchronized isDeleted() call is quite expensive and not
necessary.

If you want to reduce the index size, you might want to try encoding
the Integers more efficiently, e.g. as VInts (depending on the values
of your UIDs).
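
To illustrate the VInt idea, here is a minimal, self-contained sketch of
variable-length integer encoding: 7 data bits per byte, low-order bits
first, with the high bit set when more bytes follow. The class and method
names (VIntSketch, encode, decode) are made up for this example and are
not part of Lucene or of the program discussed in this thread. Small
values take one byte instead of four, so the saving depends on how large
the UIDs are.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, for illustration only.
class VIntSketch {

    // Write one int using 7 bits per byte; a set high bit means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Read one int starting at pos[0]; pos[0] is advanced past the bytes consumed.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encode an array of UIDs back to back.
    static byte[] encode(int[] uids) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int uid : uids) {
            writeVInt(out, uid);
        }
        return out.toByteArray();
    }

    // Decode 'count' UIDs from the buffer.
    static int[] decode(byte[] buf, int count) {
        int[] uids = new int[count];
        int[] pos = {0};
        for (int i = 0; i < count; i++) {
            uids[i] = readVInt(buf, pos);
        }
        return uids;
    }
}
```

With this scheme, values below 128 take one byte and values below 16384
take two, while values that need more than 28 bits take five bytes, one
more than a fixed-width int. Whether it pays off therefore depends on
the range of the UIDs.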

> However, I feel that being able to customize the
> indexing process and store our own file is still more efficient both in load
> time and index size.
> 

Yes, the current payload implementation is not optimized for this use
case; it can be improved with a per-doc approach like the one I suggested.

-Michael


> Thanks
> 
> -John
> 

