John Wang wrote:
>
> Since all three methods load docids into an int[], the lookup time is the
> same for all three methods; what's different are the load times:
>
> 1) 16.5 seconds, 43 MB
> 2) 590 milliseconds, 32.5 MB
> 3) 186 milliseconds, 26 MB

Good analysis! Thanks for sharing the results...

>
> I think the payload method is good enough so we don't need to diverge from
> the lucene code base.

Actually, I noticed that in getCachedIDs() in my program you can remove the
check

    if (!reader.isDeleted(tp.doc())) {

This should improve performance further (I'm not sure by how much), because
the synchronized isDeleted() call is quite expensive and not necessary here.

If you want to reduce the index size, you might want to try encoding the
integers more efficiently, e.g. as VInts (how much this saves depends on the
values of your UIDs).

> However, I feel that being able to customize the
> indexing process and store our own file is still more efficient both in load
> time and index size.
>

Yes, the current payload implementation is not optimized for this use case;
it can be improved with a per-doc approach like the one I suggested.

-Michael

> Thanks
>
> -John
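
P.S. To illustrate the VInt suggestion above, here is a minimal sketch. It is
not code from Lucene or from the program discussed in this thread: the class
VIntSketch and its methods are hypothetical names made up for illustration.
It only assumes the usual variable-length scheme (7 data bits per byte, low
bits first, high bit set when more bytes follow) and is most useful when the
UIDs are small non-negative values, which then take one to three bytes
instead of four.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Lucene: packs ints into variable-length bytes.
class VIntSketch {

    // Writes value using 7 data bits per byte; the high bit means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads one value starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encodes all UIDs back to back into one byte array.
    static byte[] encode(int[] uids) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int uid : uids) {
            writeVInt(out, uid);
        }
        return out.toByteArray();
    }

    // Decodes count UIDs from the start of buf.
    static int[] decode(byte[] buf, int count) {
        int[] uids = new int[count];
        int[] pos = {0};
        for (int i = 0; i < count; i++) {
            uids[i] = readVInt(buf, pos);
        }
        return uids;
    }

    public static void main(String[] args) {
        int[] uids = {5, 127, 128, 16384, 2000000};
        byte[] encoded = encode(uids);
        System.out.println("fixed-width bytes: " + (uids.length * 4));
        System.out.println("VInt bytes:        " + encoded.length);
    }
}
```

Whether this actually shrinks the index depends on the distribution of your
UIDs: values of 2^28 or more need five bytes, so a set of mostly large UIDs
could end up bigger than with fixed four-byte ints.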