I just love it when I get so wrapped up in a particular approach that alternatives don't occur to me. So I wondered what would happen if I just got stupid simple and tried solving what I think is your problem without involving Lucene.
So, I wrote a little program to fill up a HashMap with <Integer, Integer> pairs, with the key being a random number and the value an integer starting with 0. Really, it mimics a map of your <dbid, lucdocid> pairs. It then fills up a bitset by looking up a bunch of dbids and setting the corresponding lucdocid in the bitset.

Map size: 10,000,000 <userid, lucdocid> pairs
Lookups: 1,000,000 user ids, each set in a bitset
Total time to set all the bits: 1.016 seconds

This was running inside of Eclipse on a 2700 MHz AMD with 1 GB of memory (and I used up almost all this memory, but made no attempt to optimize it at all). Yes, that's close enough to one second not to matter.

I started by wondering what would happen if I used a RAMDir to map the <userid, lucdocid> pairs, thinking you could generate that RAMDir during warm-up, but I wanted to get a baseline for the bitset part before dealing with Lucene. But it *is* Sunday, and this is *not* my problem, so after I got this number I decided to leave the rest of it as an "exercise for the reader <G>". But we're having rain/sleet combinations here in SE Michigan, so what the heck....

I wonder, if this approach doesn't work for you, what would happen if you built a RAMDir with this mapping (which keeps your issues with updating under control). If memory use is too intensive, I also wonder what would happen if you built an FSDirectory index with these pairs as part of warm-up.

Just creating the map takes considerable time in my test program, so you probably want to consider some kind of warm-up process....

Best of luck!
Erick

On 1/14/07, Kay Roepke <[EMAIL PROTECTED]> wrote:
On 14. Jan 2007, at 3:54, Erick Erickson wrote:

> 3> I doubt it really will make a performance difference, but you could use
> TermDocs.seek rather than get a new termdocs for each term from the reader.
> (and if this *does* make a difference, please let me know)

It seems it does. I have just changed it to use seek, and the time went from 40 sec to a little over 29 secs. Still too slow, but it's the right direction :)

cheers,
-k
--
Kay Röpke
http://classdump.org/
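
The test program described at the top of this message was not posted. What follows is only a minimal sketch of that kind of test, not the original code: the class and method names are made up for illustration, the sizes are scaled down from 10,000,000 / 1,000,000 to keep memory use modest, and the ids looked up are simply taken from the map itself. Timings will of course differ by machine and JVM.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the test program described in the message;
// all names here are illustrative assumptions, not the original code.
class BitSetLookupSketch {

    // Builds a map of `size` distinct random keys (stand-ins for dbids)
    // to sequential values 0..size-1 (stand-ins for Lucene doc ids).
    static Map<Integer, Integer> buildMap(int size, long seed) {
        Random random = new Random(seed);
        Map<Integer, Integer> dbIdToDocId = new HashMap<>();
        int nextDocId = 0;
        while (dbIdToDocId.size() < size) {
            int dbId = random.nextInt();
            // Skip the rare duplicate so every doc id is used exactly once.
            if (!dbIdToDocId.containsKey(dbId)) {
                dbIdToDocId.put(dbId, nextDocId++);
            }
        }
        return dbIdToDocId;
    }

    // Looks up each dbid and sets the bit of the matching doc id.
    // Ids that are not in the map are silently ignored.
    static BitSet markDocs(Map<Integer, Integer> dbIdToDocId, int[] dbIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int dbId : dbIds) {
            Integer docId = dbIdToDocId.get(dbId);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        int mapSize = 1_000_000;  // scaled down from the 10,000,000 in the message
        int lookupCount = 100_000; // scaled down from the 1,000,000 in the message

        Map<Integer, Integer> dbIdToDocId = buildMap(mapSize, 42L);

        // Take the first `lookupCount` keys in iteration order as the ids to look up.
        int[] lookups = new int[lookupCount];
        int i = 0;
        for (Integer key : dbIdToDocId.keySet()) {
            if (i == lookupCount) {
                break;
            }
            lookups[i++] = key;
        }

        // Time only the lookup-and-set phase, not the map construction.
        long start = System.nanoTime();
        BitSet bits = markDocs(dbIdToDocId, lookups, mapSize);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("bits set: " + bits.cardinality()
                + ", elapsed ms: " + elapsedMillis);
    }
}
```

Building the map is kept separate from the timed section on purpose, since map construction is the part that would belong in a warm-up step.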