> -----Original Message-----
> From: karl wettin [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 6:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: Removing search results that fall within a time range
>
> On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stein wrote:
> > I have a requirement to only return one result for all documents whose
> > timestamps fall within N seconds of one another (where timestamp is a
> > field and N is an integer).
> >
> > For example, if Document A is timestamped "12:00:00" and Document B has
> > timestamp "12:00:30", Document B should be discarded. On the other
> > hand, if Document B has timestamp "12:01:00" then I should return both
> > (assuming 30 < N < 59 seconds).
> >
> > Similarly, if Documents A, B, and C have timestamps "12:00:00",
> > "12:00:30", and "12:01:00" respectively, only Document A should be
> > returned (because B is close to A, and C is close to B).
> >
> > If it helps to simplify things, we can assume results are sorted by
> > time. Also, I can apply logic at index time or at search time.
> >
> > Any suggestions? This is a pretty tough concept to search the
> > archives for...
>
> How big is the corpus and how many hits do you estimate a search can
> result in? Can you just take the penalty from iterating the hits?

The corpus is very big: approximately 300,000,000 documents and growing. I would estimate a potentially huge number of hits per search.

We currently do iterate through the hits and process them as you suggest, but that requires some impressive kludges to work :) I was just wondering whether there is a clever way to push this logic into the index/search process.

My other plan is to create a class that implements the Searchable interface. This class would just forward all search requests to a private IndexSearcher data member and post-process the results before returning. I would then pass an array of these customized searchers to a ParallelMultiSearcher. Given enough parallel processing, this might work in a reasonable timeframe.
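
For reference, here is a minimal sketch of the post-processing rule itself, independent of Lucene. It assumes the timestamps have already been pulled out of the hits as seconds and are sorted ascending. The class name TimeProximityFilter and the method collapse are hypothetical, not Lucene API, and I am treating "within N seconds" as a gap of at most N. Each timestamp is compared with the one immediately before it (not with the last one kept), so a chain such as A, B, C collapses to A.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class TimeProximityFilter {

    private TimeProximityFilter() {}

    /**
     * Keeps only the first timestamp of each "chain" of timestamps whose
     * consecutive gaps are at most n seconds.
     *
     * @param sortedSeconds timestamps in seconds, sorted ascending (assumption)
     * @param n             window size in seconds
     * @return the timestamps that survive the filter, in their original order
     */
    static List<Long> collapse(List<Long> sortedSeconds, long n) {
        List<Long> kept = new ArrayList<>();
        boolean first = true;
        long previous = 0L;
        for (long t : sortedSeconds) {
            // Start a new chain when there is no predecessor or the gap
            // to the immediate predecessor exceeds n seconds.
            if (first || t - previous > n) {
                kept.add(t);
            }
            // Always advance, even for discarded entries, so that chains
            // extend through them (C is dropped because it is close to B).
            previous = t;
            first = false;
        }
        return kept;
    }
}
```

With 12:00:00, 12:00:30, and 12:01:00 expressed as 43200, 43230, and 43260 seconds and n = 45, the three-document example keeps only 43200, while the pair 43200 and 43260 keeps both. This is a single linear pass, so the cost is in materializing and sorting the hits, not in the filter. Note that applying it separately inside each sub-searcher would miss chains that span sub-indexes, so the merged results would probably need one more pass.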