If you are willing and able to close the IndexWriter, it's easy to drop segments: read the SegmentInfos, remove the segments you no longer want, and write it back.
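The following is a minimal, self-contained model of that read-edit-write cycle. It assumes each segment records the time range of its documents, which Lucene does not track by itself. The `Segment` and `Manifest` types and `dropSegmentsOlderThan` are hypothetical stand-ins for SegmentInfos, not Lucene classes. The actual Lucene calls for reading and committing SegmentInfos differ between versions, so they are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a segments manifest (the role SegmentInfos plays
// in Lucene). None of these types are Lucene classes.
public class DropSegmentsSketch {

    // One segment plus the (assumed) time range of the documents it holds.
    record Segment(String name, long minTimestamp, long maxTimestamp) {}

    // A commit point: an ordered list of segments and a generation number.
    record Manifest(long generation, List<Segment> segments) {}

    // Read-edit-write: keep only segments whose newest document is at or after the cutoff
    // and return a new manifest with the next generation. The input manifest is not modified.
    static Manifest dropSegmentsOlderThan(Manifest current, long cutoff) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : current.segments()) {
            if (s.maxTimestamp() >= cutoff) {
                kept.add(s);
            }
        }
        return new Manifest(current.generation() + 1, List.copyOf(kept));
    }

    public static void main(String[] args) {
        Manifest m = new Manifest(7, List.of(
            new Segment("_day1", 0, 99),
            new Segment("_day2", 100, 199),
            new Segment("_hour1", 200, 259)));
        Manifest next = dropSegmentsOlderThan(m, 100);
        System.out.println(next.generation() + " " + next.segments().size()); // prints "8 2"
    }
}
```

Returning a new manifest with a bumped generation, rather than editing the current one in place, roughly mirrors how each Lucene commit writes a new segments_N file instead of overwriting the previous one.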
Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 19, 2012 at 3:44 PM, Simon Willnauer <[email protected]> wrote:
> On Tue, Jun 19, 2012 at 6:42 PM, mark harwood <[email protected]> wrote:
>> There are a number of scenarios where Lucene might be used to index a fixed
>> time range on a continuous stream of data e.g. a news feed.
>>
>> In these scenarios I imagine the following facilities would be useful:
>>
>> a) A MergePolicy that organized content into segments on the basis of
>> increasing time units e.g. 5min->10 min->1 hour->1 day
>> b) The ability to drop entire segments e.g. the day-level segment from
>> exactly a week ago
>
> you can do that by subclassing IW and call some package private APIs /
> members. We can certainly make that easier but I personally don't want
> to open this as a public API. I can certainly imagine to have a
> protected API that allows dropping entire segment.
>
> simon
>
>> c) Various new analysis functions comparing term frequencies across time e.g
>> discovery of "trending" topics.
>>
>> I can see that a) could be implemented using a custom MergePolicy and c) can
>> be done via existing APIs but I'm not sure if there is way to simply drop
>> entire segments currently?
>>
>> Anyone else had thoughts in this area?
>>
>> Cheers
>> Mark
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
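Idea (a) can be sketched as the grouping step a custom MergePolicy might perform. This is a hypothetical, self-contained model, not Lucene's MergePolicy API. It assumes each segment covers exactly one aligned, non-overlapping time window with non-negative minute offsets (Lucene does not record this by default) and uses the 5 min, 10 min, 1 hour, 1 day ladder from the message. `TimeTierSketch`, `Segment` and `findMerges` are illustrative names, and wiring the result into a real MergePolicy is left out because those signatures vary across Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeTierSketch {

    // Assumed tier ladder from the thread, in minutes: 5 min -> 10 min -> 1 hour -> 1 day.
    static final long[] TIERS_MINUTES = {5, 10, 60, 24 * 60};

    // Hypothetical segment covering the half-open window [startMinute, endMinute).
    record Segment(String name, long startMinute, long endMinute) {}

    // For tier index `tier` (0 .. TIERS_MINUTES.length - 2), collect runs of same-tier
    // segments that exactly fill one aligned window of the next tier. Each run is a
    // candidate merge into one coarser segment. Input: sorted by start, non-overlapping.
    static List<List<Segment>> findMerges(List<Segment> sortedByStart, int tier) {
        long size = TIERS_MINUTES[tier];
        long nextSize = TIERS_MINUTES[tier + 1];
        List<List<Segment>> merges = new ArrayList<>();
        List<Segment> group = new ArrayList<>();
        long bucket = -1; // sentinel: no window open yet
        for (Segment s : sortedByStart) {
            boolean inTier = s.endMinute() - s.startMinute() == size
                && s.startMinute() % size == 0;
            if (!inTier) {
                continue; // belongs to another tier; leave it alone
            }
            long b = s.startMinute() / nextSize; // aligned next-tier window for this segment
            if (b != bucket) {
                group = new ArrayList<>(); // new window: discard any incomplete run
                bucket = b;
            }
            group.add(s);
            if (group.size() * size == nextSize) {
                merges.add(group); // window is full, so this run can be merged
                group = new ArrayList<>();
                bucket = -1;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
            new Segment("_a", 0, 5), new Segment("_b", 5, 10),
            new Segment("_c", 10, 15),                           // partner [15, 20) is missing
            new Segment("_d", 20, 25), new Segment("_e", 25, 30),
            new Segment("_f", 30, 40));                          // already a 10-minute segment
        List<List<String>> names = findMerges(segs, 0).stream()
            .map(g -> g.stream().map(Segment::name).toList())
            .toList();
        System.out.println(names); // prints [[_a, _b], [_d, _e]]
    }
}
```

Aligning windows (every hour starts on the hour, every day at midnight) keeps tier boundaries predictable, which is also what makes idea (b) cheap: once merged, the day-level segment from a week ago is a single segment that can be dropped whole.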
