On Fri, Mar 27, 2009 at 08:08:20AM -0400, Michael McCandless wrote: > Actually, how will Lucy do this?
Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when there's no data in the index, or a single segment. (I modified PolyReader so that 1-segment and 0-segment states were officially valid for this purpose.) PolyReader is a public class, as is SegReader, and PolyReader allows access to its subreaders via a Get_Seg_Readers() accessor. Once we have SegReaders in hand, we can get at per-segment doc counts and deletion counts, which I think will be sufficient for planning a merge. As for the actual implementation of MergePolicy, I haven't prototyped that out yet. Right now in KS, the infrastructure is reasonably primitive: IndexManager has a method called SegReaders_To_Merge() which accepts a PolyReader as an argument and returns an array of SegReaders representing content that should be merged. > Even though Lucy's SegmentReader is lighter weight, it still seems > like you shouldn't be opening them in the writer (except for realtime > search)? I don't see why not. When I built this into KS, I thought I was imitating your plan for Lucene. :) > Are you going to simply make the segment metadata public? If you're referring to Segment's Fetch_Metadata method, it's public. If it weren't, then plugin components couldn't use it, which would be unfortunate. I think we ought to make it easy for plugins to store their metadata as JSON inside segmeta.json file rather than resort to writing it in their own proprietary formats. In theory, any index component can peer into another's metadata, just by invoking Seg_Fetch_Metadata(segment, other_component_name). That would be a bad idea, though, because what a component might choose to store there isn't public. I do expect that we will codify what data will be present in parts of segmeta.json as part of the official Lucy file format spec, however. If you were both dumb and determined, you could duplicate all the version checking code and adhere to the spec, making it possible to (maybe) safely interpret that data. Marvin Humphrey --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org