On Fri, Mar 27, 2009 at 9:48 AM, Marvin Humphrey <mar...@rectangular.com> wrote:
> Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when > there's no data in the index, or a single segment. (I modified PolyReader so > that 1-segment and 0-segment states were officially valid for this purpose.) OK > PolyReader is a public class, as is SegReader, and PolyReader allows access to > its subreaders via a Get_Seg_Readers() accessor. Once we have SegReaders in > hand, we can get at per-segment doc counts and deletion counts, which I think > will be sufficient for planning a merge. OK. Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public. > As for the actual implementation of MergePolicy, I haven't prototyped that out > yet. Right now in KS, the infrastructure is reasonably primitive: > IndexManager has a method called SegReaders_To_Merge() which accepts a > PolyReader as an argument and returns an array of SegReaders representing > content that should be merged. KS does the fibonacci merge policy right? >> Even though Lucy's SegmentReader is lighter weight, it still seems >> like you shouldn't be opening them in the writer (except for realtime >> search)? > > I don't see why not. But it still ties up resources? EG mmap uses up chunks of your address space (possibly important on 32 bit machines, eg if you want a large ram buffer in the writer), opening files takes time & descriptors, etc. > When I built this into KS, I thought I was imitating your plan for Lucene. :) I think for the time being we'll still allow "pure IndexWriter". >> Are you going to simply make the segment metadata public? > > If you're referring to Segment's Fetch_Metadata method, it's public. If it > weren't, then plugin components couldn't use it, which would be unfortunate. > I think we ought to make it easy for plugins to store their metadata as JSON > inside segmeta.json file rather than resort to writing it in their own > proprietary formats. Yes. > In theory, any index component can peer into another's metadata, just by > invoking Seg_Fetch_Metadata(segment, other_component_name). That would be a > bad idea, though, because what a component might choose to store there isn't > public. I think it's fine to not guard against it. It's the same "we are all consenting adults" approach that Python takes. > I do expect that we will codify what data will be present in parts of > segmeta.json as part of the official Lucy file format spec, however. If you > were both dumb and determined, you could duplicate all the version checking > code and adhere to the spec, making it possible to (maybe) safely interpret > that data. OK Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org