On Mon, Dec 15, 2008 at 07:04:08AM -0500, Michael McCandless wrote:

> These are good points: it may be exposing too much if we fully expose
> SegmentReader now, since some components (deletion tombstones) may
> want to skip that API and operate directly on lower level files.

After thinking things over, I no longer worry about this seeming
contradiction.  Even if the tombstone deletions reader, the stored fields
reader, or some other component is reading files which were not written all
in one batch as part of the original collection of segment files, they still
relate to the same *logical* segment.  We wouldn't ever limit the set of
files which a SegmentReader is allowed to read from to the original segment
files.

Defining the collection of valid files for a given point-in-time view of the
index is the role of the Snapshot in KS and the segments_NNN file in Lucene.
It's up to the SegmentReader to determine which files within the snapshot it
should read from.

> > So, how about an IndexArchitecture or IndexPlan class?
> >
> >   class MyArchitecture extends IndexArchitecture {
> >     public PostingsWriter PostingsWriter() {
> >       return new PForDeltaPostingsWriter();
> >     }
> >     public PostingsReader PostingsReader() {
> >       return new PForDeltaPostingsReader();
> >     }
> >     public DeletionsWriter DeletionsWriter() {
> >       return new TombstoneWriter();
> >     }
> >     public DeletionsReader DeletionsReader() {
> >       return new TombstoneReader();
> >     }
> >   }
> >
> >   class MySchema extends Schema {
> >     public MySchema() {
> >       initField("title", "text");
> >       initField("content", "text");
> >     }
> >     public IndexArchitecture indexArchitecture() {
> >       return new MyArchitecture();
> >     }
> >     public Analyzer analyzer() {
> >       return new PolyAnalyzer("en");
> >     }
> >   }
> >
> >   IndexWriter writer = new IndexWriter(MySchema.open("/path/to/index"));
>
> I think this is a reasonable approach.  I might name it IndexCodec(s)
> though, and I agree conceptually it's orthogonal to a "schema".

FWIW, I've gone forward with "Architecture".

>>> Decouple rollback, commit, IndexDeletionPolicy from DirectoryIndexReader
>>> into a class like SegmentsVersionSystem which could act as the controller
>>> for reopen types of methods.  There could be a SegmentVersionSystem that
>>> manages the versioning of a single segment.
>>
>> I like it. :)
>>
>> Sometimes you want to change up the merge policy for different writers
>> against the same index.  How does that fit into your plan?
>>
>> My thought is that merge-policies would be application-specific
>> rather than index-specific.
>
> This one I'm a little hazy on.  It would be nice to have a single
> source for IndexWriter & IndexReader-acting-as-writer to share this
> logic, but then we are [very, very slowly] migrating towards
> IndexWriter being the only thing that writes to an index so it seems
> like eventually it's OK if this logic is managed via the IndexWriter.

I'm thinking of calling this one "UpdatePolicy".  It would collect together
MergePolicy, DeletionsPolicy, LockFactory, etc -- all the app-specific
behaviors related to interacting with existing data and files.

A Schema.makeUpdatePolicy() factory method can serve as the single, shared
source for this logic.  However, the IndexWriter and IndexReader constructors
would allow the default UpdatePolicy to be overridden with an argument.

We end up with the following hierarchy:

  * Architecture: Stuff that never changes for the life of the index.
    Defining an Architecture subclass is roughly analogous to choosing a
    storage engine in MySQL (MyISAM vs. InnoDB, etc).
  * Schema: Roughly analogous to an SQL table definition.
  * UpdatePolicy: Stuff that can change up per-index-session.

Of those three classes, the only one that most users would encounter would be
Schema.  Architecture and UpdatePolicy would isolate power-user
functionality, making it easier to grok and master basic indexing technique.

Marvin Humphrey
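
A minimal sketch of the default-plus-override arrangement described above,
in case it helps make the hierarchy concrete.  Everything here is a
hypothetical stand-in rather than the actual Lucene or KinoSearch API: the
class bodies are invented for illustration, and plain strings take the place
of real MergePolicy and LockFactory objects.

```java
// Hypothetical sketch only: these classes are illustrative stand-ins,
// not the real Lucene or KinoSearch API.

// Architecture: choices fixed for the life of the index.
abstract class Architecture {
    abstract String postingsFormat();
}

// UpdatePolicy: app-specific behavior that may change per index session.
// The strings stand in for real MergePolicy / LockFactory objects.
final class UpdatePolicy {
    final String mergePolicy;
    final String lockFactory;

    UpdatePolicy(String mergePolicy, String lockFactory) {
        this.mergePolicy = mergePolicy;
        this.lockFactory = lockFactory;
    }
}

// Schema: the class most users would see; it also hands out the defaults.
abstract class Schema {
    abstract Architecture architecture();

    // Single shared source of the default update behavior.
    UpdatePolicy makeUpdatePolicy() {
        return new UpdatePolicy("default-merge", "default-lock");
    }
}

class MySchema extends Schema {
    @Override
    Architecture architecture() {
        return new Architecture() {
            @Override
            String postingsFormat() { return "pfordelta"; }
        };
    }
}

// Writer that uses the schema's default policy unless one is passed in.
class IndexWriter {
    private final Schema schema;
    private final UpdatePolicy policy;

    IndexWriter(Schema schema) {
        this(schema, schema.makeUpdatePolicy());
    }

    IndexWriter(Schema schema, UpdatePolicy override) {
        this.schema = schema;
        this.policy = override;
    }

    Schema schema() { return schema; }
    UpdatePolicy updatePolicy() { return policy; }
}

class HierarchyDemo {
    public static void main(String[] args) {
        Schema schema = new MySchema();

        // Most users: take whatever the schema supplies.
        IndexWriter plain = new IndexWriter(schema);

        // Power users: swap in a different policy for this session only.
        IndexWriter tuned = new IndexWriter(
            schema, new UpdatePolicy("custom-merge", "default-lock"));

        System.out.println(plain.updatePolicy().mergePolicy);
        System.out.println(tuned.updatePolicy().mergePolicy);
    }
}
```

The one-argument constructor keeps the common case free of any mention of
UpdatePolicy, while the two-argument form lets different writers against the
same index use different merge behavior without touching the Schema.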