Re: Pluggable IndexReader (was 2.9/3.0 plan & Java 1.5)

Michael McCandless Mon, 15 Dec 2008 04:04:48 -0800


Marvin Humphrey wrote:

I have a bunch of file format changes to push through, and I'm hoping to
implement them using pluggable modules. For instance, I'd like to be able to
swap out bit-vector-based deletions for tombstone-based deletions, just by
overriding a method or two.

I think Lucene should also aim for this (swappability of index codecs)--

LUCENE-1458 is a step towards that specifically for postings.  The
tombstone approach for deletions sounds compelling too, though first
we need to fix the API to switch to iterator only and stop calling
isDeleted in document(docID).

PFOR and pulsing are other recent examples that people could explore
more easily if we had swappability.
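
As an illustration of the iterator-only idea above, here is a minimal
sketch in modern Java. Every name in it (DeletedDocs, BitVectorDeletions,
TombstoneDeletions, LiveDocs) is hypothetical and not an existing Lucene or
KinoSearch class. It only shows that a consumer which never calls
isDeleted cannot tell the two representations apart.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.TreeSet;
import java.util.stream.IntStream;

// Hypothetical interface: deletions are exposed only as an ascending
// iterator of non-negative docIDs, with no random-access isDeleted().
interface DeletedDocs {
    PrimitiveIterator.OfInt iterator();
}

// Deletions stored as one bit per document.
final class BitVectorDeletions implements DeletedDocs {
    private final BitSet bits = new BitSet();

    void delete(int docID) { bits.set(docID); }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator(); // set bits in ascending order
    }
}

// Deletions stored as a sorted set of tombstoned docIDs.
final class TombstoneDeletions implements DeletedDocs {
    private final TreeSet<Integer> tombstones = new TreeSet<>();

    void delete(int docID) { tombstones.add(docID); }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return tombstones.stream().mapToInt(Integer::intValue).iterator();
    }
}

final class LiveDocs {
    // Walks docIDs 0..maxDoc-1 in step with the deletions iterator and
    // returns the docIDs that are not deleted.
    static int[] live(int maxDoc, DeletedDocs deleted) {
        PrimitiveIterator.OfInt it = deleted.iterator();
        int nextDeleted = it.hasNext() ? it.nextInt() : Integer.MAX_VALUE;
        IntStream.Builder out = IntStream.builder();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc == nextDeleted) {
                nextDeleted = it.hasNext() ? it.nextInt() : Integer.MAX_VALUE;
            } else {
                out.add(doc);
            }
        }
        return out.build().toArray();
    }
}
```

Swapping BitVectorDeletions for TombstoneDeletions leaves LiveDocs.live
untouched, which is the kind of swappability being discussed.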

Jason Rutherglen:
Decoupling IndexReader for 3.0 would be great. This includes making
SegmentReader and MultiSegmentReader public.

I definitely think that IndexReader can and should be made more pluggable. Is
exposing per-segment sub-readers a definite win, though? Does it make sense
to leave open the door to index components which don't operate on segments?
Or even to eliminate SegmentReader entirely and have sub-components of
IndexReader manage collation?

I've been thinking about this with regard to tombstone-based deletions, where
you can't know everything about a segment unless you've opened up other
segments.


These are good points: it may be exposing too much if we fully expose
SegmentReader now, since some components (deletion tombstones) may
want to skip that API and operate directly on lower level files.
Though, with LUCENE-1483 we are moving to executing scoring &
collection per-segment.
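
To make "per-segment" concrete, here is a small sketch of per-segment
collection, assuming each segment numbers its documents from 0 and the
caller adds a running doc base to get index-wide docIDs. SegmentHits and
PerSegmentCollector are made-up stand-ins for illustration, not the
classes from the LUCENE-1483 patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one segment: its document count and the
// segment-local docIDs that matched some query.
final class SegmentHits {
    final int maxDoc;
    final int[] matchingLocalDocs;

    SegmentHits(int maxDoc, int[] matchingLocalDocs) {
        this.maxDoc = maxDoc;
        this.matchingLocalDocs = matchingLocalDocs;
    }
}

final class PerSegmentCollector {
    // Visits segments in order, collecting each one on its own and
    // rebasing local docIDs by the number of docs in earlier segments.
    static List<Integer> collect(List<SegmentHits> segments) {
        List<Integer> global = new ArrayList<>();
        int docBase = 0;
        for (SegmentHits seg : segments) {
            for (int local : seg.matchingLocalDocs) {
                global.add(docBase + local);
            }
            docBase += seg.maxDoc;
        }
        return global;
    }
}
```

The collector only needs each segment's size and its local matches, so
this style of search does depend on some per-segment view being exposed.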

A constructor like new SegmentReader(TermsDictionary termDictionary,
TermPostings termPostings, ColumnStrideFields csd, DocIdBitSet deletedDocs);

You end up with a proliferation of constructors that way. Term vectors?
Arbitrary auxiliary components such as an R-tree component supporting
geographic search?

My original proposal to clean this up involved an "IndexComponent" class.
However, when I started implementing it, I ended up with a slew of new classes
with only two factory methods each.

We could possibly move those factory methods up into Schema, but I'm reluctant
to dirty it up, since it's a major public class in KS (as I anticipate it will be
in Lucy) and major public classes should be as simple as possible.

So, how about an IndexArchitecture or IndexPlan class?

 class MyArchitecture extends IndexArchitecture {
   public PostingsWriter PostingsWriter() {
     return new PForDeltaPostingsWriter();
   }
   public PostingsReader PostingsReader() {
     return new PForDeltaPostingsReader();
   }
   public DeletionsWriter DeletionsWriter() {
     return new TombstoneWriter();
   }
   public DeletionsReader DeletionsReader() {
     return new TombstoneReader();
   }
 }

Lucene:

 IndexWriter writer = new IndexWriter("/path/to/index",
   new StandardAnalyzer(), new MyArchitecture());

Lucy with Java bindings:

 class MySchema extends Schema {
   public MySchema() {
     initField("title", "text");
     initField("content", "text");
   }
   public IndexArchitecture indexArchitecture() {
     return new MyArchitecture();
   }
   public Analyzer analyzer() {
     return new PolyAnalyzer("en");
   }
 }

 IndexWriter writer = new IndexWriter(MySchema.open("/path/to/index"));


I think this is a reasonable approach.  I might name it IndexCodec(s)
though, and I agree conceptually it's orthogonal to a "schema".
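
A self-contained sketch of that idea, assuming a base class that supplies
defaults so a subclass overrides only the piece it wants to swap. The
interfaces, the format() method and the class names (IndexCodecs,
TombstoneCodecs) are placeholders invented for this example, not an
existing API.

```java
// Placeholder component interfaces; format() exists only so the example
// can report which implementation was selected.
interface PostingsWriter { String format(); }
interface DeletionsWriter { String format(); }

// Hypothetical base class: one factory method per index component,
// each returning a default implementation.
class IndexCodecs {
    public PostingsWriter postingsWriter() {
        return () -> "default-postings";
    }

    public DeletionsWriter deletionsWriter() {
        return () -> "bit-vector";
    }
}

// Swaps only the deletions format and inherits the default postings.
class TombstoneCodecs extends IndexCodecs {
    @Override
    public DeletionsWriter deletionsWriter() {
        return () -> "tombstone";
    }
}
```

A writer handed a TombstoneCodecs would ask it for each component, so a
new component type means one new factory method with a default rather
than another constructor overload.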

Decouple rollback, commit, IndexDeletionPolicy from DirectoryIndexReader
into a class like SegmentsVersionSystem which could act as the controller
for reopen types of methods. There could be a SegmentVersionSystem that
manages the versioning of a single segment.

I like it. :)

Sometimes you want to change up the merge policy for different writers against
the same index.  How does that fit into your plan?

My thought is that merge-policies would be application-specific rather than
index-specific.


This one I'm a little hazy on.  It would be nice to have a single
source for IndexWriter & IndexReader-acting-as-writer to share this
logic, but then we are [very, very slowly] migrating towards
IndexWriter being the only thing that writes to an index so it seems
like eventually it's OK if this logic is managed via the IndexWriter.

Mike
