Thank you very much Michael for the information! -John
On Fri, Sep 18, 2009 at 6:01 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> > Say you have a type of field with fixed length data per doc, e.g.
> > 8 bytes.
>
> OK this makes sense -- thanks for the example! This sounds like
> getting column-stride-fields before that feature is added to Lucene
> "for real".
>
> For flushing, you can plug in your own indexing chain to IndexWriter.
> This (customizing what's indexed per-doc and what's written for the
> new segment) is exactly what the pluggable indexing chain is for.
> BUT: this API is still very experimental and package private.
>
> I suppose, for looser integration we could add a hook that's called in
> IndexWriter giving you a chance to do something at flush.
> Hmm... actually could you use doAfterFlush()?
>
> Merging, however, doesn't yet have hooks / pluggability in place to do
> something custom, and I agree it's sorely needed. Patches very
> welcome here!
>
> This could enable "loose" customization on what's flushed and how it's
> merged, and you'd have to make your own reader external to Lucene.
>
> LUCENE-1458 is aiming to cover this sort of use case, but in a more
> tightly integrated way. EG the new enumeration API in LUCENE-1458 (to
> replace TermEnum, TermDocs, TermPositions) is based on AttributeSource
> so that you could add your own attribute at the field, term, doc or
> positions level. However I haven't explored this at all yet, and eg
> customizable merging is not done.
>
> > It [flush] probably doesn't need to be final Mike?
>
> I agree. Wanna include un-final'ing it in a patch?
>
> > Is there a wiki or some sort of write up on LUCENE-1458?
>
> Sorry not just yet. I agree it's badly needed... it's an enormous set
> of changes at this point. I'll add a wiki page that I'll try to keep
> current as the design iterates.
>
> Mike
>
> On Thu, Sep 17, 2009 at 8:14 PM, John Wang <john.w...@gmail.com> wrote:
> > Sure.
> >
> > A simple example:
> >
> > Say you have a type of field with fixed length data per doc, e.g.
> > 8 bytes.
> > It might be good to store in a segment:
> > <numdocs><v1><v2>....<vn>
> >
> > so if you have 1000 docs, your seg file is 8k+4 bytes.
> >
> > Merging would be rather trivial as well.
> >
> > Doing this right now involves storing into payload, which pays a cost of
> > parsing byte[] to say a long per doc.
> >
> > I think this problem is orthogonal to 1458.
> >
> > There are other usecases, so I thought it might be a good idea to
> > abstract it out, since on a high level it is rather similar:
> >
> > start
> > write per doc
> > end
> > merge
> >
> > Hopefully I am describing it clearly.
> >
> > Thanks
> >
> > -John
> >
> >
> > On Thu, Sep 17, 2009 at 10:35 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> I'm actively working on LUCENE-1458, to enable different codecs for
> >> reading/writing the terms dict and doc/freq/prox/payload postings.
> >> I'm working now towards getting PforDelta working...
> >>
> >> However, that change doesn't [yet] do anything for norms, stored
> >> fields nor term vectors.
> >>
> >> Can you describe more details about what kinds of customization you're
> >> looking to do?
> >>
> >> Mike
> >>
> >> On Thu, Sep 17, 2009 at 10:00 AM, John Wang <john.w...@gmail.com> wrote:
> >> > Hi guys:
> >> >
> >> > I am trying to figure out how to add the ability to create custom
> >> > segment files. Hopefully it is possible to create a plugin framework
> >> > where one can provide some sort of callback to add to a segment given
> >> > a doc and provide some sort of merge logic. This is in light of the
> >> > flexible indexing effort.
> >> >
> >> > After digging thru the latest trunk code in that area, I see a
> >> > Writer/WriterPerThread pattern for different types of segment files,
> >> > e.g. Stored data, norms, inverted doc, etc.
> >> >
> >> > Do you think it is a good idea to consolidate them? Are there
> >> > intricacies where there are cross dependencies between different
> >> > types of writers?
> >> >
> >> > Merge logic seems to be in the SegmentMerger class. Seems that to do
> >> > this, it would be good to separate it out per writer type.
> >> >
> >> > I am still trying to understand the code, any help is greatly
> >> > appreciated.
> >> >
> >> > Thoughts?
> >> >
> >> > Thanks
> >> >
> >> > -John
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >>
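
For reference, the fixed-length layout John describes in the thread (<numdocs><v1><v2>....<vn>, with 8 bytes per doc) can be sketched in plain Java. This is only an illustration under stated assumptions: FixedLongSegment and its methods are hypothetical names and not part of Lucene, the count is assumed to be a 4-byte int and each value an 8-byte long in big-endian order, and the merge simply concatenates segments without handling deleted documents or doc ID remapping, which a real merge would need to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical helper (not a Lucene API): one fixed-width long per document. */
final class FixedLongSegment {
    private FixedLongSegment() {}

    /** Encodes values as <numdocs:int32><v1:int64>...<vn:int64>, big-endian. */
    static byte[] write(long[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(4 + 8 * values.length);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (long v : values) {
                out.writeLong(v);
            }
        }
        return bytes.toByteArray();
    }

    /** Decodes a whole segment back into one long per document. */
    static long[] read(byte[] segment) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(segment))) {
            long[] values = new long[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong();
            }
            return values;
        }
    }

    /** Random access: the value for docId sits at byte offset 4 + 8 * docId. */
    static long valueAt(byte[] segment, int docId) {
        return ByteBuffer.wrap(segment).getLong(4 + 8 * docId);
    }

    /** Assumption: merging is plain concatenation in segment order, no deletions. */
    static byte[] merge(byte[]... segments) throws IOException {
        long[][] parts = new long[segments.length][];
        int total = 0;
        for (int i = 0; i < segments.length; i++) {
            parts[i] = read(segments[i]);
            total += parts[i].length;
        }
        long[] merged = new long[total];
        int offset = 0;
        for (long[] part : parts) {
            System.arraycopy(part, 0, merged, offset, part.length);
            offset += part.length;
        }
        return write(merged);
    }

    public static void main(String[] args) throws IOException {
        byte[] segment = write(new long[1000]);
        // 4 bytes for the count plus 8 bytes for each of the 1000 docs.
        System.out.println(segment.length); // prints 8004
    }
}
```

Because every value has the same width, a lookup is a single offset computation rather than a per-doc parse of a payload byte[], which is the cost John mentions. The four steps he lists (start, write per doc, end, merge) correspond here to writing the count, writing each long, closing the stream, and merge().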