Re: Attributes, DocConsumer, Flexible Indexing, etc.

Grant Ingersoll Wed, 05 Aug 2009 13:08:08 -0700


On Aug 5, 2009, at 3:27 PM, Michael Busch wrote:

I think we're not quite at the point yet where you can easily create a custom indexing format and store your custom Attributes. What's especially missing is an API on the retrieval side, at the TermDocs/TermPositions level, that can read custom indexing formats.

Right, I was thinking a good first step would be the AttributeTermQuery, which would be similar to a BoostingTermQuery but Attribute-aware.
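To make the AttributeTermQuery idea a bit more concrete, here is a minimal sketch, in plain Java with no Lucene dependency, of the kind of scoring hook such a query might expose. AttributeScoreCallback, AttributeTermScoringSketch and the rule of averaging per-position factors are assumptions for illustration only, not existing Lucene classes or the actual BoostingTermQuery scorer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical callback (not a Lucene API): turns one position's payload bytes into a score factor. */
interface AttributeScoreCallback {
    float scorePayload(byte[] payload, int offset, int length);
}

public class AttributeTermScoringSketch {

    /**
     * Multiplies a term's base score by the average of the per-position payload
     * factors. Positions without a payload are skipped; if no payloads were seen,
     * the base score is returned unchanged.
     */
    public static float score(float baseScore, List<byte[]> positionPayloads,
                              AttributeScoreCallback callback) {
        float sum = 0f;
        int seen = 0;
        for (byte[] payload : positionPayloads) {
            if (payload == null || payload.length == 0) {
                continue; // no Attribute data stored at this position
            }
            sum += callback.scorePayload(payload, 0, payload.length);
            seen++;
        }
        return seen > 0 ? baseScore * (sum / seen) : baseScore;
    }

    public static void main(String[] args) {
        // Hypothetical encoding: first payload byte is an unsigned boost in tenths.
        AttributeScoreCallback tenths = (p, off, len) -> (p[off] & 0xFF) / 10.0f;
        List<byte[]> payloads = Arrays.asList(new byte[] {20}, null, new byte[] {10});
        System.out.println(score(2.0f, payloads, tenths));
    }
}
```

Keeping the decoding in a callback means the query itself stays generic, and only the callback needs to know how a particular Attribute was serialized.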

We also need to make the SegmentMerger more flexible, so that it can merge the custom codecs.


Hmmm, OK.

Random, somewhat uneducated thought: why not just define the codecs to create byte arrays? Then we can use the existing payload capability, much like I do with the DelimitedPayloadTokenFilter. We'd probably have to make sure this still works with Similarity, but it seems like it could. Thinking on this some more, it seems like this could already work with an AttributePayloadEncoder or something like an AttributeToPayloadTokenFilter (I know, horrible name). Then, on the query side, the AttributeTermQuery is just a glorified BoostingTermQuery with some callback hooks for dealing with the Attribute (though maybe that isn't even needed); alternatively, we just provide helper methods on the Similarity class so that people can easily decode the byte array into an Attribute. In fact, maybe all that needs to happen is for Attributes to define encode/decode methods that (de)serialize a byte array.
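The encode/decode idea for Attributes can be sketched in plain Java. PayloadSerializable, WeightAttribute and the 4-byte big-endian float layout are hypothetical, not existing Lucene classes; in Lucene the encoded bytes would presumably be stored as the token's payload by a filter along the lines of the AttributeToPayloadTokenFilter mentioned above, and decoded again at query time.

```java
import java.nio.ByteBuffer;

/** Hypothetical contract (not a Lucene interface): an Attribute that (de)serializes itself to a payload. */
interface PayloadSerializable {
    byte[] encode();
    void decode(byte[] bytes, int offset, int length);
}

/** Hypothetical Attribute holding a per-token float weight. */
class WeightAttribute implements PayloadSerializable {
    private float weight = 1.0f;

    public float getWeight() { return weight; }
    public void setWeight(float weight) { this.weight = weight; }

    @Override
    public byte[] encode() {
        // 4 bytes: big-endian IEEE 754 bits of the float
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    @Override
    public void decode(byte[] bytes, int offset, int length) {
        if (length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + length);
        }
        // wrap() positions the buffer at offset, so getFloat() reads bytes[offset..offset+3]
        weight = ByteBuffer.wrap(bytes, offset, length).getFloat();
    }
}

public class AttributePayloadDemo {
    public static void main(String[] args) {
        WeightAttribute indexed = new WeightAttribute();
        indexed.setWeight(2.5f);
        byte[] payload = indexed.encode();          // index side: Attribute -> payload bytes

        WeightAttribute read = new WeightAttribute();
        read.decode(payload, 0, payload.length);    // query side: payload bytes -> Attribute
        System.out.println(read.getWeight());
    }
}
```

Taking an offset and length in decode mirrors how payload bytes are often handed around as a slice of a larger buffer, so the Attribute does not need its own copy.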

It seems like this approach would require very few changes to Lucene, but I admit it isn't fully baked in my mind just yet. It also has the nice benefit that all the work we did on Payloads isn't wasted.


This is resonating more and more with me.  What do you think?

LUCENE-1458 goes in that direction, but obviously it's not going into 2.9/3.0. So hopefully in 3.1 we'll be able to do the stuff you're asking for.
Even though we don't have flexible indexing yet, I think we've already made great progress towards it, especially with Mike's restructuring of the indexer and the Attributes API.


Agreed.

We should probably work on the wiki page we have for flexible indexing and identify work items, so that it's easy to see the current state and the plans (as was recently suggested on java-dev). Let's do that after 2.9 is out.

Agreed, unless of course the above works, in which case we may just be able to crank it out in a day or two.

