On Aug 5, 2009, at 3:27 PM, Michael Busch wrote:

I think we're not quite there yet, where you can create a custom indexing format easily and store your custom Attributes. What's especially missing is an API on the retrieval side, on a TermDocs/TermPositions level, that can read custom indexing formats.

Right, I was thinking a good first start would be the AttributeTermQuery, which would be similar to a BoostingTermQuery, but would be Attribute aware.

We also need to make the SegmentMerger more flexible, so that it can merge the custom codecs.

Hmmm, OK.

Random, somewhat uneducated thought: Why not just define the codecs to create byte arrays? Then we can use the existing payload capability, much like I do with the DelimitedPayloadTokenFilter. We'd probably have to make sure this still worked with Similarity, but it seems like it could. Thinking on this some more, it seems like this could work already with an AttributePayloadEncoder or something like an AttributeToPayloadTokenFilter (I know, horrible name). Then, on the Query side, the AttributeTermQuery is just a glorified BoostingTermQuery with some callback hooks for dealing with the Attribute (but maybe that isn't even needed). Either that, or we just provide helper methods to the Similarity class so that people can easily decode the byte array into an Attribute. In fact, maybe all that needs to happen is that the Attributes define encode/decode methods that (de)serialize a byte array.

Seems like this approach would require very little in the way of changes to Lucene, but I admit it isn't fully baked in my mind just yet. It also has the nice benefit that all the work we did on Payloads isn't wasted.
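To make the encode/decode idea above a bit more concrete, here is a rough sketch in plain Java. WeightAttribute and its methods are made up for illustration and are not part of Lucene; the only assumption is that an Attribute carries a single float and turns it into the byte array a payload would hold. The decode side takes (data, offset, length) on the assumption that payload bytes are handed back as a slice of a larger buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical attribute, NOT an existing Lucene class: it holds a per-token
// weight and knows how to (de)serialize itself to a payload-style byte array.
final class WeightAttribute {
    private final float weight;

    WeightAttribute(float weight) {
        this.weight = weight;
    }

    float getWeight() {
        return weight;
    }

    // Encode the weight as 4 bytes (ByteBuffer defaults to big-endian).
    byte[] encode() {
        return ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
    }

    // Decode from a slice of a buffer; rejects slices of the wrong size.
    static WeightAttribute decode(byte[] data, int offset, int length) {
        if (length != Float.BYTES) {
            throw new IllegalArgumentException(
                "expected " + Float.BYTES + " bytes, got " + length);
        }
        return new WeightAttribute(ByteBuffer.wrap(data, offset, length).getFloat());
    }
}
```

On the indexing side, a token filter would store the result of encode() as the token's payload; on the search side, whatever hook sees the payload bytes would call decode() to get the Attribute back. An Attribute with more than one field would need a slightly richer layout (for example, a length prefix per field), but the shape would be the same.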

This is resonating more and more with me. What do you think?


LUCENE-1458 goes in that direction, but obviously it's not going into 2.9/3.0. So hopefully in 3.1 we'll be able to do the stuff you're asking for.

Even though we don't have flexible indexing yet, I think we made great progress towards it already. Especially with Mike's restructuring of the indexer and the Attributes API.

Agreed.


We should probably work on the wiki page that we have for flexible indexing and identify work items, so that it's easy to see what the current state is and what the plans are (as it was recently suggested on java-dev). Let's do that after 2.9 is out.

Agreed, unless of course the above works, in which case we may just be able to crank it out in a day or two.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
