On Aug 5, 2009, at 3:27 PM, Michael Busch wrote:
I think we're not quite there yet, where you can create a custom
indexing format easily and store your custom Attributes. What's
especially missing is an API on the retrieval side, on a TermDocs/
TermPositions level, that can read custom indexing formats.
Right, I was thinking a good first start would be the
AttributeTermQuery, which would be similar to a BoostingTermQuery, but
would be Attribute aware.
We also need to make the SegmentMerger more flexible, so that it can
merge the custom codecs.
Hmmm, OK.
Random, somewhat uneducated thought: Why not just define the codecs
to create byte arrays? Then we can use the existing payload
capability, much like I do with the DelimitedPayloadTokenFilter. We'd
probably have to make sure this still works with Similarity, but it
seems like it could. Thinking on this some more, it seems like this
could work already with an AttributePayloadEncoder or something like
an AttributeToPayloadTokenFilter (I know, horrible name). Then, on
the Query side, the AttributeTermQuery is just a glorified
BoostingTermQuery with some callback hooks for dealing with the
Attribute (but maybe that isn't even needed). Either that, or we just
provide helper methods on the Similarity class so that people can
easily decode the byte array into an Attribute. In fact, maybe all
that needs to happen is that the Attributes define encode/decode
methods that (de)serialize to and from a byte array.
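
To make the encode/decode idea a bit more concrete, here is a rough
sketch in plain Java. Every name in it (AttributePayloadSketch,
PayloadCodec, PosAttribute, PosCodec, scoreFactor) is made up for
illustration and is not Lucene API. It only assumes that a payload is
an opaque byte array, and that something on the scoring side wants a
float back out of it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch only: none of these types exist in Lucene.
final class AttributePayloadSketch {

    /** Assumed contract: an attribute that round-trips through a payload byte array. */
    interface PayloadCodec<T> {
        byte[] encode(T attribute);
        T decode(byte[] payload);
    }

    /** Made-up attribute: a part-of-speech tag plus a weight to use at scoring time. */
    static final class PosAttribute {
        final String tag;
        final float weight;

        PosAttribute(String tag, float weight) {
            this.tag = tag;
            this.weight = weight;
        }
    }

    /** Byte layout: 4-byte big-endian float weight, followed by the UTF-8 bytes of the tag. */
    static final class PosCodec implements PayloadCodec<PosAttribute> {
        public byte[] encode(PosAttribute a) {
            byte[] tag = a.tag.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(4 + tag.length).putFloat(a.weight).put(tag).array();
        }

        public PosAttribute decode(byte[] payload) {
            ByteBuffer buf = ByteBuffer.wrap(payload);
            float weight = buf.getFloat();
            byte[] tag = new byte[buf.remaining()];
            buf.get(tag);
            return new PosAttribute(new String(tag, StandardCharsets.UTF_8), weight);
        }
    }

    /** Stand-in for the helper a Similarity could call to turn a payload into a score factor. */
    static float scoreFactor(byte[] payload, PayloadCodec<PosAttribute> codec) {
        return codec.decode(payload).weight;
    }
}
```

The token filter side would call encode() and set the result as the
token's payload; the query side would hand the stored bytes to
something like scoreFactor(). Whether the codec lives on the Attribute
itself or in a separate class is left open here.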
Seems like this approach would require very little in the way of
changes to Lucene, but I admit it isn't fully baked in my mind just
yet. It also has the nice benefit that all the work we did on
Payloads isn't wasted.
This is resonating more and more with me. What do you think?
LUCENE-1458 goes in that direction, but obviously it's not going
into 2.9/3.0. So hopefully in 3.1 we'll be able to do the stuff
you're asking for.
Even though we don't have flexible indexing yet, I think we made
great progress towards it already, especially with Mike's
restructuring of the indexer and the Attributes API.
Agreed.
We should probably work on the wiki page that we have for flexible
indexing and identify work items, so that it's easy to see what the
current state is and what the plans are (as was recently suggested
on java-dev). Let's do that after 2.9 is out.
Agreed, unless of course the above works, in which case we may just be
able to crank it out in a day or two.