Re: Attributes, DocConsumer, Flexible Indexing, etc.

Michael Busch Wed, 05 Aug 2009 13:34:11 -0700

On 8/5/09 1:07 PM, Grant Ingersoll wrote:

Hmmm, OK.
Random, somewhat uneducated thought: Why not just define the codecs to create byte arrays? Then we can use the existing payload capability much like I do with the DelimitedPayloadTokenFilter. We'd probably have to make sure this still worked with Similarity, but it seems like it could. Thinking on this some more, seems like this could work already with an AttributePayloadEncoder or something like an AttributeToPayloadTokenFilter (I know, horrible name). Then, on the Query side, the AttributeTermQuery is just a glorified BoostingTermQuery with some callback hooks for dealing with the Attribute (but maybe that isn't even needed), either that or we just provide helper methods to the Similarity class so that people can easily decode the byte array into an Attribute. In fact, maybe all that needs to happen is the Attributes need to define encode/decode methods that (de)serialize a byte array.
Seems like this approach would require very little in the way of changes to Lucene, but I admit it isn't fully baked in my mind just yet. It also has the nice benefit that all the work we did on Payloads isn't wasted.
This is resonating more and more with me.  What do you think?
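
To make the encode/decode idea concrete, here is a minimal sketch of an attribute that (de)serializes itself to a byte array. The class WeightAttributeSketch and its methods are hypothetical names made up for illustration and are not part of Lucene; the sketch only assumes the attribute state is a single float.

```java
import java.nio.ByteBuffer;

/** Hypothetical attribute carrying a per-token weight; NOT a Lucene class. */
final class WeightAttributeSketch {
    private final float weight;

    WeightAttributeSketch(float weight) {
        this.weight = weight;
    }

    float getWeight() {
        return weight;
    }

    /** Serializes the attribute state into a byte array usable as payload data. */
    byte[] encode() {
        return ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
    }

    /** Restores the attribute state from bytes written by encode(), starting at offset. */
    static WeightAttributeSketch decode(byte[] data, int offset) {
        float w = ByteBuffer.wrap(data, offset, Float.BYTES).getFloat();
        return new WeightAttributeSketch(w);
    }
}
```

A token filter could hand the result of encode() to the payload, and scoring code could call decode() on the bytes it gets back; how that is wired into Similarity is left open here.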


Well I think this would be a nice way of using the payloads better.

However, the idea behind flexible indexing is that you can customize the on-disk encoding in a way that it is as efficient as it can be for your particular use case. E.g. for payloads we currently have to encode the length. An application might not have to do that if it knows exactly what is stored.

Then there's only the Payload API that returns you a byte array. It basically copies the contents of the IndexInput (usually a BufferedIndexInput, which means array copy from the byte buffer to the payload byte array). If the application knows exactly what is stored it can read/decode it more efficiently.
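
As a rough illustration of the length overhead, the sketch below writes the same float values twice: once in a generic form with a length prefix per value, and once in a fixed-width form where the reader is assumed to know that every entry is four bytes. The class and method names are hypothetical, java.io streams stand in for the index output, and the one-byte prefix is only a stand-in for whatever length encoding the index really uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical comparison of a generic and an application-specific encoding. */
final class EncodingSizeSketch {

    /** Generic form: each value is preceded by its length in bytes. */
    static byte[] writeLengthPrefixed(float[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (float v : values) {
            out.writeByte(Float.BYTES); // length prefix (assumed to fit in one byte)
            out.writeFloat(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Application-specific form: the reader knows each entry is 4 bytes, so no length is stored. */
    static byte[] writeFixedWidth(float[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (float v : values) {
            out.writeFloat(v);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

With these assumptions the fixed-width form saves one byte per value, which adds up when a value is stored for every term position.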

The latter inefficiency we could solve by improving the payloads API: it could return an IndexInput instead of the byte array and the caller could consume it more efficiently.
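
The difference between the two API shapes can be sketched as follows. This is not Lucene code: java.io.DataInputStream stands in for IndexInput, the payload is assumed to be a single float, and the class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical contrast between copy-based and stream-based payload reads. */
final class PayloadReadSketch {

    /** Copy-based: fill an intermediate byte array first, then decode from it. */
    static float readViaCopy(DataInputStream in) throws IOException {
        byte[] payload = new byte[Float.BYTES];
        in.readFully(payload); // the extra array copy
        return ByteBuffer.wrap(payload).getFloat();
    }

    /** Stream-based: decode straight from the input, with no intermediate array. */
    static float readDirect(DataInputStream in) throws IOException {
        return in.readFloat();
    }
}
```

Both methods return the same value for the same input; the second simply skips the allocation and copy of the intermediate array.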

So I agree that we could use Attributes to make the payloads feature more usable, but I don't think it will be a replacement for flexible indexing.


 Michael
