I always thought flexible indexing was about more than storing your app-specific data next to terms/docs: something more along the lines of efficient geo search, or the ability to try out various index encoding schemes without patching Lucene.
In other words, this is something that can be a basis for an easy/pluggable implementation of payload-type functionality, not vice versa.

On Thu, Aug 6, 2009 at 01:55, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Aug 5, 2009, at 4:35 PM, Michael Busch wrote:
>
>> On 8/5/09 1:07 PM, Grant Ingersoll wrote:
>>>
>>> Hmmm, OK.
>>>
>>> Random, somewhat uneducated thought: Why not just define the codecs to
>>> create byte arrays? Then we can use the existing payload capability much
>>> like I do with the DelimitedPayloadTokenFilter. We'd probably have to make
>>> sure this still worked with Similarity, but it seems like it could.
>>> Thinking on this some more, seems like this could work already with an
>>> AttributePayloadEncoder or something like an AttributeToPayloadTokenFilter
>>> (I know, horrible name). Then, on the Query side, the AttributeTermQuery is
>>> just a glorified BoostingTermQuery with some callback hooks for dealing with
>>> the Attribute (but maybe that isn't even needed), either that or we just
>>> provide helper methods to the Similarity class so that people can easily
>>> decode the byte array into an Attribute. In fact, maybe all that needs to
>>> happen is the Attributes need to define encode/decode methods that
>>> (de)serialize a byte array.
>>>
>>> Seems like this approach would require very little in the way of changes
>>> to Lucene, but I admit it isn't fully baked in my mind just yet. It also
>>> has the nice benefit that all the work we did on Payloads isn't wasted.
>>>
>>> This is resonating more and more with me. What do you think?
>>>
>>
>> Well I think this would be a nice way of using the payloads better.
>>
>> However, the idea behind flexible indexing is that you can customize the
>> on-disk encoding in a way that it is as efficient as it can be for your
>> particular use case. E.g. for payloads we currently have to encode the
>> length. An application might not have to do that if it knows exactly what is
>> stored.
>>
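The length-prefix trade-off described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, nothing here uses Lucene's API, and the 4-byte int prefix is an assumption for the example, not Lucene's actual on-disk payload format. It contrasts a generic length-prefixed byte array with a fixed-width encoding that an application can use when it knows exactly what is stored.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Lucene code: compares a generic length-prefixed
// value with a fixed-width value whose size the application already knows.
public class PayloadEncodingSketch {

    // Generic form: a 4-byte length prefix (an assumption for illustration)
    // followed by the value bytes.
    static byte[] encodeWithLength(byte[] value) {
        return ByteBuffer.allocate(4 + value.length)
                .putInt(value.length)
                .put(value)
                .array();
    }

    // Reads the length prefix, then copies that many bytes out.
    static byte[] decodeWithLength(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        return value;
    }

    // Fixed-width form: the application knows the value is one float,
    // so no length needs to be stored.
    static byte[] encodeFloat(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Decodes the float directly from the stored bytes.
    static float decodeFloat(byte[] stored) {
        return ByteBuffer.wrap(stored).getFloat();
    }

    public static void main(String[] args) {
        byte[] fixed = encodeFloat(2.5f);
        byte[] prefixed = encodeWithLength(fixed);
        // Stored size per value: fixed-width vs. length-prefixed.
        System.out.println(fixed.length + " vs " + prefixed.length);
        // Both forms round-trip to the same value.
        System.out.println(decodeFloat(decodeWithLength(prefixed)));
    }
}
```

In this sketch the prefixed form stores 8 bytes per value against 4 for the fixed-width form; whether that difference matters in a real index is exactly the question raised in the reply that follows.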
>> Then there's only the Payload API that returns you a byte array. It
>> basically copies the contents of the IndexInput (usually a
>> BufferedIndexInput, which means array copy from the byte buffer to the
>> payload byte array). If the application knows exactly what is stored it can
>> read/decode it more efficiently.
>
> Yeah, but really are you saving that much? 4 bytes per token? It's not
> like you are saving much in terms of seeks, since you are already there
> anyway. The only downside I see is a slightly larger index. Would be
> interesting to try it out and see.
>
>>
>> The latter inefficiency we could solve by improving the payloads API: it
>> could return an IndexInput instead of the byte array and the caller could
>> consume it more efficiently.
>
> This is also interesting, but again requires some changes. With what I'm
> proposing, I think it could be done very simply w/o any API changes, and we
> just need to expose some of the IndexInput/Output helper classes a bit more
> to make it easier for people to encode/decode their stuff. Then, just
> documentation and some more Boosting*Query (Peter has already done
> BoostingNearQuery) and I think you have a pretty good flexible indexing AND
> searching capability all in a back compatible way using our existing code.
>
>>
>> So I agree that we could use Attributes to make the payloads feature
>> better usable, but I don't think it will be a replacement for flexible
>> indexing.
>>
>> Michael
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785