Doug Cutting wrote:
Michael,

This sounds like very good work. The back-compatibility of this approach is great. But we should also consider this in the broader context of index-format flexibility.

Three general approaches have been proposed.  They are not exclusive.

1. Make the index format extensible by adding user-implementable reader and writer interfaces for postings.

2. Add a richer set of standard index formats, including things like compressed fields, no-positions, per-position weights, etc.

3. Provide hooks for including arbitrary binary data.

Your proposal is of type (3). LUCENE-662 is a (1). Approaches of type (2) are most friendly to non-Java implementations, since the semantics of each variation are well-defined.

I don't see a reason not to pursue all three, but in a coordinated manner. In particular, we don't want to add a feature of type (3) that would make it harder to add type (1) APIs. It would thus be best if we had a rough specification of type (1) and type (2). A proposal of type (2) is at:

http://wiki.apache.org/jakarta-lucene/FlexibleIndexing

But I'm not sure that we yet have any proposed designs for an extensible posting API. (Is anyone aware of one?) This payload proposal can probably be easily incorporated into such a design, but I would have more confidence if we had one. I guess I should attempt one!


Doug,

Thanks for your detailed response. I'm aware that the long-term goal is a flexible index format, and I see the payloads patch as only one part of it. The patch focuses on extending the index data structures and on a possible payload encoding. It does not yet address a flexible API; it offers only the two low-level methods mentioned earlier, for adding and retrieving byte arrays.
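
To make the idea of two low-level byte-array methods concrete, here is a minimal in-memory sketch. It is only an illustration: the class PayloadPostings and the methods addPayload and getPayload are hypothetical names, not the API in the patch, and a map keyed by term and position stands in for the real on-disk posting structures and encoding.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for postings that carry a payload per position. */
final class PayloadPostings {
    // term -> (position -> payload bytes); TreeMap keeps positions in ascending order
    private final Map<String, TreeMap<Integer, byte[]>> postings = new HashMap<>();

    /** Stores an opaque byte array for one occurrence of a term (assumed method name). */
    public void addPayload(String term, int position, byte[] payload) {
        postings.computeIfAbsent(term, t -> new TreeMap<>())
                .put(position, Arrays.copyOf(payload, payload.length));
    }

    /** Returns a copy of the payload at a position, or null if none was stored. */
    public byte[] getPayload(String term, int position) {
        TreeMap<Integer, byte[]> positions = postings.get(term);
        if (positions == null) {
            return null;
        }
        byte[] payload = positions.get(position);
        return payload == null ? null : Arrays.copyOf(payload, payload.length);
    }

    public static void main(String[] args) {
        PayloadPostings index = new PayloadPostings();
        index.addPayload("lucene", 3, new byte[] {0x01, 0x7f});
        System.out.println(Arrays.toString(index.getPayload("lucene", 3)));
        System.out.println(index.getPayload("lucene", 4) == null);
    }
}
```

The index itself never interprets the bytes in this sketch; their meaning is left to the caller, which is what makes this a hook of type (3) rather than a well-defined format of type (2).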

I would love to work with you guys on the flexible index format and to combine my patch with your suggestions and the patch from Nicolas! I will look at your proposal and Nicolas' patch tomorrow (have to go now). I just attached my patch (LUCENE-755), so if you get a chance you could take a look at it.

Maybe it would make sense now to follow the suggestion you made earlier this year and start a new package for work on the new index format? On the other hand, if people would like to use payloads soon, would it be acceptable to add them to the current index format in the meantime? Since the change is backwards compatible, I think that would be low risk, and it would provide the feature until the flexible format is finished.

- Michael

