[ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865113#action_12865113 ]
Shai Erera commented on LUCENE-1585:
------------------------------------

I hate it when it happens, but better sooner than later - I realized the API must take into account the current Term. We cannot process all the payloads in the index the same way. So how about the following:

* PayloadProcessorProvider will accept both a Directory and a Term, and will return a suitable PayloadProcessor for that Directory, and if needed, for the Directory+Term combination.
* PayloadProcessor will continue to work as is and will expose the same API - a payload is still a payload. It's the responsibility of PPP to return the right PP instance for the given Dir+Term.

It does not make sense that the payloads of all the terms in the incoming indexes will need to be processed. Specifically, the scenario I have at hand needs to rewrite payloads of certain postings only, but the index contains payloads in other postings as well.

For 3x that's easy - SMI holds the current Term that is processed. But I don't see an equivalent in trunk, in PostingsConsumer. It receives a DocsEnum, which does not tell you the term it works on, and a MergeState, which includes just FieldInfo, which can tell you the field name. Any ideas how I can get the Term this posting belongs to? (I know there is no Term in trunk, but field + BytesRef will do.)

Mike - I'll add PP as a required arg to SM, np. I was only suggesting to pass IW so that we can avoid changing it in the future, but explicit args are fine by me.
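To make the proposal above concrete, here is a minimal sketch of a provider that resolves a processor per Directory+Term and falls back to a Directory-wide one. It is not the patch and not Lucene's API: plain strings stand in for Directory and Term (field + term text), and the method names register, registerForDir, get and mergePayload are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Rewrites one payload; the payload stays an opaque byte array. */
interface PayloadProcessor {
    byte[] process(byte[] payload);
}

/**
 * Hypothetical provider: strings stand in for Lucene's Directory and
 * Term (field + term text). These are NOT the real Lucene types.
 */
final class PayloadProcessorProvider {
    private final Map<List<String>, PayloadProcessor> byDirAndTerm = new HashMap<>();
    private final Map<String, PayloadProcessor> byDir = new HashMap<>();

    /** Registers a processor for one Directory+Term combination. */
    void register(String dir, String field, String text, PayloadProcessor pp) {
        byDirAndTerm.put(Arrays.asList(dir, field, text), pp);
    }

    /** Registers a processor for every term coming from one Directory. */
    void registerForDir(String dir, PayloadProcessor pp) {
        byDir.put(dir, pp);
    }

    /**
     * Returns the Directory+Term processor if one exists, else the
     * Directory-wide one, else null (payloads are left untouched).
     */
    PayloadProcessor get(String dir, String field, String text) {
        PayloadProcessor pp = byDirAndTerm.get(Arrays.asList(dir, field, text));
        return pp != null ? pp : byDir.get(dir);
    }
}

final class MergeSketch {
    /** What a merger could do per posting: rewrite only when a processor applies. */
    static byte[] mergePayload(PayloadProcessorProvider ppp, String dir,
                               String field, String text, byte[] payload) {
        PayloadProcessor pp = ppp.get(dir, field, text);
        return pp == null ? payload : pp.process(payload);
    }
}
```

With this shape the PayloadProcessor interface does not change; only the lookup gains the term, so postings of terms without a registered processor are copied through unchanged.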
> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging.
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".