[ 
https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865487#action_12865487
 ] 

Shai Erera commented on LUCENE-1585:
------------------------------------

I've been thinking about the multi-threading issue, and as far as I understand, 
it only concerns the merging of local segments? PPP works w/ Directory+Term 
because the format of the payloads is per term for the entire Directory (not 
per segment). Therefore, I don't think there are multi-threading issues with 
the external Directories (those added via addIndexes*)?

For the local segments, I see what you mean - it is possible that several 
threads will ask for a PP for the same Dir+Term. PPP implementations can still 
work well in such a scenario (if they wish to process payloads of the local Dir 
as well) by holding a ThreadLocal PP per Dir+Term combination. I think proper 
documentation should be enough in this case. The whole point of this issue is 
to allow better control when addIndexes* is used. Affecting local payloads is 
a nice bonus, and I think we should wait for a real scenario that takes 
advantage of it. If the threading warnings in the documentation don't help, we 
can discuss how to solve it then.
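
To make the ThreadLocal idea concrete, here is a minimal sketch, not the code in the attached patches. The class names, the String keys standing in for Directory and Term, and the method signatures are all hypothetical; the real PPP/PP API may look different. It only shows how a provider could hand each thread its own PP per Dir+Term, so that concurrent merges never share a stateful processor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-term payload processor (the "PP" above).
// It keeps mutable state, so one instance must not be shared between threads.
class SketchPayloadProcessor {
    private int processed = 0; // per-instance state used only by the owning thread

    byte[] process(byte[] payload) {
        processed++;
        // A real processor would convert the payload to its new format here;
        // this sketch just returns a copy.
        return payload.clone();
    }

    int processedCount() {
        return processed;
    }
}

// Hypothetical provider (the "PPP" above). Directory and Term are modeled as
// plain Strings to keep the sketch self-contained.
class ThreadLocalProcessorProvider {
    // Each thread lazily gets its own map from Dir+Term key to processor.
    private final ThreadLocal<Map<String, SketchPayloadProcessor>> perThread =
        ThreadLocal.withInitial(HashMap::new);

    SketchPayloadProcessor getProcessor(String dir, String term) {
        // The NUL separator keeps ("a", "bc") and ("ab", "c") from colliding.
        String key = dir + '\u0000' + term;
        return perThread.get().computeIfAbsent(key, k -> new SketchPayloadProcessor());
    }
}
```

With this shape, repeated calls from one merge thread for the same Dir+Term reuse the same PP, while a second merge thread gets a separate instance. The per-thread maps live as long as their threads do, so a real implementation would probably also want a way to clear them once merging is done.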

> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, 
> LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".

