Hmm... on option 1, how would you run into merges of target segments? I think we currently do one big merge of the source segments into a single segment in the target index?
But the issue with option 2 is truly annoying. We have the same problem for apps that want to "upgrade" their index from the 3.x to the 4.0 format (for example).

Maybe we need a new (expert) method... remergeIndex? mergeAllSegments? rebuildIndex?

Mike

http://blog.mikemccandless.com

On Wed, Apr 13, 2011 at 1:43 PM, Shai Erera <ser...@gmail.com> wrote:
> Hey,
>
> In Lucene 3.1 we've introduced PayloadProcessorProvider which allows you to
> rewrite payloads of terms during merge. The main scenario is when you merge
> indexes, and you want to rewrite/remap payloads of the incoming indexes, but
> one can certainly use it to rewrite the payloads of a term, in a given
> index.
> When we worked on it, we thought of two ways the user can rewrite payloads
> when he merges indexes:
>
> 1) Set PPP on the target IW, call addIndexes(IndexReader), while PPP will be
> applied on the incoming directories only.
> 2) Set PPP on the source IW, call IW.optimize(), then use
> targetIW.addIndexes(Directory).
>
> The latter is better since in both cases the incoming segments are rewritten
> anyway, however in the first case you might run into merging segments of the
> target index as well, something you might want to avoid (that was the
> purpose of optimizing addIndexes(Directory)).
>
> But it turns out the latter is not so easy to achieve. If the source index
> has only 1 segment (at least in my case, ~100% of the time), then calling
> optimize() doesn't do anything because the MP thinks the index is already
> optimized and returns no MergeSpec. To overcome this, I wrote a
> ForceOptimizeMP which extends LogMP and forces optimize even if there is
> only one segment.
>
> Another option is to set the noCFSRatio to 1.0 and flip the useCompoundFile
> flag (ie if source is compound, create no compound and vice versa). That can
> work too, but I don't think it's very good, because the source index will be
> changed from compound to non (or vice versa), which is something that the
> app didn't want.
>
> So I think option 1 is better, but I wanted to ask if someone knows of a
> better way to achieve this?
>
> Shai
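
For anyone following the thread, here is a rough sketch of why optimize() is a no-op on a single-segment index and what a ForceOptimizeMP-style override changes. It is a toy model only: ToyMergePolicy, DefaultToyPolicy and ForcingToyPolicy are hypothetical names invented for illustration, not Lucene classes, and a "merge spec" is modeled as a plain list of segment names. Shai's actual ForceOptimizeMP is not shown in the thread, so this only assumes the behavior he describes.

```java
import java.util.List;
import java.util.Optional;

// Toy model only: these types are hypothetical and are NOT part of Lucene.
// A segment is just a name; a merge spec is the list of segments to rewrite.
interface ToyMergePolicy {
    // Returns the segments to merge for an optimize request, or empty if none.
    Optional<List<String>> findMergesForOptimize(List<String> segments);
}

// Models the behavior described in the thread: an index with at most one
// segment is treated as already optimized, so no merge is returned.
class DefaultToyPolicy implements ToyMergePolicy {
    @Override
    public Optional<List<String>> findMergesForOptimize(List<String> segments) {
        if (segments.size() <= 1) {
            return Optional.empty(); // no merge means nothing gets rewritten
        }
        return Optional.of(List.copyOf(segments));
    }
}

// Models the ForceOptimizeMP idea: a lone segment is still returned for
// merging, so it would be rewritten (and a payload processor could run).
class ForcingToyPolicy extends DefaultToyPolicy {
    @Override
    public Optional<List<String>> findMergesForOptimize(List<String> segments) {
        if (segments.size() == 1) {
            return Optional.of(List.copyOf(segments));
        }
        return super.findMergesForOptimize(segments);
    }
}

class MergePolicySketch {
    public static void main(String[] args) {
        List<String> oneSegment = List.of("_0");
        // Default policy: no merge is selected for a single segment.
        System.out.println(new DefaultToyPolicy().findMergesForOptimize(oneSegment).isPresent());
        // Forcing policy: the single segment is selected anyway.
        System.out.println(new ForcingToyPolicy().findMergesForOptimize(oneSegment).isPresent());
    }
}
```

The point of the override is narrow: it only changes the single-segment case and defers to the parent policy otherwise, which is roughly what "forces optimize even if there is only one segment" suggests.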