[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499871#comment-13499871
 ] 

Shai Erera commented on LUCENE-4560:
------------------------------------

Hi Tim. While in general I'm not against the idea (and I think that having 
some more control during the merge stage is needed), may I ask why you can't 
do something like this (borrowing from your patch):

{code}
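// config is an IndexWriterConfig built for the target index (assumed here)
IndexWriter writer = new IndexWriter(newDirectory, config);
writer.addIndexes(new RemoveFieldReader(oldReader));
writer.close(); // commit and release the new index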
{code}

That will accomplish, I believe, exactly what you want, no?

The benefit of your approach is that the filtering is done in-place, i.e. no 
need to add to a new directory, then switch old/new dirs. But it may also 
inadvertently introduce bugs, e.g. if someone mistakenly decides to remove a 
field, or worse, removes the wrong field ... w/ the addIndexes approach, you 
can do the process offline, inspect the resulting index and, once you're 
content with it, make the switch.
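
To make the "make the switch" step concrete, here is a rough sketch using only 
java.nio.file. The class name, the swap helper and the three directory 
arguments are hypothetical and none of this is Lucene API. It assumes that no 
reader or writer has either directory open, that both directories live on the 
same filesystem, and that the backup location does not exist yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexDirSwitch {
    /**
     * Promotes rebuiltDir to liveDir and keeps the previous contents in
     * backupDir so the switch can be undone. Hypothetical helper for
     * illustration only; it is not part of Lucene.
     */
    static void swap(Path liveDir, Path rebuiltDir, Path backupDir) throws IOException {
        // Step 1: move the old index out of the way (kept for rollback).
        Files.move(liveDir, backupDir);
        // Step 2: move the inspected, rebuilt index into the live location.
        Files.move(rebuiltDir, liveDir);
    }
}
```

The source directory is never modified before the switch, which is the safety 
property the addIndexes route gives you; rolling back is just the two moves in 
reverse.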

I can see the benefits in both approaches, but I think that the addIndexes 
approach is safer, as it's not 'online' and does not change the source 
directory. I'm not sure how 'online' this process needs to be though. How often 
do you remove fields, or change index options? That's a fairly serious decision 
IMO, and should be done w/ care and lots of testing. Doing that in-place may be 
dangerous.

About the patch, it's very simple and clean, which is a good thing! I would 
make RemoveFieldReader extend FilterAtomicReader, to save you some lines of 
code, even though it's just a test class.
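
To illustrate why extending a filter base class saves lines, here is a small 
self-contained sketch of the delegation pattern. FieldSource, 
FilterFieldSource and RemoveFieldSource are hypothetical stand-ins I made up 
for this example; they are not Lucene's AtomicReader, FilterAtomicReader or 
the RemoveFieldReader from the patch, and the real classes have far more 
methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterReaderSketch {
    /** Stand-in for a reader interface; NOT Lucene's AtomicReader. */
    interface FieldSource {
        List<String> fieldNames();
        String value(int docId, String field);
        int maxDoc();
    }

    /** Stand-in for a filter base class: forwards every call to the wrapped source. */
    static class FilterFieldSource implements FieldSource {
        protected final FieldSource in;
        FilterFieldSource(FieldSource in) { this.in = in; }
        @Override public List<String> fieldNames() { return in.fieldNames(); }
        @Override public String value(int docId, String field) { return in.value(docId, field); }
        @Override public int maxDoc() { return in.maxDoc(); }
    }

    /** Hides one field. Only the methods that expose fields are overridden. */
    static class RemoveFieldSource extends FilterFieldSource {
        private final String removed;
        RemoveFieldSource(FieldSource in, String removed) {
            super(in);
            this.removed = removed;
        }
        @Override public List<String> fieldNames() {
            List<String> kept = new ArrayList<>(in.fieldNames());
            kept.remove(removed); // drop the filtered field from the listing
            return kept;
        }
        @Override public String value(int docId, String field) {
            // Pretend the removed field has no value; delegate everything else.
            return removed.equals(field) ? null : in.value(docId, field);
        }
        // maxDoc() is inherited unchanged from the filter base class.
    }

    /** Tiny in-memory source with one document and two fields, for demonstration only. */
    static FieldSource sample() {
        return new FieldSource() {
            @Override public List<String> fieldNames() { return Arrays.asList("title", "legacy"); }
            @Override public String value(int docId, String field) { return field + "#" + docId; }
            @Override public int maxDoc() { return 1; }
        };
    }
}
```

The subclass only spells out what it changes; everything else falls through to 
the wrapped instance, which is the saving I had in mind.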

If you do (and others agree) want to continue w/ the online filtering approach, 
perhaps, instead of introducing a MergedSegmentFilter, we could make 
SegmentMerger pluggable, with a few extension points that allow you to allocate 
your own AtomicReader ... just a thought, I know it's not directly related to 
this issue, but if we're going to open segment merging up for some serious 
hacking, let's do it w/ all intentions :).
                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields 
> have different options (the most restrictive option is forced during merge)
> Being able to filter segments during merges will allow gradually migrating 
> indexed data to new index settings, and gradually pruning/enhancing 
> existing data
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming
