[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500226#comment-13500226
 ] 

Tim Smith commented on LUCENE-4560:
-----------------------------------

The "gradual" approach is very much required.
It's possible that a configuration change by a user will make it necessary to 
apply a filtered reader during a merge.

For instance, suppose you index a field without offsets, then shut down and 
restart with offsets enabled for that field.
Currently, this situation results in the newly indexed offsets being 
obliterated on merge (LUCENE-4557), with no possible way to save them.
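
The downgrade described above can be illustrated with a small toy model. This 
is only a sketch, not Lucene code: the Options enum and mergedOptions() are 
hypothetical names that merely mirror the ordering of Lucene's index options, 
on the assumption that a merge keeps the least capable option found across 
the segments.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene code: a merge keeps the least capable option. */
class IndexOptionsMergeSketch {

    // Hypothetical enum, ordered from least to most information stored.
    enum Options {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    // Returns the lowest-ordered option among the segments being merged.
    static Options mergedOptions(List<Options> perSegment) {
        Options result = perSegment.get(0);
        for (Options o : perSegment) {
            if (o.compareTo(result) < 0) {
                result = o;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Options> segments = Arrays.asList(
            Options.DOCS_AND_FREQS_AND_POSITIONS,              // indexed before the config change
            Options.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); // indexed after it
        // The merged segment loses the offsets of the newer segment.
        System.out.println(mergedOptions(segments));
    }
}
```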

Especially in this case, the addIndexes() approach is way too costly just for a 
small configuration change.
Small config changes shouldn't require the equivalent of a full optimize to 
take effect.


Also, I would argue that any addIndexes() approach is even more dangerous and 
just as prone to corruption.
It can apply the same filtering of readers as the attached patch 
provides, but it rewrites the entire index, so any corruption 
becomes much more widespread. (Of course, either way it is up to the person 
implementing a custom filter to guarantee that no corruption occurs and 
that their code produces consistent indexes.)


I will look into the MergePolicy approach.
Offhand, it looks like this may still require a patch, as the SegmentMerger is 
currently only aware of the SegmentReaders being merged.
However, I may be able to add my own SegmentInfo instances to the OneMerge, 
replacing the codec with a wrapped codec that applies my filtering.
It'll be about a week before I can get back to testing this; I'll report back 
then.
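
To make the "gradual" idea concrete, here is a minimal sketch of filtering 
applied only to the segments taking part in a merge. It is a toy model and 
does not use Lucene's MergePolicy, OneMerge, or codec APIs: documents are 
plain maps, segments are lists, and merge(), doc(), and dropLegacyField() 
are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model, not Lucene code: a filter runs only on segments being merged. */
class FilteredMergeSketch {

    // Builds a "document" (field name -> value) from alternating keys and values.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Merges the given segments, passing a copy of every document through the filter.
    static List<Map<String, String>> merge(
            List<List<Map<String, String>>> segmentsToMerge,
            UnaryOperator<Map<String, String>> filter) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (List<Map<String, String>> segment : segmentsToMerge) {
            for (Map<String, String> d : segment) {
                merged.add(filter.apply(new HashMap<>(d)));
            }
        }
        return merged;
    }

    // Hypothetical filter: drop a field that is no longer used.
    static Map<String, String> dropLegacyField(Map<String, String> d) {
        d.remove("legacy");
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> small1 = new ArrayList<>();
        small1.add(doc("title", "a", "legacy", "x"));
        List<Map<String, String>> small2 = new ArrayList<>();
        small2.add(doc("title", "b", "legacy", "y"));
        // A large segment that is not selected for this merge stays untouched.
        List<Map<String, String>> large = new ArrayList<>();
        large.add(doc("title", "c", "legacy", "z"));

        List<List<Map<String, String>>> toMerge = new ArrayList<>();
        toMerge.add(small1);
        toMerge.add(small2);
        List<Map<String, String>> merged = merge(toMerge, FilteredMergeSketch::dropLegacyField);

        System.out.println("merged segment: " + merged);
        System.out.println("large segment still has legacy field: "
            + large.get(0).containsKey("legacy"));
    }
}
```

In this model only the merged segments pay the cost of the filter, whereas an 
addIndexes()-style rewrite would have to pass every segment through it at once.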




                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields 
> have different options (the most restrictive option is forced during merge).
> Being able to filter segments during merges will allow gradually migrating 
> indexed data to new index settings, support pruning/enhancing existing data 
> gradually
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming
