[ 
https://issues.apache.org/jira/browse/LUCENE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497354#comment-13497354
 ] 

Tim Smith commented on LUCENE-4557:
-----------------------------------

I know you aren't changing your mind.

I also disagree with calling this "fake" data:
the data would be 100% representative of what was indexed.

What I would at least like to see is a reasonable means to support this 
functionality.

I propose some means to support more pluggable segment merging:

for instance, if IndexWriter had the following method:
{code}
public AtomicReader getSegmentForMerge(SegmentReader reader) {
  return reader; // default implementation does nothing.
}
{code}

Then I could override this method, wrap the reader, and enhance its indexed 
content during the merge in order to fulfill my requirements.
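
To make the idea concrete, here is a rough, self-contained sketch of how such an override might behave. None of the types below are real Lucene classes: Segment, PlainSegment, ZeroOffsetSegment, ToyWriter and OffsetKeepingWriter are hypothetical stand-ins, and the merge rule (the merged result keeps offsets only if every input segment has them) is a toy model of the behavior described in this issue, not the actual SegmentMerger logic.

```java
import java.util.Arrays;
import java.util.List;

public class MergeHookSketch {

    /** Hypothetical stand-in for a per-segment reader; NOT a real Lucene type. */
    interface Segment {
        boolean hasOffsets();
    }

    /** A segment that was written either with or without indexed offsets. */
    static final class PlainSegment implements Segment {
        private final boolean offsets;
        PlainSegment(boolean offsets) { this.offsets = offsets; }
        @Override public boolean hasOffsets() { return offsets; }
    }

    /** Wrapper that presents a segment as having offsets of "0, 0" at every position. */
    static final class ZeroOffsetSegment implements Segment {
        private final Segment in;
        ZeroOffsetSegment(Segment in) { this.in = in; }
        Segment unwrap() { return in; }
        @Override public boolean hasOffsets() { return true; }
        int startOffset(int position) { return 0; }
        int endOffset(int position) { return 0; }
    }

    /** Hypothetical stand-in for a writer that exposes the proposed hook. */
    static class ToyWriter {
        /** The proposed hook: the default implementation returns the segment unchanged. */
        protected Segment getSegmentForMerge(Segment segment) {
            return segment;
        }

        /** Toy merge rule: offsets survive only if every (possibly wrapped) input has them. */
        final boolean mergedHasOffsets(List<Segment> segments) {
            for (Segment s : segments) {
                if (!getSegmentForMerge(s).hasOffsets()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Overrides the hook so that segments without offsets are wrapped before merging. */
    static final class OffsetKeepingWriter extends ToyWriter {
        @Override
        protected Segment getSegmentForMerge(Segment segment) {
            return segment.hasOffsets() ? segment : new ZeroOffsetSegment(segment);
        }
    }

    public static void main(String[] args) {
        // One old segment without offsets, one new segment with offsets.
        List<Segment> segments =
            Arrays.asList(new PlainSegment(false), new PlainSegment(true));
        System.out.println(new ToyWriter().mergedHasOffsets(segments));           // false
        System.out.println(new OffsetKeepingWriter().mergedHasOffsets(segments)); // true
    }
}
```

In this toy model the default hook reproduces the reported loss of offsets, while the overriding writer keeps them by wrapping only the segments that lack them.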

This would have additional benefits including but not limited to:
* Supporting migration of IndexOptions on fields
* Supporting migration of sort fields from indexed fields to DocValues
* Supporting conversion of data types for DocValues
* and so on

This wrapping would just need to be smart (for example, a good 
MergeSegmentReader base class that SegmentMerger is integrated with) so that 
bulk merges of stored fields, term vectors, etc. can still be optimized.

If this is a more palatable approach for you, I can work up a patch as I find 
time.

> Indexed Offsets Can Be Lost During Merge
> ----------------------------------------
>
>                 Key: LUCENE-4557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4557
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Tim Smith
>         Attachments: OffsetsTest.java
>
>
> Primary use case:
> * Start with a pre-4.0 index (no indexed offsets available)
> * Start indexing new documents with indexed offsets 
> (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, previously 
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
> * Merge/optimize the index
> * Newly indexed documents will no longer have offsets available
> In general, it is impossible to ever change a field to have offsets indexed 
> when starting with an existing index, as a merge will cause offsets to be 
> removed from the index.
> Desirable behavior would be for new documents to have offsets indexed 
> properly, and for old documents to have an offset of "0, 0" for all positions 
> after merging with a segment that contains offsets.
> Current behavior can be very dangerous.
> For example:
> * Start indexing documents with indexed offsets
> * change config to not index offsets by accident
> * index 1 document
> * revert config back
> * offsets will start disappearing from documents as segments are merged
