[ 
https://issues.apache.org/jira/browse/LUCENE-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497208#comment-13497208
 ] 

Tim Smith commented on LUCENE-4557:
-----------------------------------

I understand the similarity to the omitTF case, but I would argue that it is a 
bug too.

The main issue here is with merging.

Merging currently seems to choose the most restrictive IndexOptions for a 
field instead of the most general.
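
To make the two policies concrete, here is a minimal sketch. It does not use 
Lucene's classes: SketchIndexOptions is a hypothetical stand-in whose constants 
mirror the IndexOptions names, ordered from least to most indexed data, and 
mostRestrictive/mostGeneral are illustrative helpers, not Lucene methods.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's IndexOptions, ordered from least to most indexed data.
enum SketchIndexOptions {
    DOCS_ONLY,
    DOCS_AND_FREQS,
    DOCS_AND_FREQS_AND_POSITIONS,
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
}

class MergeOptionsSketch {
    // Policy the merge appears to follow today: keep only what every segment has.
    static SketchIndexOptions mostRestrictive(List<SketchIndexOptions> segments) {
        SketchIndexOptions result = segments.get(0);
        for (SketchIndexOptions options : segments) {
            if (options.compareTo(result) < 0) {
                result = options;
            }
        }
        return result;
    }

    // Policy argued for here: keep everything that any segment has.
    static SketchIndexOptions mostGeneral(List<SketchIndexOptions> segments) {
        SketchIndexOptions result = segments.get(0);
        for (SketchIndexOptions options : segments) {
            if (options.compareTo(result) > 0) {
                result = options;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SketchIndexOptions> segments = Arrays.asList(
            SketchIndexOptions.DOCS_AND_FREQS_AND_POSITIONS,
            SketchIndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        System.out.println("most restrictive: " + mostRestrictive(segments));
        System.out.println("most general:     " + mostGeneral(segments));
    }
}
```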

When you are writing new segments and you provide contradictory IndexOptions 
for the same field, it is OK for the writer to produce new segments with the 
most restrictive set (or to throw an exception at that point). I have no 
argument there.

However, when it comes to merging existing segments, no indexed data should be 
lost (as it is in this case).

If you have 2 segments with the following:
Segment 1: docs and freqs and positions
Segment 2: docs and freqs and positions and offsets

then the merged segment should have the following:
Merged: docs and freqs and positions and offsets

The offsets for docs that were part of segment 1 should be null/(start=0, 
end=0), or better yet (-1, -1) if possible.
The offsets for docs that were part of segment 2 should be the proper offsets 
that were indexed for segment 2 in the first place.
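
A rough sketch of that fill-in rule, under stated assumptions: Occurrence, 
NO_OFFSET and copyForMerge are hypothetical names made up for illustration, 
not Lucene's merge code, and (-1, -1) is used as the placeholder for 
occurrences whose source segment had no offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OffsetMergeSketch {
    // Hypothetical posting entry: one term occurrence with its position and offsets.
    static final class Occurrence {
        final int position;
        final int startOffset;
        final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return "pos=" + position + " offsets=(" + startOffset + ", " + endOffset + ")";
        }
    }

    // Placeholder for "no offset was indexed" (an assumption of this sketch).
    static final int NO_OFFSET = -1;

    // Copies one segment's occurrences into the merged segment. Positions are
    // always kept; offsets are kept only if the source segment indexed them.
    static List<Occurrence> copyForMerge(List<Occurrence> source, boolean sourceHasOffsets) {
        List<Occurrence> merged = new ArrayList<>();
        for (Occurrence occurrence : source) {
            if (sourceHasOffsets) {
                merged.add(occurrence);
            } else {
                merged.add(new Occurrence(occurrence.position, NO_OFFSET, NO_OFFSET));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Segment 1 has positions only; its offset fields carry no meaning.
        List<Occurrence> segment1 = Arrays.asList(new Occurrence(0, 0, 0), new Occurrence(4, 0, 0));
        // Segment 2 was indexed with offsets.
        List<Occurrence> segment2 = Arrays.asList(new Occurrence(2, 10, 15));

        List<Occurrence> merged = new ArrayList<>();
        merged.addAll(copyForMerge(segment1, false));
        merged.addAll(copyForMerge(segment2, true));
        for (Occurrence occurrence : merged) {
            System.out.println(occurrence);
        }
    }
}
```

A placeholder of (-1, -1) lets a consumer tell "no offset indexed" apart from 
a real offset at the start of the field, which (0, 0) would not.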

The same rule could also be applied to the omitTF case:
Segment 1: docs only
Segment 2: docs and freqs and positions

Merged: docs and freqs and positions
Docs from segment 1 should have a frequency of 1 and a single position of 0.
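
The omitTF variant can be sketched the same way. Posting and upgradeDocsOnly 
are hypothetical names used only for illustration; the sketch assumes a 
docs-only segment is represented as a plain list of doc ids for a term.

```java
import java.util.ArrayList;
import java.util.List;

class OmitTfMergeSketch {
    // Hypothetical merged posting: a doc id with a frequency and term positions.
    static final class Posting {
        final int docId;
        final int freq;
        final int[] positions;

        Posting(int docId, int freq, int[] positions) {
            this.docId = docId;
            this.freq = freq;
            this.positions = positions;
        }
    }

    // Upgrades docs-only postings to docs+freqs+positions by giving each doc
    // a frequency of 1 and a single position of 0.
    static List<Posting> upgradeDocsOnly(int[] docIds) {
        List<Posting> upgraded = new ArrayList<>();
        for (int docId : docIds) {
            upgraded.add(new Posting(docId, 1, new int[] {0}));
        }
        return upgraded;
    }

    public static void main(String[] args) {
        for (Posting posting : upgradeDocsOnly(new int[] {3, 7})) {
            System.out.println("doc=" + posting.docId
                + " freq=" + posting.freq
                + " firstPosition=" + posting.positions[0]);
        }
    }
}
```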

> Indexed Offsets Can Be Lost During Merge
> ----------------------------------------
>
>                 Key: LUCENE-4557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4557
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Tim Smith
>         Attachments: OffsetsTest.java
>
>
> Primary use case:
> * Start with a pre-4.0 index (no indexed offsets available)
> * Start indexing new documents with indexed offsets 
> (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, previously was 
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
> * Merge/optimize the index
> * Newly indexed documents will now no longer have offsets available
> In general, it is impossible to ever change a field to have offsets indexed 
> when starting with an existing index as a merge will cause offsets to be 
> removed from the index.
> Desirable behavior would be for new documents to have offsets indexed 
> properly, and old documents would have offset of "0, 0" for all positions 
> after merging with a segment that contains offsets
> Current behavior can be very dangerous.
> For example:
> * Start indexing documents with indexed offsets
> * change config to not index offsets by accident
> * index 1 document
> * revert config back
> * offsets will start disappearing from documents as segments are merged

