[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536999#comment-13536999
 ] 

Shai Erera commented on LUCENE-4258:
------------------------------------

bq. I am not sure if merging just updates of a segment is feasible. In what 
cases would it be better than collapsing all updates into the base segment?

Just like expungeDeletes, I think that we should have collapseFieldUpdates(), 
which can be called explicitly by the app, but IW should also call 
MP.findSegmentsForFieldUpdates() (or some such name). It should collapse 
all updates into the segment, which implies rewriting that segment. If we 
collapse all updates but keep the base segment + a single stacked segment, I 
don't think that we're doing much. The purpose is to get rid of updates entirely.
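
To make this concrete, here is a rough sketch of how such a policy hook might 
pick segments. Everything in it is hypothetical: SegmentStub, maxUpdatedRatio 
and the shape of findSegmentsForFieldUpdates() are illustrative assumptions, 
not existing Lucene API, and the "ratio of updated docs" trigger is just one 
possible heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Lucene.
final class FieldUpdatesCollapsePolicy {

    /** Minimal stand-in for the per-segment bookkeeping a real policy would see. */
    static final class SegmentStub {
        final String name;
        final int maxDoc;      // total docs in the base segment
        final int updatedDocs; // docs with at least one stacked field update

        SegmentStub(String name, int maxDoc, int updatedDocs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.updatedDocs = updatedDocs;
        }
    }

    // Assumed setting, separate from the regular merge settings.
    private final double maxUpdatedRatio;

    FieldUpdatesCollapsePolicy(double maxUpdatedRatio) {
        this.maxUpdatedRatio = maxUpdatedRatio;
    }

    /**
     * Returns the segments whose stacked updates should be collapsed,
     * i.e. rewritten together with their base segment.
     */
    List<SegmentStub> findSegmentsForFieldUpdates(List<SegmentStub> segments) {
        List<SegmentStub> toCollapse = new ArrayList<>();
        for (SegmentStub s : segments) {
            // Skip empty segments to avoid dividing by zero.
            if (s.maxDoc > 0 && (double) s.updatedDocs / s.maxDoc > maxUpdatedRatio) {
                toCollapse.add(s);
            }
        }
        return toCollapse;
    }
}
```

An explicit collapseFieldUpdates() call from the app would presumably bypass 
the threshold and select every segment that has any updates at all.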

Also, regarding statistics: I think that as a first step, we should not go out 
of our way to return the correct statistics. Just as the stats today do not 
account for deleted documents, they need not account for updates either. I 
realize that it's not the same as deleted documents, but it certainly simplifies 
matters. Stats will be correct following collapseFieldUpdates or regular segment 
merges.

As a second step, we can try to return statistics including stacked segments 
more efficiently. I.e., if a term appears in both the base and stacked segment, 
we return the stats from base. But if it exists only in the stacked segment, we 
can return the stats from there? I'm not too worried about the stats though, 
because that's a temporary thing, which gets fixed once updates are collapsed.
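
A minimal sketch of that lookup rule, assuming per-term docFreq is available 
from each layer as a plain map. StackedTermStats and its fields are made-up 
names for illustration; the real term dictionary APIs look nothing like this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: models "base wins, else fall back to stacked".
final class StackedTermStats {
    private final Map<String, Integer> baseDocFreq;
    private final Map<String, Integer> stackedDocFreq;

    StackedTermStats(Map<String, Integer> baseDocFreq,
                     Map<String, Integer> stackedDocFreq) {
        // Defensive copies so later changes by the caller do not leak in.
        this.baseDocFreq = new HashMap<>(baseDocFreq);
        this.stackedDocFreq = new HashMap<>(stackedDocFreq);
    }

    /**
     * Returns the docFreq from the base segment if the term exists there,
     * otherwise the docFreq from the stacked segment, otherwise 0.
     * The result is approximate until the updates are collapsed.
     */
    int docFreq(String term) {
        Integer base = baseDocFreq.get(term);
        if (base != null) {
            return base;
        }
        Integer stacked = stackedDocFreq.get(term);
        return stacked != null ? stacked : 0;
    }
}
```

Note that when the term exists in both layers, the stacked segment's 
contribution is ignored on purpose, so the number can be off until the next 
collapse or merge.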

And if the MergePolicy has separate settings for collapsing field updates 
(I think it should!), then the collapsing could occur more frequently than 
regular merges (and expunging deleted documents). Also, it will give apps a way 
to control how often they want to get accurate statistics.

Can we leave statistics outside the scope of this issue? And for now change 
CheckIndex to detect that it's a segment with field updates, and therefore 
check stats from the base segment only? I think it does something like that 
with deleted documents already, no?
                
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>         Attachments: IncrementalFieldUpdates.odp, 
> LUCENE-4258-API-changes.patch, LUCENE-4258.r1410593.patch, 
> LUCENE-4258.r1412262.patch, LUCENE-4258.r1416438.patch, 
> LUCENE-4258.r1416617.patch, LUCENE-4258.r1422495.patch, 
> LUCENE-4258.r1423010.patch
>
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal for Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

