stefan-egli opened a new pull request #260:
URL: https://github.com/apache/jackrabbit-oak/pull/260


   2nd iteration of OAK-9149 building on
   * https://github.com/apache/jackrabbit-oak/pull/243
   * and https://github.com/apache/jackrabbit-oak/pull/244
   
   which was redone based on the following review finding:
   * https://issues.apache.org/jira/browse/OAK-9149?focusedCommentId=17170693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17170693
   
   This version now uses a _phased_ and _batched_ backgroundSplit.
   * _phased_ : the first phase combines all updateOps except the one on the
main document. These updateOps can be executed in one batch, and a partial
execution after an error/exception does no harm: each individual updateOp
will be redone if necessary. The important part is that the update on the
main document happens in a separate, second phase. If this second phase is
executed only partially, that is again no problem: it just means that for
some documents the split didn't finish properly and needs to be redone.
   ** Note that a partially executed phase, while fine from a consistency
point of view, can still leave garbage behind. This is noted in a
`// TODO` comment in `backgroundSplit()`.
   * _batched_ : as mentioned above, all phase 1 updateOps are grouped into
batches of at most the configured batch size
(`oak.documentMK.createOrUpdateBatchSize`, default 1000), and each batch is
sent to the DocumentStore in one go (i.e. using the bulk version of the
'createOrUpdate' command).
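
   The phased and batched ordering described above can be sketched roughly as
follows. This is a minimal illustration, not the code in this pull request:
`Op`, `Store`, `partition()` and `split()` are hypothetical stand-ins for
Oak's UpdateOp, DocumentStore and `backgroundSplit()`. Batching phase 2 as
well is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Oak's actual API.
class PhasedBatchedSplitSketch {

    // Default quoted in the description (oak.documentMK.createOrUpdateBatchSize).
    static final int DEFAULT_BATCH_SIZE = 1000;

    // Stand-in for an update operation on a single document.
    static final class Op {
        final String docId;
        final boolean onMainDocument;

        Op(String docId, boolean onMainDocument) {
            this.docId = docId;
            this.onMainDocument = onMainDocument;
        }
    }

    // Stand-in for a store that offers a bulk update call.
    interface Store {
        void bulkUpdate(List<Op> batch);
    }

    // Cuts a list into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Phase 1 sends every op that does not touch a main document, in batches.
    // Phase 2 sends the main-document ops only after phase 1 has completed.
    static void split(List<Op> ops, Store store, int batchSize) {
        List<Op> phase1 = new ArrayList<>();
        List<Op> phase2 = new ArrayList<>();
        for (Op op : ops) {
            (op.onMainDocument ? phase2 : phase1).add(op);
        }
        for (List<Op> batch : partition(phase1, batchSize)) {
            store.bulkUpdate(batch);
        }
        // Assumption: phase 2 is batched the same way as phase 1.
        for (List<Op> batch : partition(phase2, batchSize)) {
            store.bulkUpdate(batch);
        }
    }

    public static void main(String[] args) {
        List<Op> ops = new ArrayList<>();
        ops.add(new Op("main", true));
        ops.add(new Op("prev-1", false));
        ops.add(new Op("prev-2", false));
        ops.add(new Op("prev-3", false));
        // A batch size of 2 keeps the demo output short.
        split(ops, batch -> System.out.println(
                batch.size() + " op(s), first: " + batch.get(0).docId), 2);
    }
}
```

   In this sketch an exception thrown by `bulkUpdate` during phase 1 stops
`split()` before any main-document op is sent, which mirrors the ordering
argument above: the main document is only updated once the other updates
have gone through.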


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

