Ewocker opened a new pull request, #538:
URL: https://github.com/apache/jackrabbit-oak/pull/538

   ## Problem
   The current way of dealing with download retries is to build a new FlatFileStore folder each time a retry happens. The merge daemon thread creates a `merge` folder under that FlatFileStore folder and dumps intermediate merge progress there.
   When a download retry happens, the retried process loads the sorted files from all previously created FlatFileStore folders into its progress; however, it does not pick up the intermediate files that were already merged into each folder's `merge` subfolder.
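The failure mode can be illustrated with a small in-memory simulation. This is a hedged sketch, not Oak code. A folder is modeled as a map from relative path to entry count. An intermediate merge is assumed to fold its inputs into a single `merge/intermediate-N` entry and delete them, and the buggy retry is assumed to read only top-level files. `RetryLossDemo`, the `sorted-file-N` names and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a FlatFileStore folder: relative path -> number of entries in that file. */
class RetryLossDemo {

    /** Builds a folder holding `batches` top-level sorted batch files of `entriesPerBatch` entries each. */
    static Map<String, Integer> newFolder(int batches, int entriesPerBatch) {
        Map<String, Integer> folder = new TreeMap<>();
        for (int i = 0; i < batches; i++) {
            folder.put("sorted-file-" + i, entriesPerBatch);
        }
        return folder;
    }

    /** Merges up to `count` top-level files into merge/intermediate-<id> and deletes the inputs. */
    static void intermediateMerge(Map<String, Integer> folder, int count, int id) {
        List<String> inputs = new ArrayList<>();
        for (String name : folder.keySet()) {
            if (!name.startsWith("merge/") && inputs.size() < count) {
                inputs.add(name);
            }
        }
        int merged = 0;
        for (String name : inputs) {
            merged += folder.remove(name); // inputs are gone after the merge
        }
        folder.put("merge/intermediate-" + id, merged);
    }

    /** Counts the entries a retry would load; includeMerged=false models the current (buggy) behaviour. */
    static int entriesLoadedOnRetry(Map<String, Integer> folder, boolean includeMerged) {
        int total = 0;
        for (Map.Entry<String, Integer> e : folder.entrySet()) {
            if (includeMerged || !e.getKey().startsWith("merge/")) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

With 100 batch files of 10 entries each and one 64-file intermediate merge, a top-level-only retry loads 360 entries instead of 1000, which is the same kind of shortfall as a 22G result where about 27G was expected.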
   
   The unit test does not run into this issue because it generates only a small number of intermediate sorted batch files, while the batch merge size defaults to 64 files, so an intermediate merge never happens.
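Why a small test never triggers the path can be sketched with a counter. This assumes, without checking Oak's merge daemon, that a merge fires as soon as `batchMergeSize` sorted files are pending and that each merge leaves one merged file pending in place of its inputs. `IntermediateMergeCount` and `countMerges` are hypothetical names.

```java
/** Hypothetical model of the intermediate-merge trigger, not Oak's implementation. */
class IntermediateMergeCount {

    /**
     * Counts intermediate merges for `generatedFiles` sorted batch files, assuming a merge
     * consumes `batchMergeSize` pending files and leaves one merged file pending.
     */
    static int countMerges(int generatedFiles, int batchMergeSize) {
        if (batchMergeSize < 2) {
            throw new IllegalArgumentException("batchMergeSize must be at least 2");
        }
        int pending = 0;
        int merges = 0;
        for (int i = 0; i < generatedFiles; i++) {
            pending++;                        // a new sorted batch file is handed to the daemon
            if (pending >= batchMergeSize) {  // threshold reached: merge the pending files
                pending = pending - batchMergeSize + 1;
                merges++;
            }
        }
        return merges;
    }
}
```

Under these assumptions, 10 files produce no merge at the default size of 64, while 64 files produce one, so a regression test would need at least a full batch of files (or a smaller batch merge size) to reach the intermediate-merge code.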
   
   ## Reproduction
   When running against Leroy Merlin clones, a `store-sorted.json.gz` of around 27G is expected, but when download retries happened a 22G file was generated. Furthermore, the algorithm deletes all files after merging, yet intermediate files were still present when we exec'd into the pod to inspect the issue.
   
   ```
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem# ls
   DataStore.config  checkpoint.txt  indexing-result  indexingConsole.log  oak-run.jar  scripts  temp
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem# cd temp
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# ls
   cache  flat-fs-4277684662059617546  flat-fs-5727418750118939334  flat-fs-6657900471407058573  indexing.log  logback-indexing.xml
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1 | grep sorted-file
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1 | grep sorted.json
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1
   22G    ./flat-fs-6657900471407058573
   2.7G    ./flat-fs-5727418750118939334
   96M    ./indexing.log
   624M    ./flat-fs-4277684662059617546
   8.0K    ./logback-indexing.xml
   1.1M    ./cache
   25G    .
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d2 | grep sorted.json
   22G    ./flat-fs-6657900471407058573/store-sorted.json.gz
   root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d3 | grep merge
   8.0K    ./flat-fs-6657900471407058573/merge
   807M    ./flat-fs-5727418750118939334/merge/intermediate-62
   4.4M    ./flat-fs-5727418750118939334/merge/intermediate-9
   24M    ./flat-fs-5727418750118939334/merge/intermediate-40
   7.9M    ./flat-fs-5727418750118939334/merge/intermediate-8
   3.3M    ./flat-fs-5727418750118939334/merge/intermediate-1
   5.2M    ./flat-fs-5727418750118939334/merge/intermediate-16
   ...
   ```
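The same leftover check could be scripted in Java, for example in a test asserting that no intermediate merge output survives a completed run. This is a hypothetical helper, not part of the PR. It assumes only the `flat-fs-*/merge/intermediate-*` layout visible in the listing above and matches those names whether they are files or directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds intermediate merge outputs left under a temp directory. */
class LeftoverMergeFiles {

    /** Returns every path named intermediate-* whose parent directory is called "merge". */
    static List<Path> find(Path tempDir) throws IOException {
        try (Stream<Path> paths = Files.walk(tempDir)) {
            return paths
                    .filter(p -> p.getParent() != null
                            && p.getParent().getFileName() != null
                            && p.getParent().getFileName().toString().equals("merge"))
                    .filter(p -> p.getFileName().toString().startsWith("intermediate-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

After a successful run, `find(tempDir)` would be expected to return an empty list; in the pod above it would report the `intermediate-*` entries under `flat-fs-5727418750118939334/merge`.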
   
   ## Fix Implementation
   Add the ability to forcefully stop the merge process, and on retry, load the previously merged intermediate files into the sorted-files list so that they are merged later.
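The two parts of the fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR. `MergeRetrySketch`, `StoppableMerger`, `forceStop`, `runMerges` and `filesToMergeOnRetry` are hypothetical names. The stop flag is checked only between merge steps. On retry, every regular top-level file in a previous FlatFileStore folder is assumed to be a sorted batch file, and `merge/intermediate-*` entries are added alongside them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class MergeRetrySketch {

    /** Hypothetical merger that can be stopped from another thread between merge steps. */
    static class StoppableMerger {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);

        /** Requests a stop, e.g. before a download retry starts a new FlatFileStore folder. */
        void forceStop() {
            stopRequested.set(true);
        }

        /** Runs merge steps in order, checking the stop flag before each; returns how many ran. */
        int runMerges(List<Runnable> mergeSteps) {
            int done = 0;
            for (Runnable step : mergeSteps) {
                if (stopRequested.get()) {
                    break;
                }
                step.run();
                done++;
            }
            return done;
        }
    }

    /** On retry, collects top-level sorted files plus merge/intermediate-* outputs of each previous folder. */
    static List<Path> filesToMergeOnRetry(List<Path> previousFolders) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path folder : previousFolders) {
            try (Stream<Path> top = Files.list(folder)) {
                top.filter(Files::isRegularFile).sorted().forEach(result::add);
            }
            Path mergeDir = folder.resolve("merge");
            if (Files.isDirectory(mergeDir)) {
                try (Stream<Path> merged = Files.list(mergeDir)) {
                    merged.filter(p -> p.getFileName().toString().startsWith("intermediate-"))
                          .sorted()
                          .forEach(result::add);
                }
            }
        }
        return result;
    }
}
```

Stopping the old merger before collecting files keeps it from deleting inputs or writing a new `intermediate-*` file while the retry is building its list.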


-- 
This is an automated message from the Apache Git Service.