Ewocker opened a new pull request, #538: URL: https://github.com/apache/jackrabbit-oak/pull/538
## Problem
The current way of handling a download retry is to build a new FlatFileStore folder each time a retry happens. The merge daemon thread creates a `merge` folder under that FlatFileStore folder and dumps intermediate merge progress there. When a download is retried, the retried process loads all the previously created FlatFileStore folders into its progress, but it does not pick up the already merged intermediate files.

The unit test does not run into this issue because only a small number of files are generated there and the default batch merge size is 64 files, so no intermediate merge happens.

## Reproduction
When running against Leroy Merlin clones, the job is expected to generate a `store-sorted.json.gz` of around 27G, but when download retries happened, a 22G file was generated. Furthermore, the algorithm deletes all files after merging, yet intermediate files were still present when exec-ing into the pod to inspect the issue.
```
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem# ls
DataStore.config checkpoint.txt indexing-result indexingConsole.log oak-run.jar scripts temp
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem# cd temp
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# ls
cache flat-fs-4277684662059617546 flat-fs-5727418750118939334 flat-fs-6657900471407058573 indexing.log logback-indexing.xml
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1 | grep sorted-file
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1 | grep sorted.json
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d1
22G     ./flat-fs-6657900471407058573
2.7G    ./flat-fs-5727418750118939334
96M     ./indexing.log
624M    ./flat-fs-4277684662059617546
8.0K    ./logback-indexing.xml
1.1M    ./cache
25G     .
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d2 | grep sorted.json
22G     ./flat-fs-6657900471407058573/store-sorted.json.gz
root@cm-p28202-e44634-ijob-benchmarkrunner-oak-indexing-test-vnt2g:/opt/aem/temp# du -ah -d3 | grep merge
8.0K    ./flat-fs-6657900471407058573/merge
807M    ./flat-fs-5727418750118939334/merge/intermediate-62
4.4M    ./flat-fs-5727418750118939334/merge/intermediate-9
24M     ./flat-fs-5727418750118939334/merge/intermediate-40
7.9M    ./flat-fs-5727418750118939334/merge/intermediate-8
3.3M    ./flat-fs-5727418750118939334/merge/intermediate-1
5.2M    ./flat-fs-5727418750118939334/merge/intermediate-16
...
```

## Fix Implementation
Add the ability to forcefully stop the merge process and, on retry, load the previously merged files into the sorted files list so that they are merged later.
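
To illustrate the idea behind the fix, here is a minimal sketch and not the code in this PR. The class `RetryMergeRecovery`, its methods, and the stop flag are hypothetical names introduced only for this example. The directory layout (`flat-fs-*/merge/intermediate-*`) is taken from the `du` output above, and the sketch assumes every matching entry should be handed back to the merger.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper for illustration; not part of Oak's API.
class RetryMergeRecovery {

    // Flag the merge loop could poll so it can be stopped before a retry starts.
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    void requestStop() {
        stopRequested.set(true);
    }

    boolean isStopRequested() {
        return stopRequested.get();
    }

    // Collects entries named intermediate-* from the merge folder of every
    // flat-fs-* directory under workDir (layout assumed from the du output).
    static List<Path> collectIntermediateFiles(Path workDir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stores = Files.newDirectoryStream(workDir, "flat-fs-*")) {
            for (Path store : stores) {
                Path mergeDir = store.resolve("merge");
                if (!Files.isDirectory(mergeDir)) {
                    continue; // this attempt never started an intermediate merge
                }
                try (DirectoryStream<Path> merged =
                         Files.newDirectoryStream(mergeDir, "intermediate-*")) {
                    for (Path entry : merged) {
                        found.add(entry);
                    }
                }
            }
        }
        Collections.sort(found); // deterministic order for the caller
        return found;
    }
}
```

On a retry, the caller would first stop the running merge via something like `requestStop()`, then add the result of `collectIntermediateFiles(workDir)` to the list of sorted files that still need merging, so that data already folded into intermediate files is not lost.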
