manishmalhotrawork commented on a change in pull request #524: respect commit.manifest.min.count
URL: https://github.com/apache/incubator-iceberg/pull/524#discussion_r341456261
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
 ##########
 @@ -595,6 +596,9 @@ private Evaluator extractInclusiveDeleteExpression(ManifestReader reader) {
           if (bin.contains(cachedNewManifest) && bin.size() < minManifestsCountToMerge) {
             // not enough to merge, add all manifest files to the output list
             outputManifests.addAll(bin);
 +          } else if ((!Collections.disjoint(bin, appendManifests)) && bin.size() < minManifestsCountToMerge) {
 
 Review comment:
   Thanks @rdblue for the explanation. Sorry for the delay in replying!
   
   > The check above, bin.contains(cachedNewManifest) is intended to catch the 
last bin. Only the last bin is left unmerged, so that it can accumulate more 
manifests and isn't merged every time. But the bin before the last can be 
merged if it is full.
   
   So if `bin.contains(cachedNewManifest)` is true, then this is the last bin,
   because the most recently added manifests/files have to be in the last bin?
   
   To handle the appendManifest case, we can maintain one more variable, 
`cachedNewAppenedManifest`, which will be initialized with the new `ManifestFile` 
supplied to `appendManifests(ManifestFile manifestFile)`.
   
   The condition could be:
   ```
   else if (bin.contains(cachedNewAppenedManifest) && bin.size() < minManifestsCountToMerge) {
     // not enough to merge, add all manifest files to the output list
     outputManifests.addAll(bin);
   }
   ```
   
   This means that if `cachedNewAppenedManifest` (the latest appended manifest 
file) is present in the bin, then that bin would be the last one.
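
To make the proposal concrete, here is a minimal, self-contained sketch of how the two checks might sit side by side. It is not the actual `MergingSnapshotProducer` code: manifests are modeled as plain strings instead of `ManifestFile`, the class name `LastBinSketch`, the method `mergeBins`, and the `merged(...)` placeholder are hypothetical, and it assumes the bins have already been packed so that the newest manifest lands in the last bin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the merge loop; not the real Iceberg implementation.
class LastBinSketch {

  // Manifests are plain strings here; the real code works with ManifestFile.
  static List<String> mergeBins(
      List<List<String>> bins,
      String cachedNewManifest,
      String cachedNewAppenedManifest,
      int minManifestsCountToMerge) {
    List<String> outputManifests = new ArrayList<>();
    for (List<String> bin : bins) {
      boolean belowMinCount = bin.size() < minManifestsCountToMerge;
      if (bin.contains(cachedNewManifest) && belowMinCount) {
        // existing check: not enough to merge, keep the manifests as they are
        outputManifests.addAll(bin);
      } else if (bin.contains(cachedNewAppenedManifest) && belowMinCount) {
        // proposed check: the bin holding the latest appended manifest is
        // treated as the last bin and is also left unmerged
        outputManifests.addAll(bin);
      } else {
        // placeholder for writing one merged manifest from the whole bin
        outputManifests.add("merged(" + String.join("+", bin) + ")");
      }
    }
    return outputManifests;
  }

  public static void main(String[] args) {
    List<List<String>> bins = Arrays.asList(
        Arrays.asList("m1", "m2", "m3"),
        Arrays.asList("m4", "appended-1"));
    // "new-0" is not in any bin, so only the appended-manifest check applies.
    System.out.println(mergeBins(bins, "new-0", "appended-1", 3));
    // prints [merged(m1+m2+m3), m4, appended-1]
  }
}
```

In this sketch the first bin is full and gets merged, while the bin that contains the latest appended manifest stays unmerged because it holds fewer than `minManifestsCountToMerge` entries.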
