manishmalhotrawork commented on a change in pull request #524: respect commit.manifest.min.count
URL: https://github.com/apache/incubator-iceberg/pull/524#discussion_r346634538
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
 ##########
 @@ -588,19 +592,20 @@ private Evaluator extractInclusiveDeleteExpression(ManifestReader reader) {
             outputManifests.add(bin.get(0));
             return;
           }
-
           // if the bin has a new manifest (the new data files) then only merge it if the number of
           // manifests is above the minimum count. this is applied only to bins with an in-memory
           // manifest so that large manifests don't prevent merging older groups.
           if (bin.contains(cachedNewManifest) && bin.size() < minManifestsCountToMerge) {
             // not enough to merge, add all manifest files to the output list
             outputManifests.addAll(bin);
+          } else if (bin.contains(cachedNewAppenedManifest) && bin.size() < minManifestsCountToMerge) {
 
 Review comment:
   Thanks a lot, @rdblue, for the explanation!
   It makes sense to keep the new data files or manifests in the first/top bin, and not to merge the first bin while it contains the latest manifest, until it reaches the threshold.
    
   Let me change the logic to track `firstAppendedManifest` instead of `lastAppendedManifest`.
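
To check my understanding, here is a rough sketch of that rule. It is not the actual `MergingSnapshotProducer` code: manifests are modeled as plain strings, and `BinMergeSketch`, `shouldMerge`, and `process` are hypothetical names used only for illustration. I am also assuming that a bin holding a single manifest is always passed through as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and types are assumptions, not Iceberg APIs.
final class BinMergeSketch {
  private BinMergeSketch() {}

  // Returns true when the bin should be rewritten into one merged manifest.
  static boolean shouldMerge(List<String> bin, String firstAppendedManifest, int minCountToMerge) {
    if (bin.size() == 1) {
      // assumed: nothing to merge when the bin holds a single manifest
      return false;
    }
    // a bin holding the first appended manifest is left alone until it
    // contains at least minCountToMerge manifests
    return !(bin.contains(firstAppendedManifest) && bin.size() < minCountToMerge);
  }

  // Walks the bins in order and builds the output manifest list.
  static List<String> process(List<List<String>> bins, String firstAppendedManifest, int minCountToMerge) {
    List<String> output = new ArrayList<>();
    for (List<String> bin : bins) {
      if (shouldMerge(bin, firstAppendedManifest, minCountToMerge)) {
        // stand-in for writing a real merged manifest
        output.add("merged(" + String.join("+", bin) + ")");
      } else {
        // not enough to merge, keep every manifest as-is
        output.addAll(bin);
      }
    }
    return output;
  }
}
```

With a minimum count of 3, a first bin of `new-1, old-1` would stay unmerged, while a later bin of `old-2, old-3` would still be merged, so older groups keep getting compacted.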
   
   One question on this statement: 
   >That way, systems like Presto that don't wait for job planning to complete 
before running a query will return results more quickly.
   
   I believe this means: "Presto doesn't wait for job planning to complete before running a query, so it will return results **(recently added data)** more quickly, because we keep the recent/new manifest files at the top." Is that right?
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
