manishmalhotrawork commented on a change in pull request #524: respect
commit.manifest.min.count
URL: https://github.com/apache/incubator-iceberg/pull/524#discussion_r346634538
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -588,19 +592,20 @@ private Evaluator extractInclusiveDeleteExpression(ManifestReader reader) {
outputManifests.add(bin.get(0));
return;
}
-
// if the bin has a new manifest (the new data files) then only merge it if the number of
// manifests is above the minimum count. this is applied only to bins with an in-memory
// manifest so that large manifests don't prevent merging older groups.
if (bin.contains(cachedNewManifest) && bin.size() < minManifestsCountToMerge) {
// not enough to merge, add all manifest files to the output list
outputManifests.addAll(bin);
+ } else if (bin.contains(cachedNewAppenedManifest) && bin.size() < minManifestsCountToMerge) {
Review comment:
Thanks a lot, @rdblue, for the explanation!
It makes sense to keep the new data files or manifests in the first (top) bin, and
not to merge the first bin while it holds the latest manifest until it reaches the threshold.
I will change the logic to track `firstAppendedManifest` instead of
`lastAppendedManifest`.
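
To check my understanding of the rule, here is a minimal sketch of the per-bin decision, not the actual `MergingSnapshotProducer` code. It assumes manifests are represented by plain strings; `BinMergeSketch`, `processBin`, and the `merged(n)` placeholder are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: manifests are plain strings, merging is a placeholder.
class BinMergeSketch {

  // Returns what a single bin contributes to the output manifest list.
  static List<String> processBin(List<String> bin,
                                 String firstAppendedManifest,
                                 int minManifestsCountToMerge) {
    List<String> output = new ArrayList<>();
    if (bin.size() == 1) {
      // a bin with one manifest has nothing to merge
      output.add(bin.get(0));
    } else if (bin.contains(firstAppendedManifest)
        && bin.size() < minManifestsCountToMerge) {
      // the bin holds the newly appended manifest but is below the
      // threshold: keep every manifest as-is
      output.addAll(bin);
    } else {
      // otherwise merge the bin into one manifest (placeholder name)
      output.add("merged(" + bin.size() + ")");
    }
    return output;
  }
}
```

With a threshold of 3, a bin of `["new", "old-1"]` that contains the tracked manifest `"new"` is passed through unchanged, while a two-manifest bin without it is merged regardless of the threshold.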
One question on this statement:
> That way, systems like Presto that don't wait for job planning to complete before running a query will return results more quickly.

I read this as: "Presto doesn't wait for job planning to complete before running a query, so it will return results **(recently added data)** more quickly, because we keep the recent/new manifest files at the top." Is that right?