szehon-ho opened a new issue #3211: URL: https://github.com/apache/iceberg/issues/3211
Some tables end up in a bad state with tens of thousands of ManifestFiles (imagine a table that is the target of a Flink ingest job, for example). RewriteManifests, even with a very restrictive predicate on the manifests, cannot finish in a reasonable time, because every manifest has to be read in order to evaluate the predicate.

I propose adding rewriteIf(Expression filter) to the RewriteManifests API as an alternative to rewriteIf(Predicate<ManifestFile>). The expression can then be pushed down into the ManifestGroup, reducing the number of ManifestFiles that have to be read in order to be compacted.

It is also more user-friendly. Filtering a ManifestFile with the predicate using ManifestFile.PartitionFieldSummary() is not user-friendly at all, because its upper and lower bounds are in byte format.
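To illustrate the usability point, here is a minimal, self-contained sketch of what filtering on byte-encoded bounds involves. It does not use the Iceberg classes: ManifestEntry, toBytes, fromBytes, and manifestsMatching are hypothetical stand-ins, and the 4-byte little-endian int encoding is an assumption made for this example. The point is only that a caller has to decode the bounds by hand before comparing them, and that the partition summaries alone are enough to select candidate manifests, which is the kind of work an Expression filter could do on the caller's behalf.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManifestPruningSketch {

  // Hypothetical stand-in for a manifest list entry: a path plus the lower and
  // upper bound of one int partition field, kept as serialized bytes.
  static final class ManifestEntry {
    final String path;
    final ByteBuffer lowerBound;
    final ByteBuffer upperBound;

    ManifestEntry(String path, ByteBuffer lowerBound, ByteBuffer upperBound) {
      this.path = path;
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  // Encode an int as 4 little-endian bytes (encoding assumed for this sketch).
  static ByteBuffer toBytes(int value) {
    ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(0, value);
    return buf;
  }

  // Decode a bound without disturbing the caller's buffer position.
  static int fromBytes(ByteBuffer buf) {
    return buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(0);
  }

  // Select manifests whose [lower, upper] range can contain the value,
  // using only the summaries and never opening the manifests themselves.
  static List<String> manifestsMatching(List<ManifestEntry> entries, int value) {
    List<String> selected = new ArrayList<>();
    for (ManifestEntry entry : entries) {
      int lower = fromBytes(entry.lowerBound);
      int upper = fromBytes(entry.upperBound);
      if (lower <= value && value <= upper) {
        selected.add(entry.path);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<ManifestEntry> entries = Arrays.asList(
        new ManifestEntry("m1.avro", toBytes(1), toBytes(10)),
        new ManifestEntry("m2.avro", toBytes(11), toBytes(20)),
        new ManifestEntry("m3.avro", toBytes(18), toBytes(30)));

    // prints [m2.avro, m3.avro]
    System.out.println(manifestsMatching(entries, 19));
  }
}
```

With an expression-based rewriteIf, the decoding and range check above would presumably live inside the library rather than in every caller's predicate.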
