kbendick commented on issue #4875: URL: https://github.com/apache/iceberg/issues/4875#issuecomment-1138851480
Lastly, all of that frequent committing that is generating small files would likely be easier to manage if you _also_ ran the `RewriteManifests` action. This rewrites the manifests, packing them into larger files, which reduces the time spent on planning, since your metadata has likely also grown. I'll leave you with the link to the Spark SQL stored procedure, https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_manifests, but I would similarly encourage you to check out the code for further options and for an understanding of how it works given the scale of your table. (There's a minimal sketch of the call at the bottom of this comment.)

Separately, I usually advocate for writing the data out as "correctly" as possible the first time. In this case, I don't think that's _entirely_ possible (and table maintenance is a fact of life with Iceberg), but if you increase your Flink job's commit interval just slightly, you'll likely see a _significant_ speed up.

You should also try setting the write distribution mode. If your Flink job shuffles data destined for the same output partition to the same task manager, you'll potentially wind up with fewer files. I'd suggest starting from https://iceberg.apache.org/docs/latest/configuration/#write-properties (looking at `write.distribution-mode`), as well as the larger summary in the relevant file for that configuration value, https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/DistributionMode.java. The PR adding `write.distribution-mode` support to Flink was contributed by openinx in https://github.com/apache/iceberg/commit/c75ac359c1de6bf9fd4894b40009c5c42d2fee9d, which might also be of interest to you for the Flink-specific behavior and any caveats. By shuffling the data for each partition to one task manager on write (or potentially a handful of them), you'll see fewer small files, with fewer writers per partition. (A sketch of setting the property is below as well.)

But I think the _biggest_ thing to help you, given that your rewrite tasks are all taking about a minute, would be to parallelize (as you tried), but via `partial-progress.enabled` and `max-concurrent-file-group-rewrites` (see the last sketch below). This will likely leave the job with the same total CPU time, but will reduce the wall clock run time of the job significantly.

Best of luck and please let me know if / when we can close this issue!
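As a rough sketch of the manifest rewrite mentioned above (assuming a Spark catalog named `my_catalog` and a table `db.tbl` — both placeholders, adjust for your setup):

```sql
-- Rewrite and compact the table's manifest files to speed up planning.
-- `my_catalog` and `db.tbl` are placeholders for your catalog and table.
CALL my_catalog.system.rewrite_manifests('db.tbl');
```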
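For the write distribution mode, a minimal sketch of setting the table property via Spark SQL (same placeholder catalog and table; `hash` is just one of the possible values, alongside `none` and `range` — check the docs linked above for which fits your workload):

```sql
-- Shuffle rows for the same partition to the same writer before files
-- are written, so each partition gets fewer, larger files.
ALTER TABLE my_catalog.db.tbl SET TBLPROPERTIES (
  'write.distribution-mode' = 'hash'
);
```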
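And for the compaction itself, a hedged sketch of enabling partial progress and concurrent file group rewrites, assuming your Iceberg version ships the `rewrite_data_files` Spark procedure (otherwise the same option keys apply to the `RewriteDataFiles` action). The value `10` here is purely illustrative, not a recommendation:

```sql
-- Compact data files, committing file groups as they finish
-- (partial progress) and rewriting several file groups concurrently.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.tbl',
  options => map(
    'partial-progress.enabled', 'true',
    'max-concurrent-file-group-rewrites', '10'
  )
);
```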
