JingsongLi commented on code in PR #4255:
URL: https://github.com/apache/paimon/pull/4255#discussion_r1776187272
##########
docs/content/maintenance/write-performance.md:
##########

@@ -160,3 +160,16 @@ You can use fine-grained-resource-management of Flink to increase committer heap
 1. Configure Flink Configuration `cluster.fine-grained-resource-management.enabled: true`. (This is default after Flink 1.18)
 2. Configure Paimon Table Options: `sink.committer-memory`, for example 300 MB, depends on your `TaskManager`. (`sink.committer-cpu` is also supported)
+
+## Changelog Compaction
+
+If Flink's checkpoint interval is short (for example, 30 seconds) and the number of buckets is large,
+each snapshot may produce lots of small changelog files.
+Too many files may put a burden on the distributed storage cluster.
+
+In order to compact small changelog files into large ones, you can set the table option `changelog.compact.parallelism`.
+This option will add a compact operator after the writer operator, which copies small changelog files into large ones.
+If the parallelism becomes larger, file copying will become faster.
+However, the number of resulting files will also become larger.
+As file copying is fast in most storage systems,
+we suggest that you start experimenting with `'changelog.compact.parallelism' = '1'` and increase the value only if needed.

Review Comment:
   My idea is to have only one switch: `changelog.precommit-compact` = `true`.

   We can add a Coordinator node to this pipeline to decide how to concatenate the changelog files into result files of the target file size, which can be one or multiple files.
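   For concreteness, a minimal sketch of how the option proposed in this diff could be used, assuming Paimon's Flink SQL `ALTER TABLE ... SET` syntax for table options; `my_table` and the checkpoint interval are placeholder values:

   ```sql
   -- A short checkpoint interval like this is what produces many small changelog files.
   SET 'execution.checkpointing.interval' = '30 s';

   -- Start with parallelism 1, as the diff suggests, and raise it only if copying is too slow.
   ALTER TABLE my_table SET ('changelog.compact.parallelism' = '1');
   ```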
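   If the single-switch design from this review comment were adopted, enabling it might look like the sketch below; note that `changelog.precommit-compact` is a proposal from the comment above, not a released option:

   ```sql
   -- Hypothetical: one boolean switch instead of a parallelism knob; a Coordinator node
   -- would decide how to concatenate changelog files up to the target file size.
   ALTER TABLE my_table SET ('changelog.precommit-compact' = 'true');
   ```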