sjwiesman commented on a change in pull request #13990:
URL: https://github.com/apache/flink/pull/13990#discussion_r520212493
##########
File path: docs/dev/table/connectors/filesystem.md
##########
@@ -150,6 +150,41 @@ become finished on the next checkpoint) control the size and number of these par
**NOTE:** For row formats (csv, json), you can set the parameter `sink.rolling-policy.file-size` or `sink.rolling-policy.rollover-interval` in the connector properties and parameter `execution.checkpointing.interval` in flink-conf.yaml together
if you don't want to wait a long period before observe the data exists in file system. For other formats (avro, orc), you can just set parameter `execution.checkpointing.interval` in flink-conf.yaml.
+### File Compaction
+
+If you want a smaller checkpoint interval and do not want to generate a large number of small files,
+it is recommended that you open file compaction:
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 20%">Key</th>
+ <th class="text-left" style="width: 15%">Default</th>
+ <th class="text-left" style="width: 10%">Type</th>
+ <th class="text-left" style="width: 55%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>auto-compaction</h5></td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Whether to enable automatic compaction in streaming sink or not. The data will be written to temporary files. After the checkpoint is completed, the temporary files generated by a checkpoint will be compacted. The temporary files are invisible before compaction.</td>
+ </tr>
+ <tr>
+ <td><h5>compaction.file-size</h5></td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>MemorySize</td>
+ <td>The compaction target file size, the default value is the rolling file size.</td>
+ </tr>
+ </tbody>
+</table>
+
+After you open file compaction, small files that are not large enough will be merged into large files,
+It is worth noting that:
+- Only files in a single checkpoint are compacted, that is, at least the same number of files as the number of checkpoints is generated.
+- The file before merging is invisible, so the visibility of the file may be: checkpoint interval + compaction time.
Review comment:
```suggestion
If enabled, file compaction will merge multiple small files into larger files based on the target file size.
When running file compaction in production, please be aware that:
- Only files in a single checkpoint are compacted, that is, at least the same number of files as the number of checkpoints is generated.
- The file before merging is invisible, so the visibility of the file may be: checkpoint interval + compaction time.
```
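To make the new options concrete, here is a minimal, hypothetical sketch of a filesystem sink DDL that enables compaction. Only `auto-compaction`, `compaction.file-size`, and `sink.rolling-policy.file-size` come from the documentation quoted above; the table name, schema, path, and format are illustrative placeholders:

```sql
-- Hypothetical filesystem sink with automatic compaction enabled.
-- 'auto-compaction' and 'compaction.file-size' are the options described in this diff;
-- all other names and values are examples only.
CREATE TABLE fs_table (
  user_id STRING,
  order_amount DOUBLE,
  dt STRING,
  `hour` STRING
) PARTITIONED BY (dt, `hour`) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/fs_table_output',        -- hypothetical output path
  'format' = 'parquet',
  'sink.rolling-policy.file-size' = '128MB',
  'auto-compaction' = 'true',                    -- compact the small per-checkpoint files
  'compaction.file-size' = '128MB'               -- target size; defaults to the rolling file size
);
```

Note that `execution.checkpointing.interval` still has to be set (for example in flink-conf.yaml, as the NOTE above describes), since files only become visible after the checkpoint completes and compaction finishes.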
##########
File path: docs/dev/table/connectors/filesystem.md
##########
@@ -150,6 +150,41 @@ become finished on the next checkpoint) control the size and number of these par
**NOTE:** For row formats (csv, json), you can set the parameter `sink.rolling-policy.file-size` or `sink.rolling-policy.rollover-interval` in the connector properties and parameter `execution.checkpointing.interval` in flink-conf.yaml together
if you don't want to wait a long period before observe the data exists in file system. For other formats (avro, orc), you can just set parameter `execution.checkpointing.interval` in flink-conf.yaml.
+### File Compaction
+
+If you want a smaller checkpoint interval and do not want to generate a large number of small files,
+it is recommended that you open file compaction:
Review comment:
```suggestion
The file sink supports file compaction, which allows applications to have smaller checkpoint intervals without generating a large number of files.
```
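For completeness, a hedged sketch of a streaming query writing into such a sink; `kafka_source`, `log_ts`, and the selected columns are made up for illustration, but it shows the scenario the suggested sentence describes (frequent checkpoints producing many small files that compaction then merges):

```sql
-- Hypothetical continuous insert into the compacting sink sketched earlier.
-- Each checkpoint produces a set of small files; with 'auto-compaction' = 'true'
-- they are merged toward 'compaction.file-size' before becoming visible.
INSERT INTO fs_table
SELECT
  user_id,
  order_amount,
  DATE_FORMAT(log_ts, 'yyyy-MM-dd'),  -- dt partition column
  DATE_FORMAT(log_ts, 'HH')           -- hour partition column
FROM kafka_source;
```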
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]