[
https://issues.apache.org/jira/browse/FLINK-27708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556682#comment-17556682
]
Jane Chan commented on FLINK-27708:
-----------------------------------
* Why
** Each checkpoint forces the writer to close its current file, so the
generated files may not reach the target file size.
!image-2022-06-21-14-59-59-593.png|width=502,height=149!
* How
** Since the append-only table does not define a key, the compaction should be
based on the sequence number to preserve record order.
** We could introduce an asynchronous task that collects previously committed
files smaller than the target file size, sorts them by min/max sequence
number, and then rewrites them by concatenation. During the prepare-commit
phase, the compacted files (if available) can be submitted along with the newly
written files.
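A rough sketch of the planning step described above, to make the idea concrete. This is not the actual table store code: the class AppendOnlyCompactionSketch, the FileMeta type and the plan method are hypothetical names, and it assumes that a run of small files is only concatenated when the files are adjacent in sequence order, so a file that already meets the target size ends the current run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from the table store code base.
final class AppendOnlyCompactionSketch {

    /** Assumed metadata of a committed data file. */
    static final class FileMeta {
        public final String name;
        public final long sizeBytes;
        public final long minSeq;
        public final long maxSeq;

        public FileMeta(String name, long sizeBytes, long minSeq, long maxSeq) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.minSeq = minSeq;
            this.maxSeq = maxSeq;
        }
    }

    /**
     * Sorts committed files by min/max sequence number and groups adjacent
     * small files into batches that can be rewritten by concatenation.
     */
    public static List<List<FileMeta>> plan(List<FileMeta> committed, long targetSize) {
        List<FileMeta> sorted = new ArrayList<>(committed);
        sorted.sort(Comparator.comparingLong((FileMeta f) -> f.minSeq)
                .thenComparingLong(f -> f.maxSeq));

        List<List<FileMeta>> batches = new ArrayList<>();
        List<FileMeta> current = new ArrayList<>();
        long currentSize = 0;
        for (FileMeta f : sorted) {
            if (f.sizeBytes >= targetSize) {
                // Already large enough: end the current run so that we never
                // concatenate across this file and break the sequence order.
                flush(batches, current);
                current = new ArrayList<>();
                currentSize = 0;
                continue;
            }
            current.add(f);
            currentSize += f.sizeBytes;
            if (currentSize >= targetSize) {
                // The batch reached the target size; start a new one.
                flush(batches, current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        flush(batches, current);
        return batches;
    }

    /** Keeps a batch only if rewriting it would merge at least two files. */
    private static void flush(List<List<FileMeta>> batches, List<FileMeta> current) {
        if (current.size() > 1) {
            batches.add(current);
        }
    }
}
```

The background task would then rewrite each batch into one file, and the prepare-commit phase would submit the result together with the newly written files. A lone small file is left untouched, since rewriting it alone gains nothing.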
Please assign this ticket to me, cc [~lzljs3620320], thanks!
> Add background compaction task for append-only table when ingesting.
> --------------------------------------------------------------------
>
> Key: FLINK-27708
> URL: https://issues.apache.org/jira/browse/FLINK-27708
> Project: Flink
> Issue Type: Sub-task
> Reporter: Zheng Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: table-store-0.2.0
>
> Attachments: image-2022-06-21-14-59-59-593.png
>
>
> We could still execute a compaction task to merge small files in the background
> for append-only tables.
> This compaction is just to avoid a lot of small files.
> Its purpose is similar to that of filesystem compaction:
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction