[
https://issues.apache.org/jira/browse/FLINK-30695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
luoyuxia updated FLINK-30695:
-----------------------------
Parent: (was: FLINK-29635)
Issue Type: Improvement (was: Sub-task)
> Support to set parallelism for compact operator according to the number of
> files in AQE.
> ----------------------------------------------------------------------------------------
>
> Key: FLINK-30695
> URL: https://issues.apache.org/jira/browse/FLINK-30695
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Hive
> Reporter: luoyuxia
> Priority: Major
>
> In the current design for compacting files in the Hive sink, a coordinator
> operator collects all written files and decides which files should be
> merged into a single file. It packs this information into a CompactUnit,
> which contains the paths of the files to be merged into one file.
> Then the coordinator operator passes each CompactUnit to the downstream
> compact operator, which does the actual compaction.
> The volume of data emitted by the coordinator is small because it only
> sends control messages, which makes the parallelism of the compact
> operator small under AQE. But most of the work (reading files and writing
> a new file) is actually done by the compact operator. If the parallelism
> of the compact operator is small, compaction takes a long time.
> Ideally, the parallelism of the compact operator should equal the
> number of final merged files, which can be decided by the coordinator
> operator. I think the AQE framework could provide some mechanism to
> let the operator decide its own parallelism.
>
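To make the proposal in the description concrete, here is a minimal sketch in plain Java. It is not Flink or Hive connector code: the class CompactPlanSketch, its nested Unit type (a stand-in for CompactUnit), the greedy size-based packing rule, and the maxParallelism cap are all assumptions made for illustration. It only shows how a parallelism could be derived from the number of merged files rather than from the volume of data the coordinator emits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Flink APIs. */
final class CompactPlanSketch {

    /** Stand-in for a CompactUnit: the paths to be merged into one file. */
    static final class Unit {
        final List<String> paths = new ArrayList<>();
        long totalBytes = 0L;
    }

    /** Greedily packs files, in arrival order, into units of at most targetBytes. */
    static List<Unit> pack(List<String> paths, List<Long> sizes, long targetBytes) {
        List<Unit> units = new ArrayList<>();
        Unit current = null;
        for (int i = 0; i < paths.size(); i++) {
            long size = sizes.get(i);
            // Start a new unit when adding this file would exceed the target size.
            if (current == null || current.totalBytes + size > targetBytes) {
                current = new Unit();
                units.add(current);
            }
            current.paths.add(paths.get(i));
            current.totalBytes += size;
        }
        return units;
    }

    /** One subtask per merged file, clamped to the range [1, maxParallelism]. */
    static int compactParallelism(int unitCount, int maxParallelism) {
        return Math.max(1, Math.min(unitCount, maxParallelism));
    }
}
```

For example, five files of 60, 50, 30, 100 and 10 MB with a 100 MB target pack into four units, so the sketch would suggest a parallelism of 4 for the compact operator, however small the control messages themselves are.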
--
This message was sent by Atlassian Jira
(v8.20.10#820010)