Alibaba-HZY opened a new issue, #1260: URL: https://github.com/apache/incubator-paimon/issues/1260
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.

### Motivation

The Hive writer is currently atomic: it makes a single commit after the job has completed. When users migrate historical data from a Hive partitioned table to a Paimon partitioned table with the Hive writer, many small files may be generated. The number of small files in each partition is approximately equal to the number of map tasks. A map task may contain data for all partitions, and the data size of a map task is about 128 MB, so once that data is distributed across all partitions, the data files written to each partition are relatively small. This can cause query jobs to run out of memory (OOM), as in #1253.

### Solution

It would be better to support non-atomic writes, where each map task commits once, so that the commits from multiple map tasks trigger compaction and reduce the number of small files. We could add a parameter such as **hive-write-atomic**. The default value would be false; users can set it to true if atomic writing is required.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
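
The small-file arithmetic described under Motivation can be sketched roughly as follows. This is only an illustrative estimate, not Paimon code: the function name `estimate_small_files` and its parameters are hypothetical, and it assumes the worst case from the issue, in which every map task holds rows for every partition and writes one file per partition, with no compaction before the single final commit. Bucketing and compression are ignored.

```python
def estimate_small_files(total_input_mb, split_mb=128, num_partitions=100):
    """Rough worst-case estimate for an atomic, map-only migration job.

    Hypothetical helper for illustration; not part of Paimon or Hive.
    """
    # One map task per input split (ceiling division).
    map_tasks = -(-total_input_mb // split_mb)
    # Assumption: each map task writes one file into every partition,
    # so each partition ends up with about one file per map task.
    files_per_partition = map_tasks
    total_files = map_tasks * num_partitions
    # A task's ~split_mb of input is spread evenly over all partitions.
    avg_file_mb = split_mb / num_partitions
    return {
        "map_tasks": map_tasks,
        "files_per_partition": files_per_partition,
        "total_files": total_files,
        "avg_file_mb": avg_file_mb,
    }


if __name__ == "__main__":
    # Example: 12,800 MB of input, 128 MB splits, 100 partitions.
    print(estimate_small_files(12800, split_mb=128, num_partitions=100))
```

Under these assumptions, the example input yields 100 map tasks and 100 files per partition, each averaging about 1.28 MB, which is the situation that per-task commits followed by compaction are meant to avoid.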
