Alibaba-HZY opened a new issue, #1260:
URL: https://github.com/apache/incubator-paimon/issues/1260

   ### Search before asking
   
   - [X] I searched in the 
[issues](https://github.com/apache/incubator-paimon/issues) and found nothing 
similar.
   
   
   ### Motivation
   
   Currently the Hive writer is atomic: it makes a single commit, after the 
job has completed. When users use the Hive writer to migrate historical data 
from a Hive partitioned table to a Paimon partitioned table, many small files 
may be generated. The number of small files is approximately equal to the 
number of map tasks. A map task processes about 128 MB of data, and that data 
may belong to all partitions, so once it is split across the partitions, the 
data file written to each partition is relatively small. This can cause query 
jobs to run out of memory (OOM), as in #1253.
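
   As a rough illustration of the arithmetic above, here is a minimal sketch. 
It assumes data is spread evenly over partitions, that every map task touches 
every partition, and that each task leaves one file per partition it writes 
to. The function names and the example job size are hypothetical and are not 
part of Paimon.

```python
def per_partition_file_mb(map_task_mb: float, num_partitions: int) -> float:
    """Average size of the file one map task writes into each partition.

    Assumes the task's input is spread evenly over all partitions.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return map_task_mb / num_partitions


def total_files(num_map_tasks: int, num_partitions: int) -> int:
    """Files left after a single commit at job end, with no compaction.

    Assumes every map task writes one file into every partition.
    """
    return num_map_tasks * num_partitions


# Hypothetical migration job: 1000 map tasks of 128 MB each, 100 partitions.
size_mb = per_partition_file_mb(128, 100)
count = total_files(1000, 100)
print(f"about {size_mb:.2f} MB per file, {count} files in total")
```

Under these assumptions each partition ends up with roughly one small file per 
map task, which is the situation described above.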
    
   
   ### Solution
   
   It would be better to support non-atomic writes, where each map task 
commits once, so that the commits from multiple map tasks trigger compaction 
and reduce the number of small files. We could add a parameter such as 
**hive-write-atomic**. Its default value would be false, and users could set 
it to true if atomic writing is required.
    
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
