dataproblems commented on issue #12116:
URL: https://github.com/apache/hudi/issues/12116#issuecomment-2435899618

   @ad1happy2go, I have about 6 partitions for the sample dataset that I'm 
using. 
   
   ```
   +------------+-----------------------+
   |PartitionCol|Number of Unique Values|
   +------------+-----------------------+
   |One         |12959311               |
   |Two         |629845160              |
   |Three       |458227144              |
   |Four        |1107519580             |
   |Five        |472111                 |
   |Six         |19391133               |
   +------------+-----------------------+
   ```
   
   I'll update you on how the repartition exercise goes and whether it results 
in a smaller `.commit` file. Our main problem is that we're not able to set 
`POPULATE_META_FIELDS` to `true` and create the index. The job fails after 
writing partial data to S3 due to executor heartbeat / OOM issues. 
Do you think GC could be a culprit there? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
