dataproblems commented on issue #12116: URL: https://github.com/apache/hudi/issues/12116#issuecomment-2435899618
@ad1happy2go, I have about 6 partitions for the sample dataset that I'm using.

```
+------------+-----------------------+
|PartitionCol|Number of Unique Values|
+------------+-----------------------+
|One         |12959311               |
|Two         |629845160              |
|Three       |458227144              |
|Four        |1107519580             |
|Five        |472111                 |
|Six         |19391133               |
+------------+-----------------------+
```

I'll let you know how the repartition exercise goes and whether it results in a smaller `.commit` file.

Our main problem is that we're not able to set `POPULATE_META_FIELDS` to `true` and create the index. The job fails after writing partial data to S3 because of executor heartbeat timeouts / OOM issues. Do you think GC could be the culprit there?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
