psendyk commented on issue #8890: URL: https://github.com/apache/hudi/issues/8890#issuecomment-1654659545
@ad1happy2go Our initial upgrade attempt failed for only one of our four tables; the other three have much lower incoming data volume, so perhaps it's related to that. I just tried reproducing the error on another fresh table with less data: I ingested a single micro-batch (which also created the table) using 0.12.1, and then continued the ingestion with 0.13.0. This time the 0.13.0 job kept making progress for a couple of micro-batches until I killed it; it didn't run into the issue.

Also, the exception only happens for some partitions in the micro-batch, while others are written successfully. Perhaps it is related to partition cardinality or the file size distribution across partitions: each micro-batch in our job writes to ~12-15k partitions, and the number of records per partition varies quite significantly, probably from a few records at the low end to ~10,000s at the high end. I haven't verified this, but given that the issue seems to be "missing small files," I suspect the error might only happen for partitions with less data/more small files.

Perhaps you can attempt to reproduce it by modifying the partitioning schema in your snippet. I'm not sure how much data you're ingesting, but perhaps file sizing is more uniform when only partitioning on `year`.

Let me know if I can provide any other info that'd help with reproducing. Thanks!
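To make the suspected skew concrete, here is a minimal sketch of how a reproduction could size a synthetic micro-batch: ~12k partitions with per-partition record counts ranging from a few to ~10,000. The helper names (`skewed_partition_counts`, `summarize`), the `partition_NNNNN` labels, and the log-uniform distribution are all assumptions for illustration; they are not measured from the affected table and do not use any Hudi API.

```python
import math
import random


def skewed_partition_counts(num_partitions, min_records=1, max_records=10_000, seed=0):
    """Draw one record count per partition from a log-uniform distribution.

    Log-uniform is an assumption: it yields many small partitions and a
    few large ones, roughly matching "a few records min to ~10,000s max".
    """
    rng = random.Random(seed)
    lo, hi = math.log(min_records), math.log(max_records)
    counts = {}
    for i in range(num_partitions):
        n = int(round(math.exp(rng.uniform(lo, hi))))
        # Clamp to guard against rounding at the interval edges.
        counts[f"partition_{i:05d}"] = min(max(n, min_records), max_records)
    return counts


def summarize(counts):
    """Return basic statistics describing how uneven the batch is."""
    values = sorted(counts.values())
    return {
        "partitions": len(values),
        "min": values[0],
        "median": values[len(values) // 2],
        "max": values[-1],
        "total_records": sum(values),
    }


counts = skewed_partition_counts(num_partitions=12_000)
print(summarize(counts))
```

The resulting counts could then drive whatever data generator the reproduction snippet already uses, so that each micro-batch touches many partitions with very uneven sizes instead of a handful of uniformly sized ones.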
