ad1happy2go commented on issue #8302: URL: https://github.com/apache/hudi/issues/8302#issuecomment-1493595411
@andreagarcia20 This step also performs the index lookup, which is causing a lot of spill to disk and ultimately failing the jobs. Can you try the SIMPLE index and see whether you hit the same issue? Also, are the keys very randomly distributed across the dataset? Please also check for skew in the dataset: this much disk spilling suggests that a lot of data is going into a few partitions. Check the amount of data written by the successful tasks.
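A minimal sketch of the two checks suggested above, in plain Python so it runs without a cluster. `hoodie.index.type` is the Hudi write config for the index type; the `skew_report` helper, the `sample` values, and the idea of using a max-to-mean ratio as the skew measure are illustrative assumptions, not something taken from the job in this issue.

```python
from collections import Counter

# Hudi write option for the index type suggested in the comment.
# Other writer options (table name, record key, etc.) are omitted here.
hudi_options = {"hoodie.index.type": "SIMPLE"}


def skew_report(partition_values, top_n=3):
    """Hypothetical helper: summarize how unevenly records spread over partitions.

    Returns (ratio, heaviest), where ratio is the largest partition's record
    count divided by the mean count per partition, and heaviest is a list of
    the top_n (partition, count) pairs.
    """
    counts = Counter(partition_values)
    if not counts:
        return 0.0, []
    mean = sum(counts.values()) / len(counts)
    heaviest = counts.most_common(top_n)
    return heaviest[0][1] / mean, heaviest


# Made-up sample: one partition value holds 90 of 100 records.
sample = ["2023-01-01"] * 90 + ["2023-01-02"] * 5 + ["2023-01-03"] * 5
ratio, heaviest = skew_report(sample)
print(f"max/mean ratio: {ratio:.2f}, heaviest partitions: {heaviest}")
```

In a Spark job the options would typically be passed with `.options(**hudi_options)` on a `format("hudi")` writer, and the partition values would come from a count grouped by the partition or key column. A ratio far above 1 would be consistent with a few tasks receiving most of the data.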
