beyond1920 commented on issue #9591: URL: https://github.com/apache/hudi/issues/9591#issuecomment-1713285421
Hi @yihua @KnightChess, I think there are two bugs here.

1. The table service client does not properly release its cached RDDs, so resources may not be freed promptly after the job finishes. This is the problem @yihua and @KnightChess pointed out.
2. The write status RDD does not need to be persisted at all, because the compaction job never reuses it. Currently it is persisted regardless. As a result, the cached blocks are not unpersisted even after all tasks on an executor are done. Under Spark dynamic allocation, executors that hold cached blocks are by default not removed when idle. If slow tasks are still running in the current stage, the other executors cannot be released, even though their own tasks in this stage finished quickly. The 70% resource reduction comes from fixing this second point.
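The toy model below illustrates why point 2 matters. It compares the executor time held by one stage with a straggler in two cases: when every executor holds cached write-status blocks, and when nothing is cached.

This is a hypothetical sketch, not Hudi or Spark code. The function name `executor_seconds` and the finish times are invented for illustration. The model also makes two simplifying assumptions:

* Spark's idle timeout is ignored.
* Cached executors are assumed to be released only when the slowest task finishes.

```python
# Hypothetical toy model, not Hudi or Spark code: estimates the
# executor-seconds one stage holds under dynamic allocation.
# Simplifications (assumptions): the idle timeout is ignored, and cached
# executors are released as soon as the slowest executor finishes.


def executor_seconds(finish_times, cached):
    """Return the total seconds for which executors are held.

    finish_times: per-executor time (s) at which its last task finishes.
    cached: if True, every executor holds cached blocks, so none is
            released before the slowest executor (the straggler) finishes.
    """
    if not finish_times:
        return 0.0
    if cached:
        # Every executor is pinned until the stage's last task completes.
        return max(finish_times) * len(finish_times)
    # Without cached blocks, each executor is released when its own work ends.
    return float(sum(finish_times))


if __name__ == "__main__":
    # Made-up numbers: nine executors finish at 60 s, one straggler at 600 s.
    times = [60.0] * 9 + [600.0]
    held_cached = executor_seconds(times, cached=True)
    held_uncached = executor_seconds(times, cached=False)
    print(f"cached={held_cached:.0f}s uncached={held_uncached:.0f}s")
```

The numbers are illustrative only and are not the source of the 70% figure. The point is that with cached blocks, every executor is billed up to the straggler's finish time, not its own.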
