nsivabalan commented on issue #6811: URL: https://github.com/apache/hudi/issues/6811#issuecomment-1305153782
Few pointers:
- For the failed executors, check whether they are failing due to OOMs. If so, you may need to tune your Spark memory configs.
- I see you have set `cleaner.commits.retained = 1`. This cleans data files very aggressively, which could add to your write latency; consider relaxing it a bit. If you do, also relax https://hudi.apache.org/docs/configurations/#hoodiecleanmaxcommits accordingly.
- If you have a lot of small files in your lake, it might impact your index lookup time. You could enable clustering to batch many small files into larger ones; you can choose to run it async as well.
- If none of the above suggestions work, another option to consider is BUCKET_INDEX. That would mean starting with a new table altogether, but index lookup is expected to be O(1).
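As a rough sketch, the cleaner, clustering, and bucket-index settings above might look like the following Spark write options. The option keys come from the Hudi configuration docs; the table name and all numeric values here are illustrative assumptions, not tuned recommendations:

```python
# Illustrative Hudi write options -- values are examples only, tune for your workload.
hudi_options = {
    "hoodie.table.name": "my_table",  # hypothetical table name

    # Relax aggressive cleaning: retain more commits before data files are removed.
    "hoodie.cleaner.commits.retained": "10",
    # If you relax retained commits, relax the cleaner trigger frequency as well.
    "hoodie.clean.max.commits": "10",

    # Batch many small files into larger ones, asynchronously.
    "hoodie.clustering.async.enabled": "true",

    # Alternative (requires a brand-new table): bucket index for O(1) lookup.
    # "hoodie.index.type": "BUCKET",
}

# e.g. df.write.format("hudi").options(**hudi_options).mode("append").save(path)
print(hudi_options["hoodie.cleaner.commits.retained"])
print(hudi_options["hoodie.clustering.async.enabled"])
```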
