noahtaite commented on issue #10239: URL: https://github.com/apache/hudi/issues/10239#issuecomment-1841683174
@ad1happy2go Thank you for the response. I confirmed my readers **were not** setting this option on read, because I thought it was enabled by default. After enabling it, the large gap shrank significantly.

For example, this is a very large application that queries 5 of my largest Hudi tables. The gap for this application was 3 hours before and is down to 20 minutes now:

<img width="1709" alt="image" src="https://github.com/apache/hudi/assets/24283126/e66aec07-5b75-4c04-acb3-e740d66fb021">

We have actually seen a fairly significant performance change by enabling this. It seems to be mostly for the better: a couple of sessions on shared clusters started to hang where they previously had not, but dedicated clusters load large tables faster when using metadata.

My only suggestion would be to make it explicit on the configurations page that this is not enabled on the reader side by default. It is, however, documented on the "Metadata Indexing" page.

Thanks again Aditya.
