Re: [I] [SUPPORT] Queries are very memory intensive due to low read parallelism in HoodieMergeOnReadRDD [hudi]

via GitHub Sat, 22 Mar 2025 21:32:41 -0700


mzheng-plaid commented on issue #12434:
URL: https://github.com/apache/hudi/issues/12434#issuecomment-2746014546


   > [@mzheng-plaid](https://github.com/mzheng-plaid) So are you saying 
spark.read.format("parquet").load({s3_path}) is consuming far fewer resources 
than a read optimized query on the Hudi table?
   > 
   > Can you share the spark event logs for the both to analyze it further?
   
   Apologies, I missed your response. But yes, it consumes far fewer resources 
than the read optimized query. This appears to be because Hudi does not honor 
`spark.sql.files.maxPartitionBytes` (Iceberg added support for it in 
https://github.com/apache/iceberg/pull/8922#issuecomment-1784547459).
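   To illustrate why honoring `spark.sql.files.maxPartitionBytes` matters for memory, here is a rough pure-Python approximation of how Spark's built-in file sources (such as the plain `format("parquet")` read quoted above) size and pack read splits. It follows Spark 3.x's `FilePartition.maxSplitBytes` / `getFilePartitions` logic with its defaults (128 MiB max partition bytes, 4 MiB open cost). This is a simplified sketch, not Spark's or Hudi's actual code. The helper names, the four 1 GiB files, and the parallelism of 8 are hypothetical, and the "one partition per file" line is only a stand-in for a reader that ignores the setting.

```python
# Simplified approximation of Spark 3.x file-split planning (not the real code).
MiB = 1024 * 1024
DEFAULT_MAX_PARTITION_BYTES = 128 * MiB  # spark.sql.files.maxPartitionBytes default
DEFAULT_OPEN_COST_IN_BYTES = 4 * MiB     # spark.sql.files.openCostInBytes default


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=DEFAULT_MAX_PARTITION_BYTES,
                    open_cost=DEFAULT_OPEN_COST_IN_BYTES):
    """Target split size: capped by maxPartitionBytes, floored by the open cost."""
    total_bytes = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def split_files(file_sizes, split_bytes):
    """Cut each splittable file (e.g. Parquet) into chunks of at most split_bytes."""
    chunks = []
    for size in file_sizes:
        for offset in range(0, size, split_bytes):
            chunks.append(min(split_bytes, size - offset))
    return chunks


def pack_partitions(chunks, split_bytes, open_cost=DEFAULT_OPEN_COST_IN_BYTES):
    """Next-fit packing of chunks, largest first, into read partitions (tasks)."""
    partitions, current, current_size = [], [], 0
    for chunk in sorted(chunks, reverse=True):
        if current and current_size + chunk > split_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(chunk)
        current_size += chunk + open_cost  # each file chunk also "costs" an open
    if current:
        partitions.append(current)
    return partitions


def plan_partitions(file_sizes, default_parallelism,
                    max_partition_bytes=DEFAULT_MAX_PARTITION_BYTES):
    split = max_split_bytes(file_sizes, default_parallelism, max_partition_bytes)
    return pack_partitions(split_files(file_sizes, split), split)


if __name__ == "__main__":
    files = [1024 * MiB] * 4  # hypothetical: four 1 GiB base files
    for mpb in (128 * MiB, 32 * MiB):
        parts = plan_partitions(files, default_parallelism=8, max_partition_bytes=mpb)
        print(mpb // MiB, "MiB ->", len(parts), "partitions")
    print("one partition per file ->", len(files), "partitions")
    # 128 MiB -> 32 partitions
    # 32 MiB -> 128 partitions
    # one partition per file -> 4 partitions
```

   Under these assumptions, lowering `maxPartitionBytes` raises the task count and shrinks the bytes each task has to hold. A reader that plans one partition per file instead leaves each task with a full 1 GiB file regardless of the setting.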


