[GitHub] [hudi] alexeykudinkin commented on issue #6174: Hudi Read Performance : Partition pruning not happening when reading Hudi table

GitBox Mon, 12 Sep 2022 17:29:42 -0700


alexeykudinkin commented on issue #6174:
URL: https://github.com/apache/hudi/issues/6174#issuecomment-1244742505


   @tarunguptanit would it be possible for you to try out Hudi 0.12? 
   
   To explain a little bit what you might be observing: 
     - First of all, in your case you don't rely on partition pruning; instead 
you're _directly_ reading one of the partitions by providing the sub-path w/in 
the table. While the result is somewhat similar to partition pruning, this is a 
distinctly different mechanism in terms of implementation (inside both Spark 
and Hudi).
     - The reason you see 1500 tasks being spun up is that, even though you're 
reading one particular partition, Hudi currently does file-listing of 
the whole table (file-listing means that we just list the files in the 
table; we don't read the whole table). This is a known issue and 
there's an effort underway to revisit that. 
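
To make the listing-vs-reading distinction concrete, here is a toy sketch in plain Python. It is not Hudi or Spark code: the directory layout, the `date=...` partition names, and the helpers `build_toy_table`, `list_files` and `files_to_read` are all made up for illustration. It only shows that listing the whole table touches every file path, while the set of files that would actually be read stays limited to the one partition.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def build_toy_table(root: Path, partitions: int, files_per_partition: int) -> None:
    """Create empty files laid out like a partitioned table (hypothetical layout)."""
    for p in range(partitions):
        part_dir = root / f"date=2022-09-{p + 1:02d}"
        part_dir.mkdir(parents=True)
        for f in range(files_per_partition):
            (part_dir / f"file-{f}.parquet").touch()


def list_files(path: Path) -> list[Path]:
    """List data files under `path`; nothing is opened or read."""
    return sorted(path.rglob("*.parquet"))


def files_to_read(root: Path, partition: str, list_whole_table: bool) -> tuple[int, list[Path]]:
    """Return (number of files listed, files that would actually be read)."""
    if list_whole_table:
        # List every file in the table, then keep only the requested partition.
        listed = list_files(root)
        selected = [p for p in listed if p.parent.name == partition]
    else:
        # List only the requested partition's directory.
        listed = list_files(root / partition)
        selected = listed
    return len(listed), selected


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "toy_table"
    build_toy_table(root, partitions=10, files_per_partition=20)
    listed_all, read_all = files_to_read(root, "date=2022-09-01", list_whole_table=True)
    listed_one, read_one = files_to_read(root, "date=2022-09-01", list_whole_table=False)
    print(listed_all, len(read_all))  # 200 20
    print(listed_one, len(read_one))  # 20 20
```

In both cases the same 20 files are selected for reading; only the amount of listing work differs. The toy says nothing about how listing work maps to Spark tasks, which depends on Spark and Hudi internals.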


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
