alexeykudinkin commented on issue #6174:
URL: https://github.com/apache/hudi/issues/6174#issuecomment-1244742505
@tarunguptanit would it be possible for you to try out Hudi 0.12?
To explain a little bit what you might be observing:
- First of all, in your case you don't rely on partition pruning; instead
you're _directly_ reading one of the partitions by providing the sub-path w/in
the table. While this is somewhat similar to partition-pruning behavior, it is
a distinctly different mechanism in terms of implementation (inside both Spark
and Hudi).
- The reason you see 1500 tasks being spun up is that, even though you're
reading one particular partition, Hudi currently does file-listing of
the whole table (file-listing means that we just list the files in the
table; we don't read the whole table). This is a known issue and
there's an effort underway to revisit that.
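
To make the distinction between the two read patterns concrete, here is a
minimal sketch. It assumes a PySpark-style session object; the helper names,
the partition column, and its value are hypothetical and not taken from this
issue. Which of the two triggers less file-listing depends on the Hudi
version, so treat this as an illustration of the two code paths rather than a
recommended fix.

```python
from typing import Any


def read_partition_by_path(spark: Any, base_path: str, partition_path: str) -> Any:
    """Directly read one partition by pointing the reader at its sub-path.

    No partition pruning is involved here: the reader is simply given a
    narrower path inside the table.
    """
    full_path = f"{base_path.rstrip('/')}/{partition_path.strip('/')}"
    return spark.read.format("hudi").load(full_path)


def read_partition_by_filter(spark: Any, base_path: str, column: str, value: str) -> Any:
    """Read from the table's base path and filter on the partition column.

    This is the pattern partition pruning applies to. `column` and `value`
    are placeholders; `value` is assumed to contain no single quotes.
    """
    df = spark.read.format("hudi").load(base_path)
    return df.filter(f"{column} = '{value}'")
```

With a real session, the first helper would be called as
`read_partition_by_path(spark, base_path, "region=us")` and the second as
`read_partition_by_filter(spark, base_path, "region", "us")`, where `region`
stands in for whatever partition column the table actually uses.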