umehrot2 commented on pull request #1768:
URL: https://github.com/apache/hudi/pull/1768#issuecomment-666767040


   > LGTM overall.
   > 
   > One high-level question, though: this parallelizes only at the first level, 
right? So we are assuming this helps the common cases, like date-based 
tables with multiple years of data? I mean, if you only have a few years of 
data (say, fewer than 10) and `yyyy` is the top-level partitioning field, would 
this parallelization still help?
   
   @vinothchandar you are right about this. It parallelizes only on the 
top-level partition folders. I think this will still help, and it works best 
when there is only one level of partitioning. But I agree there is scope to 
improve this further by listing the leaf-level partition directories instead, 
which would help in multi-level partitioning scenarios. Is it okay if I open a 
JIRA for this and pursue it separately?
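
   To make the trade-off concrete, here is a rough sketch of how the number of 
parallel units differs between splitting on top-level folders and splitting on 
leaf-level partition directories. This is not the code in this PR; the helper 
names (`top_level_groups`, `max_parallelism`) and the `yyyy/mm/dd` layout are 
assumptions made purely for illustration.

```python
from collections import defaultdict


def top_level_groups(partition_paths):
    """Group relative partition paths by their first path component."""
    groups = defaultdict(list)
    for path in partition_paths:
        top = path.split("/", 1)[0]
        groups[top].append(path)
    return dict(groups)


def max_parallelism(partition_paths, by_leaf):
    """Upper bound on parallel tasks for a given splitting strategy.

    by_leaf=False: one task per top-level folder.
    by_leaf=True:  one task per leaf partition directory.
    """
    if by_leaf:
        return len(set(partition_paths))
    return len(top_level_groups(partition_paths))


# Hypothetical yyyy/mm/dd table: 3 years x 12 months x 28 days.
paths = [
    f"{year}/{month:02d}/{day:02d}"
    for year in (2018, 2019, 2020)
    for month in range(1, 13)
    for day in range(1, 29)
]

print(max_parallelism(paths, by_leaf=False))  # 3 top-level folders
print(max_parallelism(paths, by_leaf=True))   # 1008 leaf directories
```

   With `yyyy` on top and only a few years of data, the top-level split caps out 
at a handful of tasks, while a leaf-level split would scale with the number of 
partitions.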


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

