umehrot2 commented on pull request #1768: URL: https://github.com/apache/hudi/pull/1768#issuecomment-666767040
> LGTM overall.
>
> One high level question though. this parallelizes till the first level only right? so we are assuming this helps the common cases like date based tables with multiple years of data? I mean - if you only have a few years of data <10 say, and `yyyy` is the top level partitioning field, would this parallelization still help?

@vinothchandar you are right about this. It parallelizes only over the top-level partition folders. I think this will still help, and it works best when there is only one level of partitioning. But I agree there is scope to improve this further by listing leaf-level partition directories instead, which would help in multi-level partitioning scenarios. Is it okay if I open a JIRA for this and pursue it separately?
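
To make the top-level versus leaf-level distinction concrete, here is a minimal Python sketch. It is not Hudi code and does not reflect how the PR is implemented: the `yyyy/mm/dd` layout, the file names, and the helpers `group_by_top_level` and `group_by_leaf` are all hypothetical, and each group simply stands in for one unit of parallel work.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_level(files):
    """Group files by their first path component (one task per top-level folder)."""
    groups = defaultdict(list)
    for f in files:
        groups[PurePosixPath(f).parts[0]].append(f)
    return dict(groups)


def group_by_leaf(files):
    """Group files by their immediate parent directory (one task per leaf partition)."""
    groups = defaultdict(list)
    for f in files:
        groups[str(PurePosixPath(f).parent)].append(f)
    return dict(groups)


# Hypothetical table partitioned as yyyy/mm/dd: 2 years x 2 months x 2 days.
files = [
    f"{y}/{m:02d}/{d:02d}/part-0.parquet"
    for y in (2019, 2020)
    for m in (1, 2)
    for d in (1, 2)
]

print(len(group_by_top_level(files)))  # 2 units of work
print(len(group_by_leaf(files)))       # 8 units of work
```

With only a few years of data, grouping by the top-level folder caps the available parallelism at the number of years, whereas grouping by leaf directory scales with the total number of partitions.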
