[GitHub] [arrow-datafusion] alamb commented on pull request #5057: Parquet parallel scan

via GitHub Sat, 28 Jan 2023 08:19:01 -0800


alamb commented on PR #5057:
URL: 
https://github.com/apache/arrow-datafusion/pull/5057#issuecomment-1407431513


   > As it works for now, yes: the use case is mostly "fewer relatively large 
files than target_partitions". I guess it could be improved or reworked 
later into something like "perform repartitioning even for target_partitions 
in case there is significant skew in the current partitioning"
   
   I think to make this really happen we will need more dynamic behavior at 
runtime (i.e., a morsel-driven scheduler)
   
   > I don't mind enabling parallelism by default, and it seems to be the 
fastest way to deliver this feature, but (I'm not sure, just a suggestion) 
maybe a better time for this would be one or two releases after the setting 
itself is released?
   
   I agree -- let's get this PR merged in (default to off) and then plan to 
enable it by default in a few weeks (we just need to remember to do so!)
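
To make the quoted use case concrete (fewer large files than target_partitions), here is a minimal sketch of splitting files into byte ranges so that each partition scans roughly the same number of bytes. It is an illustration only, not the code in this PR: `FileRange` and `repartition_by_size` are hypothetical names, and a real Parquet scan would also have to map each byte range onto row groups.

```rust
/// Hypothetical type for illustration: a half-open byte range [start, end) of one file.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    file: String,
    start: u64,
    end: u64,
}

/// Hypothetical helper: distribute `files` (name, size in bytes) over at most
/// `target_partitions` partitions, cutting large files into byte ranges so
/// that every partition except possibly the last holds the same number of bytes.
fn repartition_by_size(files: &[(String, u64)], target_partitions: usize) -> Vec<Vec<FileRange>> {
    let total: u64 = files.iter().map(|(_, size)| *size).sum();
    if target_partitions == 0 || total == 0 {
        return Vec::new();
    }

    // Bytes per partition, rounded up so that no more than
    // `target_partitions` partitions are ever produced.
    let n = target_partitions as u64;
    let chunk = (total + n - 1) / n;

    let mut partitions: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut room = chunk; // bytes still available in the current partition

    for (name, size) in files {
        let mut offset = 0u64;
        while offset < *size {
            if room == 0 {
                // Current partition is full: start a new one.
                partitions.push(Vec::new());
                room = chunk;
            }
            let take = room.min(*size - offset);
            partitions.last_mut().unwrap().push(FileRange {
                file: name.clone(),
                start: offset,
                end: offset + take,
            });
            offset += take;
            room -= take;
        }
    }
    partitions
}
```

With a single 100-byte file and `target_partitions = 4`, this yields four partitions of 25 bytes each. Note that the split is purely static; it does not address skew that only shows up at runtime, which is where a morsel-driven scheduler would come in.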

