[GitHub] [arrow-datafusion] alamb commented on pull request #972: Set target_partitions on table scan in physical planner

GitBox Mon, 13 Sep 2021 10:22:11 -0700


alamb commented on pull request #972:
URL: https://github.com/apache/arrow-datafusion/pull/972#issuecomment-918407791



   @rdettai 
   
   > More and more distributed engines (especially in the cloud) are able to 
provision resources on demand according to the request. In that case, the total 
number of cores is not predefined. We don't want the TableProvider to try to 
give a defined number of partitions, but instead we want it to have its own 
partitioning strategy according to the data it has...
   
   I think of `target_partitions` as "target concurrency" (aka how many cores 
could you possibly keep busy at any time) and I think like you believe that the 
decisions of how to actually split up data across the partitions exposed to 
DataFusion should be an implementation detail of the `TableProvider`.
   
   So a system that uses DataFusion would set `target_partitions` based on how 
many CPU resources it wanted DataFusion to try and consume. Various 
TableProvider implementations could take that target into advisement as they 
divided up their data.
   
   > My conclusion would be the following:
   > 
   > keep the partitioning configurations at the TableProvider level (you 
define them when you instantiate the datasource), to allow different 
implementations to have different options.
   
   I think I would phrase it differently as "keep the choice of distribution of 
data across partitions at the `TableProvider` level"
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] alamb commented on pull request #972: Set target_partitions on table scan in physical planner

Reply via email to