[GitHub] [arrow-datafusion] tustvold commented on issue #4295: Change representation of partition in FileScanConfig

GitBox Fri, 25 Nov 2022 01:25:33 -0800


tustvold commented on issue #4295:
URL: https://github.com/apache/arrow-datafusion/issues/4295#issuecomment-1327202615


   > Anyway, the concept of partition seems to sit pretty deep in the codebase;
   > I saw that it is passed through the hierarchy of ExecutionPlan's execute(...).
   
   The scheduler I started work on preserved the concept of partitions, but did 
not rely on them for work distribution, or at least wouldn't have if I had 
actually finished it :sweat_smile: 
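
To illustrate what "preserving partitions without relying on them for work distribution" could look like, here is a minimal sketch using only the standard library. It is not the scheduler's actual code: `Task`, `sum_partitions`, and the shared-queue design are all hypothetical names and choices made up for this example. Each unit of work keeps its partition index, but any worker thread may pick it up, so the number of workers is independent of the number of partitions.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one batch of rows tagged with its partition.
struct Task {
    partition: usize,
    rows: Vec<i64>,
}

/// Sums the rows of each partition using `workers` threads that all pull
/// from one shared queue. The partition index is only used to label the
/// result, not to decide which thread does the work.
fn sum_partitions(tasks: Vec<Task>, partitions: usize, workers: usize) -> Vec<i64> {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let totals = Arc::new(Mutex::new(vec![0i64; partitions]));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let totals = Arc::clone(&totals);
            thread::spawn(move || loop {
                // Hold the queue lock only long enough to take one task.
                let next = queue.lock().unwrap().pop_front();
                let task = match next {
                    Some(task) => task,
                    None => break, // queue drained, worker exits
                };
                let sum: i64 = task.rows.iter().sum();
                let mut guard = totals.lock().unwrap();
                guard[task.partition] += sum;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = totals.lock().unwrap().clone();
    result
}
```

Calling `sum_partitions` with two partitions and three workers gives the same per-partition totals as running it with one worker; only the assignment of tasks to threads changes.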
   
   > Any changes with regard to the existing pull model?
   
   Yes, the hope was to gradually change to a push model for operators where it 
is possible.
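
For readers unfamiliar with the distinction, the following is a rough sketch of the two styles. None of these types are DataFusion APIs: `Batch`, `PullFilter`, `PushOperator`, `PushFilter`, and `Collect` are hypothetical names invented for the example, and the "keep even values" filter is just a stand-in operator. In the pull style the consumer asks its input for the next batch; in the push style the producer (or a scheduler) hands batches to the operator, which forwards its output downstream.

```rust
// Hypothetical stand-in for a record batch.
type Batch = Vec<i64>;

/// Pull model: the consumer drives execution by calling `next()` on its input.
struct PullFilter<I: Iterator<Item = Batch>> {
    input: I,
}

impl<I: Iterator<Item = Batch>> Iterator for PullFilter<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        let batch = self.input.next()?;
        Some(batch.into_iter().filter(|v| v % 2 == 0).collect())
    }
}

/// Push model: whoever produces a batch hands it to the operator.
trait PushOperator {
    fn push(&mut self, batch: Batch);
}

struct PushFilter<O: PushOperator> {
    output: O,
}

impl<O: PushOperator> PushOperator for PushFilter<O> {
    fn push(&mut self, batch: Batch) {
        let kept: Batch = batch.into_iter().filter(|v| v % 2 == 0).collect();
        self.output.push(kept);
    }
}

/// Terminal operator that collects every row pushed into it.
#[derive(Default)]
struct Collect {
    rows: Vec<i64>,
}

impl PushOperator for Collect {
    fn push(&mut self, batch: Batch) {
        self.rows.extend(batch);
    }
}

/// The caller pulls results out of the top of the operator chain.
fn run_pull(batches: Vec<Batch>) -> Vec<i64> {
    PullFilter { input: batches.into_iter() }.flatten().collect()
}

/// The caller pushes input into the bottom of the operator chain.
fn run_push(batches: Vec<Batch>) -> Vec<i64> {
    let mut pipeline = PushFilter { output: Collect::default() };
    for batch in batches {
        pipeline.push(batch);
    }
    pipeline.output.rows
}
```

Both functions produce the same rows for the same input; what differs is who decides when the next batch is processed, which is the control a scheduler needs.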
   
   > Will the scheduler contain a DAG that would replace the hierarchy based on
   > children() from ExecutionPlan?
   
   See 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/scheduler/pipeline/mod.rs#L27
   
   > I wonder how fairness of sharing resources would be approached, because
   > from what I have heard HyperDB processes a single query at a time, which
   > achieves ideal fairness with morsels.
   
   IMO fairness is better handled at a higher level, e.g. with separate query 
pools or even separate query processes. The scheduler should focus on 
throughput at the expense of fairness; if nothing else, fairly multiplexing 
queries is a recipe to blow your memory budget.
   
   
   
   
   
   


