mingmwang commented on issue #1848: URL: https://github.com/apache/arrow-datafusion/issues/1848#issuecomment-1043794840
> Will do. I think there are a couple of different ways we can approach this:
>
> 1. Have the client specify a namespace in the request and use an `ExecutionContext`-per-namespace on the scheduler. We could then dynamically create new contexts whenever a new namespace comes in.
> 2. Have the scheduler dynamically set target partitions based on executor statistics (e.g. the number of available task slots). This would, I think, require a way to set the target partitions explicitly when creating a SQL plan, so maybe add a new method to `ExecutionContext` like
>
>    `pub async fn sql(&mut self, sql: &str, target_partitions: usize) -> Result<Arc<dyn DataFrame>>`
>
> Or both. 1 may be necessary anyway to support multi-tenancy, but we may still, within a single namespace, want to allow specifying shuffle settings on a per-query basis.

I would prefer to let users choose the target partition count themselves at the current phase (a sketch of what that could look like is below). The target partition count should not change too dynamically; otherwise the runtime distributed physical plan will not be stable and could introduce additional shuffle exchanges. In the future we might add some kind of adaptive method to adjust the target partition size based on input/output data volume.
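A minimal sketch of the "user chooses the target partition count" approach, assuming the DataFusion API of roughly this era (`ExecutionConfig::with_target_partitions` and `ExecutionContext::with_config`; later releases renamed these to `SessionConfig`/`SessionContext`). The partition count, table name, and file path below are placeholders, not values from this issue:

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // The user picks a target partition count for this context up front
    // (16 is an arbitrary example), so every plan built from the context
    // repartitions to the same number of partitions and the distributed
    // plan shape stays stable across runs.
    let config = ExecutionConfig::new().with_target_partitions(16);
    let mut ctx = ExecutionContext::with_config(config);

    // Hypothetical table registration for illustration only.
    ctx.register_csv("example", "data/example.csv", CsvReadOptions::new())
        .await?;

    // Any shuffle/repartition stages in this query's physical plan use the
    // configured target partition count rather than a scheduler-chosen one.
    let df = ctx
        .sql("SELECT c1, count(*) FROM example GROUP BY c1")
        .await?;
    let batches = df.collect().await?;
    println!("got {} record batches", batches.len());
    Ok(())
}
```

The point of fixing the value at context-creation time, rather than per query from executor statistics, is that the plan is deterministic for a given configuration; an adaptive scheme could still be layered on later without changing this user-facing entry point.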
