jon-chuang edited a comment on issue #1221:
URL: 
https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782


   @yjshen thanks for your questions
   
   > task scheduling, keepalive monitoring, struggler detection, and 
speculative task execution\
   
   - yes. 
   - yes and failure recovery at worker and node level. We also have a worker 
monitoring dashboard with basic resource utilization info.
   - we do not have robust tracing tools yet, but it is planned. As for 
scheduling, it does not currently take into account global information like 
straggling in an execution DAG and try to prioritize bottlenecked tasks. 
However, we are looking into priority mechanism for tasks, which a user (or 
external monitoring tool) could prioritize for bottlenecked tasks.
   - not yet, but we have plans to preempt workers in event of OOM. Note that 
Ray will always try to schedule tasks if there are resources available. So if 
the dataframe/SQL operation does not have an all-to-all dependency, it will 
automatically proceed to the next stage.
   
   > Therefore I could easily build a distributed SQL engine on top of 
DataFusion with little effort?
   
   This is unclear to me, and requires more investigation. However, note that 
distributed dataframe project, Modin was built on top of Ray.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to