Matei Zaharia created SPARK-9850:
------------------------------------
Summary: Adaptive execution in Spark
Key: SPARK-9850
URL: https://issues.apache.org/jira/browse/SPARK-9850
Project: Spark
Issue Type: New Feature
Components: Spark Core, SQL
Reporter: Matei Zaharia
Query planning is one of the main factors in high performance, but the current
Spark engine requires the execution DAG for a job to be set in advance. Even
with cost-based optimization, it is hard to know the behavior of data and
user-defined functions well enough to always get great execution plans. This
JIRA proposes to add adaptive query execution, so that the engine can change
the plan for each query as it sees what data earlier stages produced.
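As a rough illustration of the idea (a minimal sketch, not taken from the
design doc), the snippet below picks a join strategy only after the sizes of
the upstream stage outputs are known. The names StageStats,
choose_join_strategy, and BROADCAST_THRESHOLD_BYTES are hypothetical and are
not part of any Spark API.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration only; not a real Spark setting.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class StageStats:
    """Statistics observed after a stage has finished running."""
    name: str
    output_bytes: int


def choose_join_strategy(left: StageStats, right: StageStats,
                         threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Decide how to join two completed stages using their observed sizes.

    A static planner would have to guess these sizes up front; here the
    decision is deferred until both inputs have actually been produced.
    """
    smaller = min(left.output_bytes, right.output_bytes)
    # If one side turned out to be small, ship it to every task;
    # otherwise fall back to repartitioning both sides.
    return "broadcast" if smaller <= threshold else "shuffle"


if __name__ == "__main__":
    filtered = StageStats("filtered_dim", 2 * 1024 * 1024)
    facts = StageStats("facts", 500 * 1024 * 1024 * 1024)
    print(choose_join_strategy(filtered, facts))
```

The point of the sketch is only that the choice is made from measured output
rather than from estimates available before the job starts.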
We propose adding this to Spark SQL / DataFrames first, using a new API in the
Spark engine that lets libraries run DAGs adaptively. In future JIRAs, the
functionality could be extended to other libraries or the RDD API, but that is
more difficult than adding it in SQL.
I've attached a design doc by Yin Huai and me that explains in more detail
how it would work.