[
https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237056#comment-15237056
]
Justin Uang commented on SPARK-9850:
------------------------------------
I like this idea a lot. One thing we encounter in our use cases is that we end
up accidentally joining on a field that is 50% nulls, or on a string that
represents null, like "N/A". It then becomes quite cumbersome to constantly
need a Spark expert to dig in and find out why there is one task that will
never finish. Would it be possible to add a threshold such that if a single
join key ever accounts for too many rows, the job just fails with an error
message?
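
To make the request concrete, here is a minimal sketch of the kind of check
being asked for, written in plain Python rather than against any Spark API.
The names check_join_key_skew, SkewedJoinKeyError and max_fraction are
hypothetical and chosen only for illustration; a real implementation would
presumably work from the per-key statistics gathered at a shuffle boundary
rather than from an in-memory list.

```python
from collections import Counter


class SkewedJoinKeyError(Exception):
    """Hypothetical error raised when one join key dominates the input."""


def check_join_key_skew(keys, max_fraction=0.2):
    """Fail fast if a single join key value exceeds max_fraction of all rows.

    `keys` is any iterable of join key values (None stands in for null).
    `max_fraction` is an assumed, user-tunable threshold, not a Spark setting.
    """
    keys = list(keys)
    if not keys:
        return  # nothing to join, nothing to check

    # Find the single most frequent key and its share of the rows.
    key, count = Counter(keys).most_common(1)[0]
    fraction = count / len(keys)

    if fraction > max_fraction:
        raise SkewedJoinKeyError(
            f"join key {key!r} accounts for {fraction:.0%} of rows "
            f"(limit {max_fraction:.0%})"
        )


if __name__ == "__main__":
    # Half of the rows have a null key, mirroring the case described above.
    rows = [None] * 5 + [1, 2, 3, 4, 5]
    try:
        check_join_key_skew(rows, max_fraction=0.2)
    except SkewedJoinKeyError as err:
        print(f"job failed: {err}")
```

The point of failing with a message that names the offending key is that the
user can see "N/A" or null in the error and fix the join without having to
inspect task-level metrics.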
> Adaptive execution in Spark
> ---------------------------
>
> Key: SPARK-9850
> URL: https://issues.apache.org/jira/browse/SPARK-9850
> Project: Spark
> Issue Type: Epic
> Components: Spark Core, SQL
> Reporter: Matei Zaharia
> Assignee: Yin Huai
> Attachments: AdaptiveExecutionInSpark.pdf
>
>
> Query planning is one of the main factors in high performance, but the
> current Spark engine requires the execution DAG for a job to be set in
> advance. Even with cost-based optimization, it is hard to know the behavior
> of data and user-defined functions well enough to always get great execution
> plans. This JIRA proposes to add adaptive query execution, so that the engine
> can change the plan for each query as it sees what data earlier stages
> produced.
> We propose adding this to Spark SQL / DataFrames first, using a new API in
> the Spark engine that lets libraries run DAGs adaptively. In future JIRAs,
> the functionality could be extended to other libraries or the RDD API, but
> that is more difficult than adding it in SQL.
> I've attached a design doc by Yin Huai and myself explaining how it would
> work in more detail.