[ 
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265113#comment-14265113
 ] 

Oleg Zhurakousky commented on SPARK-3561:
-----------------------------------------

Sorry for the delay in response, I'll just blame the holidays ;)
No, I have not had a chance to run the elasticity tests against 1.2, so I am 
gonna have to follow up on that.

The main motivation for this proposal is to _formalize an extension model 
around Spark’s execution environment_ so that other execution environments 
(new and existing) can easily be plugged in by a system integrator without 
requiring a new release of Spark (unlike the current integration mechanism, 
which relies on a ‘case’ statement with hard-coded values).
The reasons _why this is necessary_ are many, but they can all be summarized 
by the old **_generalization_** vs. **_specialization_** argument. And while 
_Tez, elastic scaling, and utilization of cluster resources_ are all good 
examples and indeed were the initial motivators, they are certainly not the 
end. The current efforts of several of our clients, who are integrating Spark 
with their custom execution environments using the proposed approach, are good 
evidence of its viability and an obvious benefit to Spark’s technology, 
allowing it to become a developer-friendly “face” of many execution 
environments/technologies while continuing innovation of its own.

So I think the next logical step would be to gather “for” and “against” 
arguments around “pluggable execution contexts for Spark” in general; then we 
can discuss implementation. 

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as 
> Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large-scale, batch, 
> and/or ETL applications when they run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to the Hadoop execution environment - as a 
> non-public API (@Experimental) not exposed to end users of Spark. 
> The trait will define 6 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> * persist
> * unpersist
> Each method maps directly to the corresponding method in the current version 
> of SparkContext. A JobExecutionContext implementation will be selected by 
> SparkContext via the master URL 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus allowing 
> the current (corresponding) methods of SparkContext to delegate to that 
> implementation. 
> An integrator will then have the option to provide a custom implementation of 
> DefaultExecutionContext, either by implementing it from scratch or by 
> extending DefaultExecutionContext. 
> Please see the attached design doc for more details. 
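
The six delegated operations described above could look roughly like the 
following Scala trait. This is only a hypothetical sketch: the description 
says each method maps to the corresponding SparkContext method, so the 
signatures below are modeled on SparkContext in the Spark 1.1 line, and 
details such as the `SparkContext` parameter and the `persist`/`unpersist` 
shapes are assumptions, not the actual design-doc API.

```scala
// Hypothetical sketch of the proposed pluggable execution context.
// Signatures are assumed from the corresponding SparkContext methods
// (Spark 1.1 era); the real design doc may differ.
import scala.reflect.ClassTag

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.InputFormat
import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}
import org.apache.spark.{SparkContext, TaskContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

trait JobExecutionContext {

  // Mirrors SparkContext.hadoopFile (old mapred API).
  def hadoopFile[K, V](
      sc: SparkContext,
      path: String,
      inputFormatClass: Class[_ <: InputFormat[K, V]],
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int): RDD[(K, V)]

  // Mirrors SparkContext.newAPIHadoopFile (new mapreduce API).
  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
      sc: SparkContext,
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration): RDD[(K, V)]

  // Mirrors SparkContext.broadcast.
  def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

  // Mirrors SparkContext.runJob; `allowLocal` existed in the 1.x API.
  def runJob[T, U: ClassTag](
      sc: SparkContext,
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      allowLocal: Boolean,
      resultHandler: (Int, U) => Unit): Unit

  // Mirrors RDD.persist/unpersist, routed through the execution context.
  def persist[T](sc: SparkContext, rdd: RDD[T], level: StorageLevel): RDD[T]
  def unpersist[T](sc: SparkContext, rdd: RDD[T], blocking: Boolean): RDD[T]
}
```

Under the proposal, SparkContext would parse a master URL such as 
"execution-context:foo.bar.MyJobExecutionContext" and load the named class 
reflectively, falling back to a DefaultExecutionContext that carries the 
existing SparkContext code paths.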



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
