[ https://issues.apache.org/jira/browse/OOZIE-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176167#comment-15176167 ]

Robert Kanter edited comment on OOZIE-2477 at 3/2/16 7:02 PM:
--------------------------------------------------------------

We've seen a number of problems with the Spark Action, largely due to classpath 
and other issues because we're running the SparkSubmit class in the Java 
Action.  I'm all for refactoring it to use a REST API.  Thanks for looking into 
this.

Here's some early feedback/questions based on the doc:
# We should ensure that yarn-client, yarn-cluster, and local modes still work 
(and don't require existing workflows to be updated or anything).  In other 
words, we need to make sure this change is backwards compatible.
# Who receives the REST API calls?  I'm more familiar with Yarn than Standalone 
and Mesos, but when used with Yarn, the only daemon I'm aware of is the Spark 
History Server.
# I asked someone more familiar with Spark about this internal REST API, and 
while he hasn't been following it too closely, he was concerned that it wasn't 
really designed with external clients in mind.  Do we know how stable it is?  
If Oozie runs a lot of Spark Actions, will they hold up?
# Do we need to worry about the REST API changing between Spark versions?  From 
my brief look at the Spark JIRA, it sounds like part of the idea was to ensure 
compatibility between Spark versions, so it shouldn't change; but we should 
look into what guarantees are made here.
# What about jars from the user or sharelib?  How do we pass these along with 
the job when using the REST API?
# How does security work with the REST API?  Can we pass delegation tokens?  
And what about HTTPS?
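To make questions 2, 4 and 5 a bit more concrete, here's a rough Python sketch of a submission against the Standalone Master's REST submission server (default port 6066, `/v1/submissions/create`), modeled on the `CreateSubmissionRequest` that Spark's own `RestSubmissionClient` sends. It only builds the HTTP requests and never sends them. The host, jar paths, main class and version string are placeholders, and the helper names (`build_create_submission`, `build_status_request`, `is_terminal`) are hypothetical. Since this is an internal protocol, the field names should be checked against whatever Spark version we target. Two things worth noting: the client version travels with every request (question 4), and my assumption is that user/sharelib jars could go in `spark.jars` as long as they live at URIs the cluster can reach, e.g. HDFS (question 5). It doesn't address security (question 6) at all.

```python
import json
import urllib.request

# Assumption: Spark Standalone Master's REST submission server on its default
# port; body field names mirror Spark's internal CreateSubmissionRequest.
DEFAULT_REST_PORT = 6066
# DriverState values treated as "done" (others include SUBMITTED, RUNNING,
# RELAUNCHING and UNKNOWN).
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "ERROR"}


def build_create_submission(master_host, app_jar, main_class, app_args,
                            extra_jars=(), spark_version="1.6.0",
                            port=DEFAULT_REST_PORT):
    """Hypothetical helper: build (but don't send) a cluster-mode submission."""
    spark_properties = {
        "spark.app.name": main_class,
        "spark.master": f"spark://{master_host}:{port}",
        "spark.submit.deployMode": "cluster",
        # User/sharelib jars as one comma-separated list; presumably these must
        # be reachable from the cluster (e.g. HDFS), not local to Oozie.
        "spark.jars": ",".join([app_jar, *extra_jars]),
    }
    body = {
        "action": "CreateSubmissionRequest",
        "appResource": app_jar,
        "mainClass": main_class,
        "appArgs": list(app_args),
        # Sent with every request; relevant to the versioning question (4).
        "clientSparkVersion": spark_version,
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": spark_properties,
    }
    return urllib.request.Request(
        f"http://{master_host}:{port}/v1/submissions/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )


def build_status_request(master_host, submission_id, port=DEFAULT_REST_PORT):
    """Hypothetical helper: build a GET for a previously created driver's status."""
    return urllib.request.Request(
        f"http://{master_host}:{port}/v1/submissions/status/{submission_id}",
        method="GET",
    )


def is_terminal(status_response_json):
    """True once the status response's driverState is one of the final states."""
    return json.loads(status_response_json).get("driverState") in TERMINAL_STATES
```

If we went this way, the launcher would poll the status request until `is_terminal` is true instead of blocking inside SparkSubmit, so the polling load on the Master is one more thing to look at for question 3.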


> Oozie Spark Node to support Standalone and Mesos Deployment modes.
> ------------------------------------------------------------------
>
>                 Key: OOZIE-2477
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2477
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Ahmed Kamal
>              Labels: Spark
>
>  I'm interested in extending the current Spark node to support these
> deployment modes and contributing this to the project. An initial design
> document is proposed here
> https://docs.google.com/document/d/12uf3B6VMgp_sI4sUiOwcmiMLTgLb5kL2weM7T0cgZNk/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)