[
https://issues.apache.org/jira/browse/OOZIE-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176167#comment-15176167
]
Robert Kanter edited comment on OOZIE-2477 at 3/2/16 7:02 PM:
--------------------------------------------------------------
We've seen a number of problems with the Spark Action, largely due to classpath
and other issues because we're running the SparkSubmit class in the Java
Action. I'm all for refactoring it to use a REST API. Thanks for looking into
this.
Here's some early feedback/questions based on the doc:
# We should ensure that yarn-client, yarn-cluster, and local modes still work
(and don't require existing workflows to be updated or anything). In other
words, we need to make sure this change is backwards compatible.
# Who receives the REST API calls? I'm more familiar with Yarn than Standalone
and Mesos, but when used with Yarn, the only daemon I'm aware of is the Spark
History Server.
# I asked someone more familiar with Spark about this internal REST API, and
while he hasn't been following it too closely, he was concerned that it wasn't
really designed with external clients in mind. Do we know how stable it is?
If Oozie runs a lot of Spark Actions, will the REST API hold up under that load?
# Do we need to worry about the REST API changing between Spark versions? From
my brief look at the Spark JIRA, it sounds like part of the idea was to ensure
compatibility between Spark versions, so it shouldn't change; but we should
look into what guarantees are made here.
# What about jars from the user or sharelib? How do we pass these along with
the job through the REST API?
# How does security work with the REST API? Can we pass delegation tokens?
And what about HTTPS?
> Oozie Spark Node to support Standalone and Mesos Deployment modes.
> ------------------------------------------------------------------
>
> Key: OOZIE-2477
> URL: https://issues.apache.org/jira/browse/OOZIE-2477
> Project: Oozie
> Issue Type: Improvement
> Reporter: Ahmed Kamal
> Labels: Spark
>
> I'm interested in extending the current spark node to support them and
> contributing this to the project. An initial design document is proposed here
> https://docs.google.com/document/d/12uf3B6VMgp_sI4sUiOwcmiMLTgLb5kL2weM7T0cgZNk/edit?usp=sharing