[
https://issues.apache.org/jira/browse/AIRFLOW-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Albertus Kelvin updated AIRFLOW-6249:
-------------------------------------
Description:
I think the driver status tracker should also be enabled for YARN, Mesos and
Kubernetes in cluster deploy mode.
Standalone, Mesos and K8s cluster deploy mode: according to
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L543],
the *--status* parameter is supported for these cluster managers.
YARN cluster deploy mode: I think we can use YARN's own built-in commands,
such as *yarn application -status <ApplicationID>*.
Therefore, the *_build_track_driver_status_command* method should be updated
accordingly, for example as follows.
{code:python}
def _build_track_driver_status_command(self):
    # The driver id is needed so we can poll for its status
    if not self._driver_id:
        raise AirflowException(
            "Invalid status: attempted to poll driver "
            "status but no driver id is known. Giving up.")

    schemes = ("spark://", "mesos://", "k8s://")
    if self._connection['master'].startswith(schemes):
        # standalone, mesos, kubernetes: spark-submit supports --status
        connection_cmd = self._get_spark_binary_path()
        connection_cmd += ["--master", self._connection['master']]
        connection_cmd += ["--status", self._driver_id]
    else:
        # yarn: each argument must be a separate list element,
        # not one "yarn application -status" string
        connection_cmd = ["yarn", "application", "-status"]
        connection_cmd += [self._driver_id]

    self.log.debug("Poll driver status cmd: %s", connection_cmd)

    return connection_cmd
{code}
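One open point for the YARN branch: unlike spark-submit's *--status* output, *yarn application -status* prints a human-readable application report, so the caller would need its own parsing. A minimal sketch, assuming the report contains lines like *State : FINISHED* and *Final-State : SUCCEEDED* (the usual YARN CLI format; the function name and sample report below are illustrative, not part of the hook):

{code:python}
import re


def parse_yarn_application_status(report):
    """Extract (State, Final-State) from `yarn application -status` output.

    Returns None for a field that is not present in the report.
    """
    state = re.search(r"^\s*State\s*:\s*(\S+)", report, re.MULTILINE)
    final_state = re.search(r"^\s*Final-State\s*:\s*(\S+)", report, re.MULTILINE)
    return (
        state.group(1) if state else None,
        final_state.group(1) if final_state else None,
    )


# Hypothetical report, shaped like typical `yarn application -status` output
sample = (
    "Application Report :\n"
    "\tApplication-Id : application_1575000000000_0001\n"
    "\tState : FINISHED\n"
    "\tFinal-State : SUCCEEDED\n"
)
print(parse_yarn_application_status(sample))
{code}

The two fields would then map onto the hook's existing driver-status handling (e.g. treating a Final-State of SUCCEEDED/FAILED/KILLED as terminal).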
> Spark driver status polling for standalone, YARN, Mesos and K8s
> ---------------------------------------------------------------
>
> Key: AIRFLOW-6249
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6249
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks, operators
> Affects Versions: 1.10.6
> Reporter: Albertus Kelvin
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)