[ https://issues.apache.org/jira/browse/AIRFLOW-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albertus Kelvin updated AIRFLOW-6249:
-------------------------------------
    Description: 
I think the driver status tracker should also be enabled for YARN, Mesos and 
Kubernetes in cluster deploy mode.

Standalone, Mesos and Kubernetes cluster deploy mode: according to 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L543],
 the *--status* parameter is supported for these cluster managers.

YARN cluster deploy mode: I think we can use YARN's own built-in commands, 
such as *yarn application -status <ApplicationID>*.

Therefore, the *_build_track_driver_status_command* method should be updated 
accordingly, for example as follows.
{code:python}
def _build_track_driver_status_command(self):
    # The driver id is needed so we can poll for its status
    if not self._driver_id:
        raise AirflowException(
            "Invalid status: attempted to poll driver "
            "status but no driver id is known. Giving up.")

    schemes = ("spark://", "mesos://", "k8s://https://")
    if self._connection['master'].startswith(schemes):
        # standalone, mesos, kubernetes
        connection_cmd = self._get_spark_binary_path()
        connection_cmd += ["--master", self._connection['master']]
        connection_cmd += ["--status", self._driver_id]
    else:
        # yarn
        connection_cmd = ["yarn", "application", "-status"]
        connection_cmd += [self._driver_id]

    self.log.debug("Poll driver status cmd: %s", connection_cmd)

    return connection_cmd
{code}
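As a side note, a minimal sketch (hypothetical, not part of this proposal) of how a YARN report could then be consumed, assuming the usual *yarn application -status* output contains a line like {{State : RUNNING}}:
{code:python}
import re


def parse_yarn_state(report):
    # Extract the value of the "State : <VALUE>" line from a
    # `yarn application -status` report; return None if absent.
    match = re.search(r"^\s*State\s*:\s*(\S+)", report, re.MULTILINE)
    return match.group(1) if match else None


sample_report = """Application Report :
\tApplication-Id : application_1575000000000_0001
\tState : RUNNING
\tFinal-State : UNDEFINED
"""
print(parse_yarn_state(sample_report))  # RUNNING
{code}
The regex is anchored at line start so it matches the {{State}} line but not {{Final-State}}; the exact report format would need to be confirmed against the YARN CLI in use.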

  was:
I think the driver status tracker should also be enabled for yarn, mesos & 
kubernetes with cluster deploy mode.

Standalone, mesos & k8s cluster deploy mode: According to 
[this|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L543],
 *--status* parameter supports these cluster managers.

YARN cluster deploy mode: I think we can use built-in commands from yarn 
itself, such as *yarn application -status <ApplicationID>*.

Therefore, the *_build_track_driver_status_command* method should be updated 
accordingly to accommodate such a need, such as the following.
{code:python}
def _build_track_driver_status_command(self):
        # The driver id so we can poll for its status
        if not self._driver_id:
            raise AirflowException(
                    "Invalid status: attempted to poll driver " +
                    "status but no driver id is known. Giving up.")

        schemes = ("spark://", "mesos://", "k8s://https://")
        if self._connection['master'].startswith(schemes): 
            # standalone, mesos, kubernetes
            connection_cmd = self._get_spark_binary_path()
            connection_cmd += ["--master", self._connection['master']]
            connection_cmd += ["--status", self._driver_id]
        else:
            # yarn
            connection_cmd = ["yarn application -status"]
            connection_cmd += [self._driver_id]

        self.log.debug("Poll driver status cmd: %s", connection_cmd)

        return connection_cmd{code}


> Spark driver status polling for standalone, YARN, Mesos and K8s
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-6249
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6249
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks, operators
>    Affects Versions: 1.10.6
>            Reporter: Albertus Kelvin
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
