[
https://issues.apache.org/jira/browse/HIVE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao Sun reassigned HIVE-16984:
-------------------------------
> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver
> died
> -------------------------------------------------------------------------------
>
> Key: HIVE-16984
> URL: https://issues.apache.org/jira/browse/HIVE-16984
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chao Sun
> Assignee: Chao Sun
>
> In HoS, a RemoteDriver may fail to initialize its SparkContext after launch,
> in which case the ApplicationMaster eventually dies. In this situation there
> are two issues related to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitoring job. The
> former waits up to {{hive.spark.client.future.timeout}}, and the latter waits
> up to {{hive.spark.job.monitor.timeout}}. The error message for the latter
> presents {{hive.spark.job.monitor.timeout}} as the total time spent waiting
> for job submission, which is inaccurate because it leaves out
> {{hive.spark.client.future.timeout}}.
> 2. If the RemoteDriver dies suddenly, we may still wait uselessly for the full
> timeouts. This can be avoided when we know that the channel between the client
> and the remote driver has closed (see the sketch below).
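> The sketch below illustrates both points in plain Java. It does not use Hive's
> actual classes; {{DriverChannel}}, {{askForAppId()}} and the timeout constants
> are hypothetical stand-ins for the RPC channel to the RemoteDriver,
> {{hive.spark.client.future.timeout}} and {{hive.spark.job.monitor.timeout}}.
> {code:java}
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.TimeoutException;
>
> class GetAppIdSketch {
>   // Hypothetical stand-ins for hive.spark.client.future.timeout and
>   // hive.spark.job.monitor.timeout (both in seconds).
>   static final long FUTURE_TIMEOUT_S = 60;
>   static final long MONITOR_TIMEOUT_S = 60;
>
>   // Stand-in for the RPC channel between the client and the RemoteDriver.
>   interface DriverChannel {
>     boolean isOpen();
>     CompletableFuture<String> askForAppId();
>   }
>
>   static String getAppId(DriverChannel channel) throws Exception {
>     // Point 2: if the remote driver already died, the channel is closed,
>     // so fail fast instead of blocking for the full future timeout.
>     if (!channel.isOpen()) {
>       throw new IllegalStateException(
>           "Channel to remote driver is closed; the driver likely died.");
>     }
>     try {
>       return channel.askForAppId().get(FUTURE_TIMEOUT_S, TimeUnit.SECONDS);
>     } catch (TimeoutException e) {
>       // Point 1: the user-visible wait is future.timeout plus monitor.timeout,
>       // so report both instead of attributing it all to the monitor timeout.
>       throw new TimeoutException("No application ID after " + FUTURE_TIMEOUT_S
>           + "s; the job monitor would wait up to " + MONITOR_TIMEOUT_S + "s more.");
>     }
>   }
> }
> {code}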
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)