[ 
https://issues.apache.org/jira/browse/HIVE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16984:
----------------------------
    Attachment: HIVE-16984.1.patch

> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver 
> died
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16984
>                 URL: https://issues.apache.org/jira/browse/HIVE-16984
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-16984.1.patch
>
>
> In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark 
> context and thus the ApplicationMaster will die eventually. In this case, 
> there are two issues related to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitor job. The 
> former waits for {{hive.spark.client.future.timeout}}, while the latter 
> waits for {{hive.spark.job.monitor.timeout}}. The error message for the 
> latter treats {{hive.spark.job.monitor.timeout}} as the total time spent 
> waiting for job submission. This is inaccurate, since that total doesn't 
> include {{hive.spark.client.future.timeout}}.
> 2. If the RemoteDriver dies suddenly, we may still wait uselessly for both 
> timeouts to expire. This could be avoided if we detect that the channel 
> between the client and the remote driver has closed.
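
A minimal sketch of the fail-fast idea in point 2, not the actual HIVE-16984 patch: before blocking on the app-ID future for the full {{hive.spark.client.future.timeout}}, check whether the channel to the RemoteDriver is already known to be closed and fail immediately if so. The {{Channel}} class and {{getAppId}} signature here are hypothetical stand-ins, not Hive or Netty APIs.

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastGetAppId {

    // Hypothetical stand-in for the RPC channel between the client
    // and the RemoteDriver.
    static class Channel {
        private volatile boolean open = true;
        boolean isOpen() { return open; }
        void close()     { open = false; }
    }

    // Waits up to timeoutMs for the application ID, but returns
    // immediately with an error if the channel has already closed
    // (i.e. the remote driver died), instead of waiting hopelessly.
    static String getAppId(Channel channel, Future<String> appIdFuture,
                           long timeoutMs) throws Exception {
        if (!channel.isOpen()) {
            throw new IllegalStateException(
                "Channel to remote driver is closed; not waiting for app ID");
        }
        try {
            return appIdFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Name the config that governs this wait, so the error
            // message attributes the timeout to the right setting.
            throw new TimeoutException("Timed out after " + timeoutMs
                + " ms waiting for app ID (hive.spark.client.future.timeout)");
        }
    }
}
```

The same pre-check could guard the monitor-job wait governed by {{hive.spark.job.monitor.timeout}}, so a dead driver is reported right away rather than after both timeouts elapse.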



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
