[ https://issues.apache.org/jira/browse/HIVE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shohei Okumiya updated HIVE-16984:
----------------------------------
    Fix Version/s: NA
       Resolution: Won't Fix
           Status: Resolved  (was: Patch Available)

We have discontinued Hive on Spark and EOLed Hive 3. HIVE-26134

> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16984
>                 URL: https://issues.apache.org/jira/browse/HIVE-16984
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: NA
>
>      Attachments: HIVE-16984.1.patch
>
>
> In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark
> context, in which case the ApplicationMaster will eventually die. In this
> situation there are two issues related to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitoring job. The
> {{getAppID()}} call waits up to {{hive.spark.client.future.timeout}}, and the
> monitoring job waits up to {{hive.spark.job.monitor.timeout}}. The error
> message for the latter presents {{hive.spark.job.monitor.timeout}} as the
> total time spent waiting for job submission. This is inaccurate, because it
> does not include {{hive.spark.client.future.timeout}}.
> 2. If the RemoteDriver dies suddenly, we may still wait hopelessly for both
> timeouts to expire. This could be avoided if we detect that the channel
> between the client and the remote driver has closed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
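The fail-fast behavior proposed in point 2 can be sketched as follows. This is a minimal illustration, not Hive's actual code: the class `FailFastAppId`, the `onChannelClosed()` hook, and the method names are hypothetical stand-ins for the client-side future behind `getAppID()` and the RPC layer's channel-closed notification.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: the app-ID future is failed as soon as the driver channel closes,
// so a caller blocked in getAppID() errors out immediately instead of
// sleeping through hive.spark.client.future.timeout.
public class FailFastAppId {
    private final CompletableFuture<String> appIdFuture = new CompletableFuture<>();

    // Called when the RemoteDriver reports its application ID.
    public void onAppIdReceived(String appId) {
        appIdFuture.complete(appId);
    }

    // Hypothetical hook, invoked by the RPC layer's channel-closed listener.
    public void onChannelClosed() {
        appIdFuture.completeExceptionally(new IllegalStateException(
            "remote driver channel closed before app ID arrived"));
    }

    // Waits up to the given timeout, but fails immediately if the channel
    // has already closed (the future is already completed exceptionally).
    public String getAppID(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return appIdFuture.get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        FailFastAppId status = new FailFastAppId();
        status.onChannelClosed(); // driver died before sending the app ID
        try {
            status.getAppID(30, TimeUnit.SECONDS);
            System.out.println("unexpected success");
        } catch (ExecutionException e) {
            // Returns right away; no 30-second wait.
            System.out.println("failed fast: " + e.getCause().getMessage());
        }
    }
}
```

The key design point is that a dead channel completes the future exceptionally rather than leaving it pending, so every blocked waiter is released at once instead of each independently running out its own timeout.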