[ https://issues.apache.org/jira/browse/HIVE-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879895#comment-15879895 ]
Siddharth Seth commented on HIVE-15958:
---------------------------------------

I suspect the following is the reason for connections not being re-used for taskKilled.

For regular heartbeats, only one session will ever run for an AM - and this is controlled via the QueueCallable / HeartbeatCallable. When taskKilled comes into play, it is possible for a taskKilled to get a handle on the umbilical, and have one of the queued threads close the umbilical right after that, resulting in an error.

We have that situation again. It is more prominent now, since queryComplete causes fragments to be killed (should probably not be done - HIVE-16021), which in turn results in a heartbeat. The queryComplete closes the umbilical while taskKilled requests are still being scheduled.

Also, iterating over the knownAppMasters is avoidable. We can store information about the AM in the queryTracker, and retrieve it on queryComplete. Alternatively, send the AM information on the queryComplete call.

> LLAP: IPC connections are not being reused for umbilical protocol
> -----------------------------------------------------------------
>
>                 Key: HIVE-15958
>                 URL: https://issues.apache.org/jira/browse/HIVE-15958
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 2.2.0
>            Reporter: Rajesh Balamohan
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-15958.1.patch, HIVE-15958.2.patch
>
>
> During concurrency testing, observed 1000s of ipc thread creations. Ideally,
> the connections to same hosts should be reused.
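
To illustrate the suggestion in the comment about storing AM information per query rather than iterating over knownAppMasters, here is a minimal sketch. It is not taken from the Hive code base: the class `AmInfoRegistry`, its nested `AmInfo` type, and the methods `registerQuery` and `queryComplete` are hypothetical names, and the real queryTracker may look quite different. The idea shown is only that a map keyed by query id lets queryComplete find the owning AM in a single lookup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the Hive code base.
final class AmInfoRegistry {

    /** Address of the application master (AM) that owns a query. */
    static final class AmInfo {
        final String host;
        final int port;

        AmInfo(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // One entry per running query, keyed by a query identifier.
    private final Map<String, AmInfo> amByQuery = new ConcurrentHashMap<>();

    /** Record the AM for a query when its first fragment arrives. */
    void registerQuery(String queryId, String host, int port) {
        amByQuery.putIfAbsent(queryId, new AmInfo(host, port));
    }

    /**
     * Look up and remove the AM for a finished query in one step,
     * so no scan over all known AMs is needed.
     */
    Optional<AmInfo> queryComplete(String queryId) {
        return Optional.ofNullable(amByQuery.remove(queryId));
    }
}
```

This sketch only covers the lookup. It does not by itself prevent the race described in the comment, where the umbilical is closed while a taskKilled request still holds a handle on it.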