[jira] [Work logged] (HIVE-22359) LLAP: when a node restarts with the exact same host/port in kubernetes it is not detected as a task failure

ASF GitHub Bot (Jira) Tue, 18 Feb 2020 12:09:07 -0800


     [ https://issues.apache.org/jira/browse/HIVE-22359?focusedWorklogId=389041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389041 ]


ASF GitHub Bot logged work on HIVE-22359:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/20 20:08
            Start Date: 18/Feb/20 20:08
    Worklog Time Spent: 10m 
      Work Description: prasanthj commented on pull request #917: HIVE-22359: 
LLAP: when a node restarts with the exact same host/port in kubernetes it is 
not detected as a task failure
URL: https://github.com/apache/hive/pull/917
 
 
   In Kubernetes environments, the hostnames and ports of the LLAP service stay
the same when a pod restarts, but the IP addresses of the pods can change. Parts
of LLAP assume that hostname:port identifies a daemon and cache connections
keyed on it. In addition, the AM tracks which task attempts are running on each
host. When an LLAP pod restarts, all the tasks on that node are killed or
replaced with new tasks, so LLAP heartbeats with task attempts that the AM does
not expect.
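
   The connection-caching problem can be illustrated with a minimal Java sketch
(Java 16+ for records). It is not Hive's code: the class, both key formats, the
Connection record, the hostname, port and IPs are all hypothetical, and a
"connection" here only remembers the IP it was opened against. A cache keyed by
hostname:port returns the entry opened against the pod's old IP after a
restart, while a key that also carries the IP yields a fresh entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not Hive's actual LLAP classes or key format. */
public class HostIdCacheSketch {

    /** Stand-in for a cached RPC proxy; it only remembers the IP it was opened to. */
    record Connection(String remoteIp) {}

    /** Key made of hostname and port only: unchanged when a pod restarts. */
    static String hostPortKey(String hostname, int port) {
        return hostname + ":" + port;
    }

    /** Key that also carries the pod IP, so a restarted pod maps to a new key. */
    static String hostIdWithIp(String hostname, int port, String ip) {
        return hostname + ":" + port + "@" + ip;
    }

    /** Lazily opens one connection per key and reuses it on later lookups. */
    static final class ConnectionCache {
        private final Map<String, Connection> connections = new HashMap<>();

        Connection get(String key, String currentIp) {
            // Only opens a new connection (to currentIp) if the key is not cached yet.
            return connections.computeIfAbsent(key, k -> new Connection(currentIp));
        }
    }

    public static void main(String[] args) {
        String host = "llap-0.llap.example.svc.cluster.local"; // hypothetical pod DNS name
        int port = 15001;                                       // hypothetical RPC port
        String ipBefore = "10.0.0.5", ipAfter = "10.0.0.9";     // pod IP before/after restart

        ConnectionCache byHostPort = new ConnectionCache();
        byHostPort.get(hostPortKey(host, port), ipBefore);
        Connection stale = byHostPort.get(hostPortKey(host, port), ipAfter);
        System.out.println("host:port key -> " + stale.remoteIp());  // old IP is reused

        ConnectionCache byHostId = new ConnectionCache();
        byHostId.get(hostIdWithIp(host, port, ipBefore), ipBefore);
        Connection fresh = byHostId.get(hostIdWithIp(host, port, ipAfter), ipAfter);
        System.out.println("host id with IP -> " + fresh.remoteIp()); // new IP is used
    }
}
```

Running main prints "host:port key -> 10.0.0.5" followed by
"host id with IP -> 10.0.0.9".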
   
   This PR fixes 2 issues:
   - Includes the IP address in the hostId that is used for caching RPC connections.
   - When the AM expects some tasks to be on a node and they do not exist there,
it kills those task attempts so that they get rescheduled.
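
   The second fix amounts to a set difference between the attempts the AM
expects on a node and the attempts the node reports in its heartbeat. The
sketch below is a hypothetical illustration, not the LlapTaskCommunicator code:
the class and method names are made up, the expected set reuses attempt IDs
from the log in this issue, and the reported attempt ID is a placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of AM-side heartbeat reconciliation; not Hive's actual code. */
public class HeartbeatReconcileSketch {

    /**
     * Returns attempts the AM believes are running on a node but that the node's
     * heartbeat did not report, preserving the iteration order of expectedOnNode.
     */
    static List<String> findMissingAttempts(Set<String> expectedOnNode, Set<String> reportedByNode) {
        List<String> missing = new ArrayList<>();
        for (String attempt : expectedOnNode) {
            if (!reportedByNode.contains(attempt)) {
                missing.add(attempt);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Attempts the AM had scheduled on this node (IDs taken from the log in this issue).
        Set<String> expected = new LinkedHashSet<>(List.of(
                "attempt_1569601631911_0000_1_04_000034_0",
                "attempt_1569601631911_0000_1_04_000071_0",
                "attempt_1569601631911_0000_1_04_000191_0"));
        // After the pod restart, the heartbeat carries a different attempt (placeholder ID).
        Set<String> reported = Set.of("attempt_placeholder_new_task_0");

        for (String attempt : findMissingAttempts(expected, reported)) {
            // A real AM would kill the attempt here so the scheduler re-runs it.
            System.out.println("kill and reschedule: " + attempt);
        }
    }
}
```

Running main prints one "kill and reschedule" line for each of the three
expected attempts, since none of them appears in the heartbeat.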
 


Issue Time Tracking
-------------------

            Worklog Id:     (was: 389041)
    Remaining Estimate: 0h
            Time Spent: 10m

> LLAP: when a node restarts with the exact same host/port in kubernetes it is 
> not detected as a task failure
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22359
>                 URL: https://issues.apache.org/jira/browse/HIVE-22359
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal Vijayaraghavan
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22359.1.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> <14>1 2019-10-16T22:16:39.233Z query-coordinator-0-5.query-coordinator-0-service.compute-1569601454-l2x9.svc.cluster.local
> query-coordinator 1 461e5ad9-f05f-11e9-85f7-06e84765763e [mdc@18060 class="tezplugins.LlapTaskCommunicator"
> level="INFO" thread="IPC Server handler 4 on 33333"] The tasks we expected to be on the node are not there:
> attempt_1569601631911_0000_1_04_000034_0, attempt_1569601631911_0000_1_04_000071_0,
> attempt_1569601631911_0000_1_04_000191_0, attempt_1569601631911_0000_1_04_000211_0,
> attempt_1569601631911_0000_1_04_000229_0, attempt_1569601631911_0000_1_04_000231_0,
> attempt_1569601631911_0000_1_04_000235_0, attempt_1569601631911_0000_1_04_000242_0,
> attempt_1569601631911_0000_1_04_000160_1, attempt_1569601631911_0000_1_04_000012_2,
> attempt_1569601631911_0000_1_04_000003_2, attempt_1569601631911_0000_1_04_000056_2,
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
