mahesh kumar behera updated HIVE-19924:
    Attachment: HIVE-19924.13.patch

> Tag distcp jobs run by Repl Load
> --------------------------------
>                 Key: HIVE-19924
>                 URL: https://issues.apache.org/jira/browse/HIVE-19924
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: DR, replication
>             Fix For: 4.0.0, 3.2.0
>         Attachments: HIVE-19924.01.patch, HIVE-19924.02.patch, 
> HIVE-19924.03.patch, HIVE-19924.04.patch, HIVE-19924.05.patch, 
> HIVE-19924.06.patch, HIVE-19924.07.patch, HIVE-19924.08.patch, 
> HIVE-19924.09.patch, HIVE-19924.10.patch, HIVE-19924.11.patch, 
> HIVE-19924.12.patch, HIVE-19924.13.patch
> Add tags to the jobconf of distcp-related jobs started by replication. This 
> will allow Hive to kill these jobs if Beacon retries, or if HS2 dies and 
> Beacon issues a kill command.
>  * One of the tags should definitely be the query_id of the query that starts 
> the job. With this flow, before retrying the bootstrap load, Beacon will issue 
> a kill command to HS2 with the query ID of the previously issued command. HS2 
> will then kill any running jobs on YARN tagged with that query_id.
>  * To get around the additional failure point mentioned above, the jobs can 
> also be tagged with an additional unique tag_id provided by Beacon in the WITH 
> clause of the REPL LOAD command, to be used for tagging the distcp jobs. 
> Enhance the kill API to take the tag as input and kill the jobs associated 
> with that tag. The problem here is how to validate the association of the tag 
> with a Hive query ID, to make sure this API is not used to kill jobs run by 
> other components. However, we can restrict this capability to admins, which 
> should be acceptable.
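
The two tagging schemes above can be illustrated with a small in-memory model. This is a hypothetical sketch, not Hive's or YARN's API: the class JobTagRegistry and the methods launchDistcp and killJobsByTag are illustrative names, and a map stands in for the set of running YARN applications. It assumes every distcp job is tagged with the query_id, optionally with a Beacon-supplied tag, and that kill-by-tag is restricted to admins as proposed. In a real Hadoop MapReduce job, tags are typically carried through the mapreduce.job.tags configuration property.

```java
import java.util.*;

/**
 * Hypothetical in-memory model of tagging distcp jobs started by REPL LOAD.
 * Not Hive's or YARN's API: a map of jobId -> tags stands in for the
 * running YARN applications and their tags.
 */
public class JobTagRegistry {
    // jobId -> tags attached at submission time (insertion order preserved)
    private final Map<String, Set<String>> running = new LinkedHashMap<>();
    private int nextId = 1;

    /** Registers a distcp job tagged with the query id and an optional Beacon tag. */
    public String launchDistcp(String queryId, String beaconTag) {
        Set<String> tags = new LinkedHashSet<>();
        tags.add(queryId);              // always tag with the query id
        if (beaconTag != null) {
            tags.add(beaconTag);        // extra tag from the WITH clause (assumed)
        }
        String jobId = "job_" + (nextId++);
        running.put(jobId, tags);
        return jobId;
    }

    /** Kills every running job carrying the tag; only admins may call it. */
    public List<String> killJobsByTag(String tag, boolean callerIsAdmin) {
        if (!callerIsAdmin) {
            // The tag cannot be validated against a query id, so gate on admin rights.
            throw new SecurityException("kill-by-tag is restricted to admins");
        }
        List<String> killed = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it = running.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> entry = it.next();
            if (entry.getValue().contains(tag)) {
                killed.add(entry.getKey());
                it.remove();            // "kill": drop the job from the running set
            }
        }
        return killed;
    }

    /** Read-only view of the job ids that are still running. */
    public Set<String> runningJobs() {
        return Collections.unmodifiableSet(running.keySet());
    }

    public static void main(String[] args) {
        JobTagRegistry yarn = new JobTagRegistry();
        yarn.launchDistcp("hive_query_1", "beacon_tag_42");
        yarn.launchDistcp("hive_query_1", null);
        yarn.launchDistcp("hive_query_2", null);
        // Beacon retry: kill everything started by the previous query.
        System.out.println(yarn.killJobsByTag("hive_query_1", true)); // [job_1, job_2]
        System.out.println(yarn.runningJobs());                       // [job_3]
    }
}
```

Killing by the Beacon tag instead of the query_id works the same way, which is what lets Beacon clean up without relying on HS2 still knowing the original query_id.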

This message was sent by Atlassian JIRA
