[ https://issues.apache.org/jira/browse/HIVE-19924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mahesh kumar behera updated HIVE-19924:
---------------------------------------
Attachment: HIVE-19924.01.patch
> Tag distcp jobs run by Repl Load
> --------------------------------
>
> Key: HIVE-19924
> URL: https://issues.apache.org/jira/browse/HIVE-19924
> Project: Hive
> Issue Type: Task
> Components: repl
> Affects Versions: 3.1.0, 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19924.01.patch
>
>
> Add tags to the jobconf of distcp jobs started by replication. This will
> allow Hive to kill these jobs if Beacon retries, or if HS2 dies and Beacon
> issues a kill command.
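The tagging step could be sketched roughly as follows. This is an illustration only, not code from HIVE-19924.01.patch: the class and method names are hypothetical, a plain Map stands in for the Hadoop job configuration, and the sketch assumes the tags travel in Hadoop's comma-separated "mapreduce.job.tags" property.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names here are hypothetical, not taken from the patch.
class ReplDistCpTagging {
    // Hadoop MapReduce property that carries YARN application tags (comma-separated).
    static final String JOB_TAGS_KEY = "mapreduce.job.tags";

    /** Adds the given tags to the job configuration, preserving tags already set. */
    static void addTags(Map<String, String> jobConf, String... newTags) {
        Set<String> tags = new LinkedHashSet<>();
        String existing = jobConf.get(JOB_TAGS_KEY);
        if (existing != null) {
            for (String t : existing.split(",")) {
                if (!t.trim().isEmpty()) {
                    tags.add(t.trim());
                }
            }
        }
        for (String t : newTags) {
            if (t != null && !t.trim().isEmpty()) {
                tags.add(t.trim());
            }
        }
        if (!tags.isEmpty()) {
            jobConf.put(JOB_TAGS_KEY, String.join(",", tags));
        }
    }

    public static void main(String[] args) {
        // A plain map stands in for the distcp job's configuration (assumption).
        Map<String, String> jobConf = new LinkedHashMap<>();
        // Hypothetical query id of the REPL LOAD that launches the distcp job.
        addTags(jobConf, "hive_20180619_example_query_id");
        System.out.println(jobConf.get(JOB_TAGS_KEY));
    }
}
```

Merging with existing tags rather than overwriting them keeps any tags that another layer has already set on the job.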
> * One of the tags should definitely be the query_id that starts the job.
> With this flow, before retrying the bootstrap load, Beacon will issue a kill
> command to HS2 with the query id of the previously issued command. HS2 will
> then kill any running jobs on YARN tagged with that query_id.
> * To get around the additional failure point mentioned above, the jobs
> can also be tagged with a unique tag_id that Beacon provides in the WITH
> clause of the REPL LOAD command. Enhance the kill
> API to take the tag as input and kill the jobs associated with that tag. The
> open problem is how to validate the association of the tag with a Hive query
> id, to make sure this API is not used to kill jobs run by other components.
> However, if this capability is offered only to admins, that should be acceptable.
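The kill-by-tag idea with the admin-only restriction could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class, the RunningApp type, and the callerIsAdmin flag are hypothetical, and an in-memory list stands in for the set of running applications that would really come from YARN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the kill API proposed in this issue.
class KillByTagSketch {
    /** Hypothetical view of a running YARN application and its tags. */
    static final class RunningApp {
        final String appId;
        final Set<String> tags;

        RunningApp(String appId, String... tags) {
            this.appId = appId;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /**
     * Returns the ids of applications carrying the given tag.
     * Only admins may select by tag, because the tag cannot be validated
     * against a Hive query id (the concern raised in the description).
     */
    static List<String> appsToKill(List<RunningApp> running, String tag, boolean callerIsAdmin) {
        if (!callerIsAdmin) {
            throw new SecurityException("kill by tag is restricted to admins");
        }
        List<String> ids = new ArrayList<>();
        for (RunningApp app : running) {
            if (app.tags.contains(tag)) {
                ids.add(app.appId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical application ids and tags.
        List<RunningApp> running = Arrays.asList(
            new RunningApp("app_1", "query_id_1", "beacon_tag_1"),
            new RunningApp("app_2", "some_other_component"));
        System.out.println(appsToKill(running, "beacon_tag_1", true));
    }
}
```

Applications that do not carry the tag are never selected, so jobs run by other components are only at risk if someone reuses the same tag, which is why the sketch gates the call on an admin check.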