[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990407#comment-13990407
 ] 

Virag Kothari commented on OOZIE-1533:
--------------------------------------

bq. The de-duping for queue is on action id for CoordActionInputCheckXCommand. 
So, 10,000 actions will be in the queue even with coord job lock.

You are right. But with the coord job lock, only one command for a job can 
execute at a time. With action locks, the handler threads may stay busy working 
on commands that belong to a single job if that job has a lot of actions. This 
would be unfair to other jobs with fewer actions.
The more acute problem is system correctness, as commands like kill and 
suspend update both the job and its actions during their lifecycle. For 
example, if a command such as kill acquires the lock on the coord job and 
terminates all actions by moving them to KILLED, an InputCheckX holding a lock 
on the actionId can execute simultaneously and may move the action to the READY 
state. The CoordReady may eventually fail because the job is in the KILLED 
state, but the action is inadvertently left in the READY state.
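
The interleaving described above can be sketched in a small standalone Java 
program. This is only an illustration under stated assumptions, not Oozie code: 
the class LockGranularitySketch, the lockFor helper, the ids "job-1" and 
"job-1@1", and the 500 ms pause that stands in for the input check's work are 
all hypothetical. The kill always locks the job id; the input check locks 
either the action id or the job id, depending on the flag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

public class LockGranularitySketch {
    enum Status { WAITING, READY, KILLED }

    // Hypothetical stand-in for a lock service: one lock per entity key.
    static final Map<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    static ReentrantLock lockFor(String key) {
        return LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Runs one input check against one kill and returns the final action status.
    static Status race(boolean inputCheckUsesJobLock) throws InterruptedException {
        final String jobId = "job-1";       // hypothetical ids
        final String actionId = "job-1@1";
        final AtomicReference<Status> action = new AtomicReference<>(Status.WAITING);
        final CountDownLatch checkStarted = new CountDownLatch(1);
        final CountDownLatch killDone = new CountDownLatch(1);

        Thread inputCheck = new Thread(() -> {
            ReentrantLock lock = lockFor(inputCheckUsesJobLock ? jobId : actionId);
            lock.lock();
            try {
                if (action.get() == Status.WAITING) {
                    checkStarted.countDown();
                    // Simulated work: gives the kill up to 500 ms to run if it can.
                    killDone.await(500, TimeUnit.MILLISECONDS);
                    action.set(Status.READY); // acts on the stale WAITING read
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });

        Thread kill = new Thread(() -> {
            try {
                checkStarted.await(); // start only once the input check is running
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            ReentrantLock lock = lockFor(jobId); // kill always takes the job lock
            lock.lock();
            try {
                action.set(Status.KILLED);
            } finally {
                lock.unlock();
            }
            killDone.countDown();
        });

        inputCheck.start();
        kill.start();
        inputCheck.join();
        kill.join();
        return action.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("input check on action lock -> " + race(false));
        System.out.println("input check on job lock    -> " + race(true));
    }
}
```

With the action-level lock, the kill is not blocked by the input check, so the 
action can end up READY under a killed job. With the job-level lock, the kill 
has to wait for the input check to finish, and the action ends up KILLED.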

Ideally, all commands should execute within milliseconds, so even though they 
run under a job lock they should be very fast. Are you sure the slowness is due 
to job locks? At Y!, we have seen more slowness due to DB issues, which usually 
mask the delay caused by job locks.

> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-1533
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1533
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>              Labels: locking
>         Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.2#6252)
