[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990407#comment-13990407 ]
Virag Kothari commented on OOZIE-1533:
--------------------------------------

bq. The de-duping for queue is on action id for CoordActionInputCheckXCommand. So, 10,000 actions will be in the queue even with coord job lock.

You are right. But with the coord job lock, only one command for a job can execute at a time. With action locks, the handler threads may stay busy working on commands that belong to a single job if that job has a lot of actions. This would be unfair to other jobs with fewer actions.

The more acute problem is system correctness, because commands like kill and suspend update both the job and its actions during their lifecycle. For example, if a command such as kill acquires the lock on the coord job and terminates all actions by moving them to KILLED, an InputCheckX holding a lock on the action id can execute simultaneously and may move the action to the READY state. The CoordReady may eventually fail because the job is in the KILLED state, but the action is inadvertently left in the READY state.

Ideally, all commands should execute within milliseconds, so even though they execute under a job lock they should be very fast. Are you sure the slowness is due to job locks? At Y!, we have seen more slowness due to DB issues, which usually mask the delay caused by job locks.

> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-1533
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1533
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>              Labels: locking
>         Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord
> action level locking whenever appropriate

--
This message was sent by Atlassian JIRA
(v6.2#6252)
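
The correctness concern in the comment above (a kill holding the coord job lock while an input check holds only an action lock) can be illustrated with a small deterministic model. This is only a sketch and not Oozie code: `LockTable`, `simulate`, and the keys `"job-1"` and `"action-1"` are hypothetical names, and the model assumes the input check decides based on the action state it loaded when it started.

```python
class LockTable:
    """Hypothetical non-blocking lock table (not an Oozie class)."""

    def __init__(self):
        self.held = set()

    def try_acquire(self, key):
        # Returns False instead of blocking when the key is already held.
        if key in self.held:
            return False
        self.held.add(key)
        return True

    def release(self, key):
        self.held.discard(key)


def simulate(input_check_key):
    """Replay one fixed interleaving of a kill and an input check.

    input_check_key selects the lock the input check takes:
    "job-1" models job-level locking, "action-1" models action-level locking.
    """
    locks = LockTable()
    state = {"job": "RUNNING", "action": "WAITING"}

    # 1. Kill takes the job lock.
    assert locks.try_acquire("job-1")

    # 2. The input check tries to start while the kill is still in flight.
    started = locks.try_acquire(input_check_key)
    seen = state["action"] if started else None

    # 3. Kill terminates the job and its action, then releases the job lock.
    state["job"] = "KILLED"
    state["action"] = "KILLED"
    locks.release("job-1")

    # 4. If the input check was blocked, it runs now and loads fresh state.
    if not started:
        assert locks.try_acquire(input_check_key)
        seen = state["action"]

    # 5. The input check promotes the action only if the copy it loaded
    #    was still WAITING (assumed behaviour for this model).
    if seen == "WAITING":
        state["action"] = "READY"
    locks.release(input_check_key)
    return state


job_level = simulate("job-1")
action_level = simulate("action-1")
```

In this model, the job-level run leaves both the job and the action KILLED, while the action-level run ends with a KILLED job whose action is READY, which is the inconsistent state the comment warns about.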