[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906783#comment-13906783 ]
Srikanth Sundarrajan commented on OOZIE-1533:
---------------------------------------------

Currently, locks are held for the various coord-action commands as follows:

||Command||Lock (entity-key)||
|CoordActionCheckXCommand|coord-action-id|
|CoordActionInfoXCommand|no-locks|
|CoordActionInputCheckXCommand|coord-job-id|
|CoordActionMaterializeCommand|RANDOM("coord_action_mater" + UUID())|
|CoordActionNotificationXCommand|RANDOM("coord_action_notification" + UUID())|
|CoordActionReadyXCommand|coord-job-id|
|CoordActionsKillXCommand|coord-job-id|
|CoordActionStartXCommand|coord-job-id|
|CoordActionTimeOutXCommand|coord-action-id|
|CoordActionUpdatePushMissingDependency|coord-action-id|
|CoordActionUpdateXCommand|coord-job-id|

I intend to put up a patch changing the locks for the following commands:

||Command||Lock (entity-key)||
|CoordActionInputCheckXCommand|coord-action-id|
|CoordActionReadyXCommand|coord-action-id|
|CoordActionStartXCommand|coord-action-id|
|CoordActionUpdateXCommand|coord-action-id|

It seems these commands were using coord-job-id level locks to prevent an action from starting while the parent coord is in the killed or paused state. From a correctness standpoint, however, running these commands while the coord is killed or paused has no impact, except perhaps in CoordActionStartXCommand. Meanwhile, holding the lock at the coord-job-id level is not all that helpful: it unnecessarily forces serial execution of independent coord-action commands that each work only on their own action.

Are there any concerns?

> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-1533
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1533
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>
> Coord job level lock introduces high contention. Instead introduce coord
> action level locking whenever appropriate.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
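
To illustrate the contention argument in the comment above, here is a minimal Java sketch of keyed locking at the two granularities. It is not Oozie's lock service or command code: the class `LockGranularityDemo`, the helpers `lockFor`, `entityKey` and `canRunConcurrently`, and the sample ids are all hypothetical, and the sketch assumes an action id has the form `<coord-job-id>@<action-number>`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration of lock granularity; not Oozie's actual lock service. */
class LockGranularityDemo {
    enum Granularity { JOB, ACTION }

    // One lock per entity key, created lazily on first use.
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    static ReentrantLock lockFor(String entityKey) {
        return LOCKS.computeIfAbsent(entityKey, k -> new ReentrantLock());
    }

    // Assumption: an action id is "<coord-job-id>@<n>", so the job id is the part before '@'.
    static String entityKey(String actionId, Granularity g) {
        return g == Granularity.JOB
                ? actionId.substring(0, actionId.indexOf('@'))
                : actionId;
    }

    /**
     * Holds the lock for actionA on the calling thread, then checks whether a
     * second thread can acquire the lock for actionB without waiting.
     */
    static boolean canRunConcurrently(String actionA, String actionB, Granularity g) {
        ReentrantLock held = lockFor(entityKey(actionA, g));
        held.lock();
        try {
            AtomicBoolean acquired = new AtomicBoolean(false);
            Thread other = new Thread(() -> {
                ReentrantLock lock = lockFor(entityKey(actionB, g));
                if (lock.tryLock()) {          // non-blocking attempt
                    acquired.set(true);
                    lock.unlock();
                }
            });
            other.start();
            other.join();
            return acquired.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            held.unlock();
        }
    }

    public static void main(String[] args) {
        String a1 = "0000001-C@1";
        String a2 = "0000001-C@2";
        System.out.println("job-level:    " + canRunConcurrently(a1, a2, Granularity.JOB));
        System.out.println("action-level: " + canRunConcurrently(a1, a2, Granularity.ACTION));
    }
}
```

With the job-level key, both actions map to the same lock, so the second command cannot proceed while the first holds it (`false`). With the action-level key, each action has its own lock and the two commands do not block each other (`true`). The sketch says nothing about the killed / paused check that CoordActionStartXCommand may still need.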