[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925120#comment-13925120
 ] 

Rohini Palaniswamy commented on OOZIE-1533:
-------------------------------------------

[~sriksun],
   We wanted to do that as well for performance reasons, and Virag had an 
investigation task for it. But due to other high-priority tasks we did not 
spend much time on it, and it has since been in the backlog. 

Based on what I remember, a few comments:
 - One problem that needs to be addressed before this is that there are a lot 
of places in the code where the coord job is updated. We often hit problems 
with StatusTransitService updating the coord job without a lock and leaving 
the coordinator in a wrong or invalid state, due to a race condition with 
another command updating the status at the same time. We then have to execute 
database queries to restore it to the proper state. 
 - Doing this for CoordActionInputCheckXCommand might not be a good idea, as it 
will put a lot of pressure on the namenode and also on the Oozie database. Just 
2 days back we had one user set a high throttle factor of 24 for a bundle with 
70 coordinators, and it brought Oozie down. The first coordinator was marked 
FAILED because of a db read timeout during materialization. All the other 
coordinators ended up with a lot of WAITING actions. The sheer number of 
"update coord_actions set last_modified_time where id=x" statements put such a 
high load on the db that a lot more workflows and coord actions hit errors due 
to db exceptions. 
 - Another thing is that interrupt commands like coord kill, etc. will not be 
processed earlier if the lock is changed to the action id.
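
To make the granularity trade-off in the comments above concrete, here is a 
minimal sketch in Java. It is not Oozie's lock service: the class 
LockGranularitySketch and its methods are hypothetical, and it assumes 
coordinator action ids of the form "<jobId>@<actionNumber>". It only shows how 
the choice of lock key decides which commands exclude each other.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of lock-key granularity; this is not Oozie's lock service.
final class LockGranularitySketch {
    // Keys currently held. A key can be held by only one command at a time.
    private final Set<String> held = ConcurrentHashMap.newKeySet();

    // Job-level key. Assumes action ids look like "<jobId>@<actionNumber>".
    static String jobKey(String actionId) {
        int at = actionId.indexOf('@');
        return at < 0 ? actionId : actionId.substring(0, at);
    }

    // Action-level key: the full action id.
    static String actionKey(String actionId) {
        return actionId;
    }

    // Non-blocking acquire: returns true if the key was free and is now held.
    boolean tryLock(String key) {
        return held.add(key);
    }

    // Release a key so that another command can acquire it.
    void unlock(String key) {
        held.remove(key);
    }
}
```

With job-level keys, commands for "job-C@1" and "job-C@2" both map to the key 
"job-C", so the second tryLock fails until the first is released. With 
action-level keys both succeed, which is where the extra concurrency (and the 
extra db load) comes from. A command keyed on the job id, such as a kill, 
would also no longer share a key with commands keyed on an action id.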

[~virag],
   I am sure you will have more info to add on other cases to address. Can you 
comment on this when you get time?

> Coordinator action materialization is too slow due to coarse job level locks
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-1533
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1533
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>              Labels: locking
>         Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.2#6252)
