> On Nov. 30, 2017, 5:33 p.m., András Piros wrote: > > core/src/main/java/org/apache/oozie/service/ZKLocksService.java > > Lines 239 (patched) > > <https://reviews.apache.org/r/63739/diff/2/?file=1901904#file1901904line239> > > > > Yep, blocking a thread for 30-ish minutes doesn't seem like a good > > idea. E.g. ZooKeeper outage lasts for only a few seconds under heavy load - > > it can happen that Oozie JVM ends up blocking lots of `Thread`s in > > `TIMED_WAITING` state. > > > > Can you think of some, more elaborate solution? Maybe having an > > `ExecutorService` just to try to unlock, and `Callable` instances that > > would try unlocking. In that case, the number of `Thread`s blocked in > > `TIMED_WAITING` state would be: > > * configurable > > * limited to a manageable extent
There are couple of cases where ZK outages require Ops team restarting the quorum. From finding the issue to having it restarted, 30 mins would easily pass. Instead of exponential retry, the retry should be every 10 seconds. Retrying after 2,4,8 and 16 minutes is waste of time and blocking thread unnecessarily as you mentioned if ZK is back up. Otherwise it does not matter if retry continues for 30 mins or 1 hour (NN, RM retry time limit is also about the same). Everything in Oozie is going to be totally blocked if ZK is down as you can't acquire locks for any command. Having higher retry time avoids having to restart Oozie. > On Nov. 30, 2017, 5:33 p.m., András Piros wrote: > > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java > > Lines 579-598 (patched) > > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line579> > > > > Extract nested class w/ a catchy name. Prefer to disagree on this one. I think it is a overkill to extract into one method and one class for 20 lines of test code. Unless the method is being reused or it is a big chunk of code, it does not make sense to extract into methods and that too in a test class. More methods + more classes != Clean code. It kills readability as well when one has to unnecessarily jump to multiple places to finish reading logic of one method. - Rohini ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63739/#review192309 ----------------------------------------------------------- On Nov. 27, 2017, 7:22 p.m., Satish Saley wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63739/ > ----------------------------------------------------------- > > (Updated Nov. 27, 2017, 7:22 p.m.) > > > Review request for oozie. > > > Bugs: OOZIE-3113 > https://issues.apache.org/jira/browse/OOZIE-3113 > > > Repository: oozie-git > > > Description > ------- > > OOZIE-3113 Retry for ZK lock release > > > Diffs > ----- > > core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f > core/src/main/resources/oozie-default.xml 8285df0a7 > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java > d04f04e80 > > > Diff: https://reviews.apache.org/r/63739/diff/2/ > > > Testing > ------- > > Tested locally > > > Thanks, > > Satish Saley > >