> On Nov. 30, 2017, 9:33 a.m., András Piros wrote: > > core/src/main/java/org/apache/oozie/service/ZKLocksService.java > > Lines 239 (patched) > > <https://reviews.apache.org/r/63739/diff/2/?file=1901904#file1901904line239> > > > > Yep, blocking a thread for 30-ish minutes doesn't seem like a good > > idea. E.g. ZooKeeper outage lasts for only a few seconds under heavy load - > > it can happen that Oozie JVM ends up blocking lots of `Thread`s in > > `TIMED_WAITING` state. > > > > Can you think of some, more elaborate solution? Maybe having an > > `ExecutorService` just to try to unlock, and `Callable` instances that > > would try unlocking. In that case, the number of `Thread`s blocked in > > `TIMED_WAITING` state would be: > > * configurable > > * limited to a manageable extent > > Rohini Palaniswamy wrote: > There are couple of cases where ZK outages require Ops team restarting > the quorum. From finding the issue to having it restarted, 30 mins would > easily pass. Instead of exponential retry, the retry should be every 10 > seconds. Retrying after 2,4,8 and 16 minutes is waste of time and blocking > thread unnecessarily as you mentioned if ZK is back up. Otherwise it does not > matter if retry continues for 30 mins or 1 hour (NN, RM retry time limit is > also about the same). Everything in Oozie is going to be totally blocked if > ZK is down as you can't acquire locks for any command. Having higher retry > time avoids having to restart Oozie.
If ZooKeeper outages lasts for only a few seconds then the thread won't be blocked for 30 minutes. When it wakes up from sleep, it will release the lock successfully. > On Nov. 30, 2017, 9:33 a.m., András Piros wrote: > > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java > > Lines 563 (patched) > > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line563> > > > > What about `Services.get().get(ZKLocksService.class)`? Similar to other test cases in this class because we are not initializing ZKLocksService in setup method. > On Nov. 30, 2017, 9:33 a.m., András Piros wrote: > > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java > > Lines 567-574 (patched) > > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line567> > > > > Extract method `setupSilentLock()`. Overkill, dropping as Rohini suggested below. - Satish ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63739/#review192309 ----------------------------------------------------------- On Nov. 27, 2017, 11:22 a.m., Satish Saley wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63739/ > ----------------------------------------------------------- > > (Updated Nov. 27, 2017, 11:22 a.m.) > > > Review request for oozie. > > > Bugs: OOZIE-3113 > https://issues.apache.org/jira/browse/OOZIE-3113 > > > Repository: oozie-git > > > Description > ------- > > OOZIE-3113 Retry for ZK lock release > > > Diffs > ----- > > core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f > core/src/main/resources/oozie-default.xml 8285df0a7 > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java > d04f04e80 > > > Diff: https://reviews.apache.org/r/63739/diff/2/ > > > Testing > ------- > > Tested locally > > > Thanks, > > Satish Saley > >