> Is this bug a regression, or has it been like this since the beginning?
It was always there.
> Because of this deadlock, is it just the 'checkAllLedgers' checker which is
> blocked, or other components which use 'executor' (the "auditBookies" checker
> and core Auditor functionality) as well?
The ZK "event-thread" is blocked, so nothing else using ZK will work.
> If the synchronous call 'admin.openLedgerNoRecovery' in
> "checkLedgersProcessor" is blocked as you explained, then the 'processDone'
> latch is not counted down, and "processDone.await()" in "checkAllLedgers" will
> be blocked forever. That will block 'executor', and since 'executor' is a
> singleThreadScheduledExecutor, IIUC all of the Auditor functionality is
> blocked, right?
> Why does the issue description say "Auditor run Periodic check only once"? If
> the analysis made for this fix is correct, then "checkAllLedgers" shouldn't
> run even once, right?
I think the issue was named (not by me) based on the initial perceived
behavior. The analysis of the stack-trace is pretty clear on what the root
problem is.
It is a big problem to mix sync and async operations in ZK. It is imperative
not to do anything blocking from a ZK callback thread.
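To make that concrete, here is a minimal sketch of the pattern using only plain java.util.concurrent. It does not use the real ZooKeeper or BookKeeper APIs: the single-threaded `eventThread` executor is just a stand-in for the ZK event thread, and `EventThreadDemo`, `run`, `syncCall`, and `worker` are hypothetical names made up for illustration. The "synchronous call" needs a second event to be delivered on the event thread before it can return; if it is made from inside a callback running on that same thread, the second event can never be delivered.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EventThreadDemo {

    // Returns true if the simulated synchronous call finishes within timeoutMs.
    // blockInCallback = true  -> the call is made directly on the event thread.
    // blockInCallback = false -> the call is handed off to a separate worker.
    static boolean run(boolean blockInCallback, long timeoutMs) {
        // Stand-in for the single ZK event thread (an assumption of this sketch).
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Hypothetical worker used to keep blocking work off the event thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch secondEventDelivered = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // A "synchronous" operation: it queues a follow-up event on the event
        // thread and blocks until that event has been processed.
        Runnable syncCall = () -> {
            eventThread.execute(secondEventDelivered::countDown);
            try {
                secondEventDelivered.await();
                done.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // The first callback, running on the event thread.
        eventThread.execute(() -> {
            if (blockInCallback) {
                syncCall.run();            // blocks the only event thread
            } else {
                worker.execute(syncCall);  // event thread stays free
            }
        });

        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts any thread still stuck in await() so the JVM can exit.
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking in callback completes: " + run(true, 500));
        System.out.println("offloaded to worker completes: " + run(false, 500));
    }
}
```

In the first case the wait can only end by timing out, because the thread that would deliver the follow-up event is the one doing the waiting. Handing the blocking call to another thread (or using the async variant of the operation) avoids that.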
> To begin with, I'm not sure if there is a comprehensive testcase for this
> checker, but I'm a little surprised that this commit is merged / the issue is
> closed with no testcase to prove the analysis behind the fix and the validity
> of the fix.
[ Full content available at: https://github.com/apache/bookkeeper/pull/1608 ]