> Is this bug a regression, or has it been like this since the beginning?

It has always been there; it is not a regression.

>    Because of this deadlock, is it just the 'checkAllLedgers' checker that is 
> blocked, or also other components that use 'executor' (the "auditBookies" 
> checker and core Auditor functionality as well)?

The ZK "event-thread" is blocked, so nothing else using ZK will work.

>    If the synchronous call 'admin.openLedgerNoRecovery' in 
> "checkLedgersProcessor" is blocked as you explained, then the 'processDone' 
> latch is never counted down, and "processDone.await()" in "checkAllLedgers" 
> will block forever. That blocks 'executor', and since 'executor' is a 
> singleThreadScheduledExecutor, IIUC all of the Auditor functionality is 
> blocked, right?

>    Why does the issue description say "Auditor run Periodic check only once"? 
> If the analysis made for this fix is correct, then "checkAllLedgers" 
> shouldn't run even once, right?

I think the issue was named (not by me) based on the initially perceived 
behavior. The analysis of the stack trace is pretty clear on what the root 
problem is.

Mixing sync and async operations in ZK is a big problem. It is imperative 
never to do anything blocking from a ZK callback thread.
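
To illustrate the point, here is a minimal sketch that does not use the 
ZooKeeper client at all: a plain single-thread executor stands in for the ZK 
"event-thread", and a latch stands in for the response to a synchronous call. 
The class and method names are made up for the example, and the timeouts exist 
only so the demo terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CallbackDeadlockSketch {

    // Anti-pattern: a callback running on the (simulated) event thread waits
    // for a response that only the same thread can deliver.
    // Returns true if the response arrived before the timeout.
    static boolean blockingInsideCallback() throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> callback = () -> {
                CountDownLatch done = new CountDownLatch(1);
                Runnable response = done::countDown;
                // The response is queued behind the callback that is waiting for it.
                eventThread.submit(response);
                // A real sync call would wait forever; the timeout is for the demo only.
                return done.await(200, TimeUnit.MILLISECONDS);
            };
            return eventThread.submit(callback).get();
        } finally {
            eventThread.shutdownNow();
        }
    }

    // Safer shape: the callback only hands the blocking work to another
    // thread and returns, so the event thread stays free to deliver responses.
    static boolean offloadedFromCallback() throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch done = new CountDownLatch(1);
            Callable<Boolean> blockingWork = () -> {
                Runnable response = done::countDown;
                eventThread.submit(response);
                return done.await(2, TimeUnit.SECONDS);
            };
            // The callback itself does not block; it just schedules the work.
            Callable<Future<Boolean>> callback = () -> worker.submit(blockingWork);
            Future<Boolean> pending = eventThread.submit(callback).get();
            return pending.get();
        } finally {
            worker.shutdownNow();
            eventThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking inside callback completed: " + blockingInsideCallback());
        System.out.println("offloaded from callback completed:  " + offloadedFromCallback());
    }
}
```

In the first method the wait can only end by timing out, which is the same 
shape as a sync ZK call issued from a ZK callback. In the second the event 
thread is free again as soon as the callback returns, so the response gets 
through.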

>    To begin with, I'm not sure if there is a comprehensive testcase for this 
> checker, but I'm a little surprised that this commit was merged and the issue 
> closed with no testcase to prove the analysis and the validity of the fix.



[ Full content available at: https://github.com/apache/bookkeeper/pull/1608 ]