dlg99 commented on PR #3214:
URL: https://github.com/apache/bookkeeper/pull/3214#issuecomment-1253025650

   @massakam @hangc0276 @eolivelli I spent a bit of time trying to understand 
the root cause, and I think I have an idea of what's going on.
   This gist https://gist.github.com/dlg99/505849e1010a20c6d439ecd53f500a85 is 
more a demo of what's going on than an actual fix (after all, it breaks 
`testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble`, though that might 
be a test issue).
   
   The problem is that the semaphore is acquired inside loops (and in recursive 
calls), mostly in loops like 
   ```
   for (int i = 0; i < writeSet.size(); i++) 
   ```
   and 
   ```
   for (Long entryID: entriesToBeVerified)
   ```
   
   For example, if writeSet is larger than the number of semaphore permits, the 
loop will get stuck, because the callback calls `checkFragments()`, which calls 
`verifyLedgerFragment()` (a recursive call), and then we get a deadlock.
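
To make the stall concrete, here is a minimal, simplified sketch. It is not the LedgerChecker code: the class and method names are hypothetical, and it assumes the callbacks that would release permits cannot run while the loop is blocked (modeled here with a single worker thread). A timed `tryAcquire` stands in for the blocking `acquire()` only so that the demo terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names exist in BookKeeper.
final class PermitStarvationDemo {

    /** Returns how many loop iterations obtained a permit before the loop stalled. */
    public static int runLoop(int permits, int writeSetSize) throws Exception {
        Semaphore semaphore = new Semaphore(permits);
        // Assumption: one thread both runs the loop and executes the callbacks.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = worker.submit(() -> {
                int acquired = 0;
                for (int i = 0; i < writeSetSize; i++) {
                    // A blocking acquire() would wait here forever once permits run out.
                    if (!semaphore.tryAcquire(200, TimeUnit.MILLISECONDS)) {
                        break;
                    }
                    acquired++;
                    // The "callback" that releases the permit is queued behind the loop
                    // on the same thread, so it cannot run until the loop has finished.
                    worker.execute(semaphore::release);
                }
                return acquired;
            });
            return result.get();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runLoop(2, 5)); // stalls after 2 of 5 iterations
        System.out.println(runLoop(5, 5)); // enough permits: all 5 iterations run
    }
}
```

With 2 permits and a write set of 5, the loop stops after 2 iterations; with 5 permits it completes.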
   
   @massakam I didn't check whether we can estimate the maximum number of 
entriesToBeVerified, so my workaround was to hold a single permit per 
ReadManyEntriesCallback, no matter how many entries it reads.
   For the writeSet case, the callback can do something similar by counting down 
the number of processed reads.
   This is the simplest solution I could think of, but it limits the number of 
segments in progress rather than the number of BK reads in progress.
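
A rough sketch of the "one permit per callback" idea, assuming a hypothetical helper class `BatchPermit` (it is not in the gist or in BookKeeper): the permit is taken once for the whole batch of reads and released only when the last read completes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; the name and shape are assumptions for illustration only.
final class BatchPermit {
    private final Semaphore semaphore;
    private final AtomicInteger pendingReads;

    /** Takes exactly one permit, regardless of how many reads the batch contains. */
    public BatchPermit(Semaphore semaphore, int reads) {
        if (reads <= 0) {
            throw new IllegalArgumentException("reads must be positive");
        }
        this.semaphore = semaphore;
        this.pendingReads = new AtomicInteger(reads);
        semaphore.acquireUninterruptibly();
    }

    /** Called once per completed read; the final call returns the permit. */
    public void readComplete() {
        if (pendingReads.decrementAndGet() == 0) {
            semaphore.release();
        }
    }
}
```

Because each batch holds a single permit, a loop over a large writeSet or entry list can no longer exhaust the semaphore on its own. The trade-off is the one noted above: the semaphore now bounds batches, not individual reads.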
   
   Another approach could be to get rid of the recursion and use a queue 
instead; that will be more work.
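
The queue-based alternative could look roughly like the sketch below. It is deliberately synchronous and much simpler than the real asynchronous checker, and `WorkQueueSketch` and `drain` are hypothetical names. Follow-up work is enqueued instead of being started from inside the callback, so a permit is never requested while another one is still held by the same call chain.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;

// Hypothetical sketch; not the BookKeeper implementation.
final class WorkQueueSketch {

    /** Processes fragments 0..fragments-1 one at a time; returns how many were processed. */
    public static int drain(int permits, int fragments) {
        Semaphore semaphore = new Semaphore(permits);
        Deque<Integer> pending = new ArrayDeque<>();
        if (fragments > 0) {
            pending.add(0);
        }
        int processed = 0;
        while (!pending.isEmpty()) {
            int fragment = pending.poll();
            semaphore.acquireUninterruptibly();
            try {
                processed++;
                // Where a recursive version would call itself from the callback,
                // this version only enqueues the next fragment.
                if (fragment + 1 < fragments) {
                    pending.add(fragment + 1);
                }
            } finally {
                // The permit is returned before the next fragment is started.
                semaphore.release();
            }
        }
        return processed;
    }
}
```

Even with a single permit, `WorkQueueSketch.drain(1, 10)` processes all 10 fragments, because acquisitions are never nested.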
   
   You may be able to come up with a better solution with the root cause 
in mind.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
