bharanic-dev commented on a change in pull request #13130:
URL: https://github.com/apache/pulsar/pull/13130#discussion_r763194732



##########
File path: 
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -317,7 +346,33 @@ public void invalidateAll() {
      */
     protected void execute(Runnable task, CompletableFuture<?> future) {
         try {
-            executor.execute(task);
+            // Wrap the original task, so we can record the thread on which it 
is running
+            TaskWrapper taskWrapper = new TaskWrapper(task);
+            executorWatchDog.execute(() -> {

Review comment:
       w.r.t your comment: "even if a write operation takes 30seconds, the 
thread wouldn't normally get blocked for such long time".
   
   Just want to clarify my own understanding. Today, the metadata-store 
callbacks are invoked by the call to execute method:
   
   
https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L318
   
   And the executor is a single-threaded executor. So, the callbacks can only 
be executed one at a time. Like you point out there could be 10s or 100s of 
such tasks.
   
   With my changes, the additional overhead is that the task being submitted to 
the watchdog thread, which in turn delegates to the metadata-store executor. 
So, there is a "wrapper task creation overhead" and additional latency latency. 
But that sholuld not be too bad, compared to the current code? The write 
operation could take as long, but the watchdog executor is not blocked. It only 
gets blocked if the callback task gets blocked (which is the current code as 
well). Is my understanding correct?
   
   And regarding this comment: "When you get the stack trace for the task that 
is blocked, you're only seeing the stack trace of a healthy code path, while 
you won't see the other threads that have caused the deadlock."
   
   While I can't comment on all possible code paths, the specific one in the 
issue 
   
   https://github.com/apache/pulsar/issues/12929
   
   The metadata-store thread stack trace is the most relevant one and that is 
the only one that gets logged in the code I added.
   
   If you still think that "submitting a dummy task periodically" is the way to 
go, I will go work on those changes. But please help clarify the above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to