bharanic-dev commented on a change in pull request #13130:
URL: https://github.com/apache/pulsar/pull/13130#discussion_r763194732
##########
File path:
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -317,7 +346,33 @@ public void invalidateAll() {
*/
protected void execute(Runnable task, CompletableFuture<?> future) {
try {
- executor.execute(task);
+ // Wrap the original task, so we can record the thread on which it
is running
+ TaskWrapper taskWrapper = new TaskWrapper(task);
+ executorWatchDog.execute(() -> {
Review comment:
w.r.t your comment: "even if a write operation takes 30seconds, the
thread wouldn't normally get blocked for such long time".
Just want to clarify my own understanding. Today, the metadata-store
callbacks are invoked by the call to execute method:
https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L318
And the executor is a single-threaded executor. So, the callbacks can only
be executed one at a time. Like you point out there could be 10s or 100s of
such tasks.
With my changes, the additional overhead is that the task being submitted to
the watchdog thread, which in turn delegates to the metadata-store executor.
So, there is a "wrapper task creation overhead" and additional latency latency.
But that sholuld not be too bad, compared to the current code? The write
operation could take as long, but the watchdog executor is not blocked. It only
gets blocked if the callback task gets blocked (which is the current code as
well). Is my understanding correct?
And regarding this comment: "When you get the stack trace for the task that
is blocked, you're only seeing the stack trace of a healthy code path, while
you won't see the other threads that have caused the deadlock."
While I can't comment on all possible code paths, the specific one in the
issue
https://github.com/apache/pulsar/issues/12929
The metadata-store thread stack trace is the most relevant one and that is
the only one that gets logged in the code I added.
If you still think that "submitting a dummy task periodically" is the way to
go, I will go work on those changes. But please help clarify the above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]