codelipenghui commented on a change in pull request #13596:
URL: https://github.com/apache/pulsar/pull/13596#discussion_r786765120
##########
File path:
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java
##########
@@ -186,6 +188,12 @@ protected void batchOperation(List<MetadataOp> ops) {
}
}
}, null);
+
+ executor.schedule(() -> {
+ if (!callback.get()) {
+ ops.forEach(n -> n.getFuture().completeExceptionally(new TimeoutException()));
+ }
+ }, getMetadataStoreConfig().getOperationTimeoutSeconds(), TimeUnit.SECONDS);
Review comment:
Yes, using `future.join()` means the caller expects no timeout, i.e. it is
prepared to wait indefinitely. It's dangerous to add an operation timeout here.
If the broker acquired a lock from ZooKeeper but the operation timeout fired
first and the callback arrived later, how do we deal with this case?
I think the main point is to find the root cause of why ZK doesn't call the
callback: is it a performance bottleneck or a deadlock?
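
To make the concern concrete, here is a minimal sketch of the race using only `CompletableFuture`. It is not the `ZKMetadataStore` code: the class name `LateCallbackRace`, the method `lateCallbackAccepted`, and the `"lock-acquired"` value are hypothetical, and the ZooKeeper callback is simulated by a plain call to `complete()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; names here are not from the Pulsar code base.
public class LateCallbackRace {

    /**
     * Simulates the ordering described above: the timeout task fires first,
     * then the (simulated) ZooKeeper callback reports success.
     * Returns whether the late success was accepted by the future.
     */
    static boolean lateCallbackAccepted(CompletableFuture<String> future) {
        // 1. The scheduled timeout task runs first and fails the future.
        future.completeExceptionally(new TimeoutException());

        // 2. The callback arrives later with a successful result.
        //    complete() returns false because the future is already completed,
        //    so the success is silently dropped.
        return future.complete("lock-acquired");
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        boolean accepted = lateCallbackAccepted(future);

        // The operation may have succeeded on the server side (the lock is held),
        // but the caller only ever observes the TimeoutException.
        System.out.println("late callback accepted: " + accepted);
        System.out.println("caller sees failure: " + future.isCompletedExceptionally());
    }
}
```

In this ordering the caller treats the operation as failed while the lock may actually be held on the ZooKeeper side, and nothing in the sketch releases it. That is the state a timeout would need to handle.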