Github user BrainLogic commented on the issue:
https://github.com/apache/flink/pull/3035
Distributed systems and multithreading environments make us think in term
of logical clock, like Lamport clock, step by step:
Thread1 - fetcher is running `running = true`
Thread2 performs `zkHandler.prepareAndCommitOffsets(offsets)`
Thread2 `running = true` and `zkHandler.prepareAndCommitOffsets(offsets)`
throws an exception
Thread1 stop fetcher and change flag `running = false` in normal way
without any exception
Thread2 read the flag `running = false` and `return` although, there is a
reason of the commit failure that is different from "Client was closed and
running= false"
Thread1 fetcher is stopped successfully
There is no exception or any information in log regarding the exception
from `zkHandler.prepareAndCommitOffsets(offsets)`
Again this is a rare case when we can lose a root cause of strange behavior
- offset will not be committed although there are no any exceptions. Am I wrong?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---