[
https://issues.apache.org/jira/browse/BOOKKEEPER-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116227#comment-13116227
]
Sijie Guo commented on BOOKKEEPER-69:
-------------------------------------
h4. Why the topic manager owns a topic but the persistence manager doesn't
First, the topic acquisition flow and the topic release flow (taken when an
exception occurs) are as follows:
acquireTopic:
{quote}
a) Check the topic set. If the topic is already in the topic set owned by this
hub server, return this server.
b) Otherwise, read the zk server to find the owner of the topic (async).
i) If the node exists and the owner is another server, return that owner and
send a redirect response to the client.
ii) If the node exists and the owner is this server, *the zk node is
considered stale. The topic manager deletes it and goes to c)*.
iii) If the node does not exist, go to c).
c) Choose or claim the topic.
i) If the topic manager succeeds in acquiring the topic, it calls the
listeners (persistence/subscriptions/region manager) to run their topic
acquisition logic (async).
a) If all the listeners succeed in acquiring the topic, the topic manager adds
the topic to the topic set.
b) If one of the listeners fails, the topic manager starts the
'lostTopic' logic.
{quote}
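The acquireTopic flow above can be sketched roughly as follows. This is only an illustrative, synchronous model and not the actual hub code: the class {{AcquireFlowSketch}}, its fields, and the {{listenersSucceed}} flag are hypothetical names, and a {{ConcurrentHashMap}} stands in for the zk owner nodes. It also only covers the "claim" half of step c).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, synchronous model of the acquireTopic flow.
// The real flow is asynchronous; a map stands in for the zk owner nodes.
class AcquireFlowSketch {
    final String self;                                // this hub server
    final Map<String, String> zkOwners;               // topic -> owner (zk stand-in)
    final Set<String> ownedTopics = new HashSet<>();  // the "topic set"

    AcquireFlowSketch(String self, Map<String, String> zkOwners) {
        this.self = self;
        this.zkOwners = zkOwners;
    }

    // Returns the hub the client should talk to, or null if acquisition failed.
    String acquireTopic(String topic, boolean listenersSucceed) {
        // a) topic already in the topic set: return this server
        if (ownedTopics.contains(topic)) {
            return self;
        }
        // b) read the owner node
        String owner = zkOwners.get(topic);
        if (owner != null && !owner.equals(self)) {
            return owner;                    // b-i) redirect to the other owner
        }
        if (owner != null) {
            zkOwners.remove(topic, self);    // b-ii) treat our own node as stale
        }
        // b-iii) / c) try to claim the topic by creating the owner node
        String previous = zkOwners.putIfAbsent(topic, self);
        if (previous != null) {
            return previous;                 // another hub claimed it first
        }
        // c-i) listeners run their acquisition logic (modelled as a flag)
        if (listenersSucceed) {
            ownedTopics.add(topic);          // c-i-a) add to topic set
            return self;
        }
        lostTopic(topic);                    // c-i-b) a listener failed
        return null;
    }

    // lostTopic a): remove from the topic set. Step b), queueing release ops
    // in the listeners, is left out of this sketch.
    void lostTopic(String topic) {
        ownedTopics.remove(topic);
    }
}
```

Note that step b-ii) is the part that matters for this issue: a hub that sees its own name in the owner node while the topic is not yet in its topic set cannot tell a stale node from one that a concurrent acquire op just created.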
lostTopic:
{quote}
a) First, the topic manager removes the topic from the topic set.
b) Then it asks all the listeners (persistence/subscriptions/region manager) to
run their 'lostTopic' logic (each just puts a release op in its queue).
{quote}
In our case, several subscribe requests were sent in parallel while we were
restarting the bookkeeper server.
For simplicity, assume 'sub-1' and 'sub-2' subscribe to 'topic-0' on default
server A at the same time.
1) The two topic acquire ops are executed asynchronously.
2) The sub-1 acquire op succeeds in claiming this hub server as the topic owner
(in c)), so the owner zk node is created. It then calls the listeners
(persistence/subscriptions/region manager) to run their topic acquisition logic.
Each manager just puts an acquire op in its queue, as below.
{quote}
> Persistence Manager <
| sub-1 acquire op |
{quote}
3) Then the sub-2 acquire op is executed. Since the sub-1 callback has not
fired yet, the topic is not in the topic set, so sub-2 goes to b)-ii). sub-2
deletes the zk node created by sub-1, then runs the same logic as sub-1 in 2).
{quote}
> Persistence Manager <
| sub-1 acquire op |
| sub-2 acquire op |
{quote}
4) Suppose bookkeeper recovers from the failure between the execution of the
sub-1 acquire op and the sub-2 acquire op in the persistence manager. The sub-1
acquire op fails with NotEnoughBookieException, so sub-1 enters the lost topic
logic and a release op is added to the persistence manager queue.
{quote}
> Persistence Manager <
| sub-2 acquire op |
| sub-1 release op |
{quote}
5) The sub-2 acquire op is executed successfully because the bookkeeper server
has been restarted. The topic is added to the topic manager's topic set.
{quote}
> Persistence Manager <
| sub-1 release op |
{quote}
6) The sub-1 release op is executed. It deletes the topic info in the
persistence manager, so we end up with *the topic in the topic manager but not
in the persistence manager*, which is dangerous.
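Steps 1) to 6) can be replayed deterministically with a small model. This is a hedged sketch only: {{QueueRaceSketch}}, {{bookiesAvailable}} and the other names are made up for illustration, and the persistence manager queue is modelled as a plain FIFO of {{Runnable}} ops rather than the real implementation.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of one hub: a topic set plus a persistence manager
// that processes acquire/release ops from a FIFO queue.
class QueueRaceSketch {
    final Set<String> topicSet = new HashSet<>();         // topic manager's view
    final Set<String> persistedTopics = new HashSet<>();  // persistence manager's view
    final Queue<Runnable> persistenceQueue = new ArrayDeque<>();
    boolean bookiesAvailable = false;                      // bookie is down at first

    // Acquisition by the listener: queue an acquire op.
    void enqueueAcquire(String topic) {
        persistenceQueue.add(() -> {
            if (bookiesAvailable) {
                persistedTopics.add(topic);  // acquire succeeded
                topicSet.add(topic);         // callback: add to topic set
            } else {
                lostTopic(topic);            // e.g. not enough bookies
            }
        });
    }

    // lostTopic: drop from the topic set, then queue a release op.
    void lostTopic(String topic) {
        topicSet.remove(topic);
        persistenceQueue.add(() -> persistedTopics.remove(topic));
    }

    void runNext() {
        persistenceQueue.remove().run();
    }

    // Replays the interleaving from steps 1) to 6).
    static QueueRaceSketch replay() {
        QueueRaceSketch hub = new QueueRaceSketch();
        hub.enqueueAcquire("topic-0");  // sub-1 acquire op
        hub.enqueueAcquire("topic-0");  // sub-2 acquire op
        hub.runNext();                  // sub-1 fails, release op queued last
        hub.bookiesAvailable = true;    // bookkeeper is back
        hub.runNext();                  // sub-2 acquire op succeeds
        hub.runNext();                  // sub-1 release op wipes the topic info
        return hub;
    }
}
```

After {{QueueRaceSketch.replay()}}, {{topicSet}} still contains topic-0 while {{persistedTopics}} does not, which is the inconsistent state described in 6). The release op is not tied to the acquire op that triggered it, so it undoes the later, successful acquire.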
Even worse, if we have multiple default servers (as we do in our testing),
other servers may try to claim the same topic. Assume sub-3 communicates with
default server B to subscribe to topic-0.
In 3) of the above flow, after sub-2 deletes the zk node created by sub-1, sub-3
on default server B and sub-1 on default server A have the same chance to
acquire topic-0. If sub-3 on default server B succeeds, we are in a worse
state: *server A considers itself the owner since the topic is in its topic
set, and so does server B, while in zookeeper the owner is server B*. (We
actually reached this state in our testing.)
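One interleaving that leads to this state can be sketched as below. Again this is an assumption-laden model, not the hub code: {{TwoHubRaceSketch}}, {{Hub}}, {{claim}} and {{deleteIfOwnedBySelf}} are hypothetical names, a shared map stands in for zookeeper, and the late sub-1 callback on server A is modelled as a direct add to A's topic set.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of two hubs racing for one topic through a shared
// owner map that stands in for zookeeper.
class TwoHubRaceSketch {
    static class Hub {
        final String name;
        final Set<String> topicSet = new HashSet<>();
        Hub(String name) { this.name = name; }
    }

    final Map<String, String> zkOwners = new ConcurrentHashMap<>();
    final Hub a = new Hub("A");
    final Hub b = new Hub("B");

    // Step c): create the owner node if nobody holds it yet.
    boolean claim(Hub hub, String topic) {
        return zkOwners.putIfAbsent(topic, hub.name) == null;
    }

    // Step b-ii): delete the node only if it names this hub.
    void deleteIfOwnedBySelf(Hub hub, String topic) {
        zkOwners.remove(topic, hub.name);
    }

    static TwoHubRaceSketch replay() {
        TwoHubRaceSketch s = new TwoHubRaceSketch();
        String topic = "topic-0";
        s.claim(s.a, topic);                // sub-1 on A creates the owner node
        s.deleteIfOwnedBySelf(s.a, topic);  // sub-2 on A deletes it as "stale"
        if (s.claim(s.b, topic)) {          // sub-3 on B wins the claim
            s.b.topicSet.add(topic);        // B's listeners succeed
        }
        s.a.topicSet.add(topic);            // sub-1's callback completes late on A
        return s;
    }
}
```

After {{replay()}}, both hubs have topic-0 in their topic sets while the owner node names B. Clients that land on A are served locally, and any hub that consults zookeeper redirects to B.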
> ServerRedirectLoopException when a machine (hosts bookie server & hub server)
> reboot, which is caused by race condition of topic manager
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-69
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-69
> Project: Bookkeeper
> Issue Type: Bug
> Components: hedwig-client, hedwig-server
> Affects Versions: 3.4.0
> Environment: 3 machines (perf8, perf9, perf10), each machine hosts a
> bookie server & a hub server.
> perf8 is used as default server for client 1. perf9 is used as default server
> for client 2.
> bookkeeper is configured as below:
> ensemble size is 3, quorum size is 2.
> Reporter: Sijie Guo
> Priority: Critical
>
> 1) machine perf10 is rebooted. the bookie server & hub server are not
> restarted automatically after reboot.
> 2) client 1 & client 2 are still running. the topics owned in perf10 will be
> re-assigned to perf8/perf9. but they would fail because not enough bookie
> servers are available.
> 3) after 2 hours, we found that perf10 is rebooted. we restarted bookie
> server & hub server on perf10
> 4) then we got ServerRedirectLoopException in client.