[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116218#comment-13116218
 ] 

Sijie Guo commented on BOOKKEEPER-69:
-------------------------------------

h2. 0. Investigation

We investigated one topic that hits the ServerRedirectLoopException; call it 
topic-0.
According to ZooKeeper, topic-0 is owned by perf9.
We dumped the hub server JVMs on perf8 and perf9 and found that topic-0 is 
owned by both perf8 and perf9.
On perf8, topic-0 is owned in the topic manager but not in the persistence 
manager.

h2. 1. Cause 

The ServerRedirectLoopException "Already made the request before to redirected 
host: " occurs when the topic manager owns a topic but the persistence manager 
does not.

If the topic manager owns the topic, a subscription request calls the 
persistence manager to get the current seq id of the topic. If the persistence 
manager does not have the topic info, it throws a 
ServerNotResponsibleForTopicException with an *empty redirect host*.

{code:title=BookKeeperPersistenceManager.java|borderStyle=solid}
        TopicInfo topicInfo = topicInfos.get(topic);

        if (topicInfo == null) {
            throw new PubSubException.ServerNotResponsibleForTopicException("");
        }
{code}

The hub server then sends *NOT_RESPONSIBLE_FOR_TOPIC* to the hedwig client.
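
The server-side path described above can be modeled with a small standalone sketch. This is not the Hedwig code: the class OwnershipMismatchSketch, its fields, the subscribe method and the status strings are hypothetical names chosen for illustration. It only shows how a hub whose topic manager owns a topic, but whose persistence manager has no state for it, ends up answering with an empty redirect host.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OwnershipMismatchSketch {
    /** Hypothetical stand-in for the exception thrown by the persistence manager. */
    static class ServerNotResponsibleForTopicException extends Exception {
        ServerNotResponsibleForTopicException(String redirectHost) {
            super(redirectHost); // the message carries the redirect host
        }
    }

    // Topics the (modeled) topic manager believes this hub owns.
    final Set<String> ownedTopics = new HashSet<>();
    // Per-topic state held by the (modeled) persistence manager.
    final Map<String, Long> topicSeqIds = new HashMap<>();

    /** Mirrors the null check in the excerpt: no topic info means an empty redirect host. */
    long getCurrentSeqId(String topic) throws ServerNotResponsibleForTopicException {
        Long seqId = topicSeqIds.get(topic);
        if (seqId == null) {
            throw new ServerNotResponsibleForTopicException("");
        }
        return seqId;
    }

    /** Handles a subscribe request; the returned status strings are illustrative only. */
    String subscribe(String topic) {
        if (!ownedTopics.contains(topic)) {
            // Normal "not the owner" path; not modeled in this sketch.
            return "NOT_OWNED";
        }
        try {
            return "SUCCESS seqId=" + getCurrentSeqId(topic);
        } catch (ServerNotResponsibleForTopicException e) {
            return "NOT_RESPONSIBLE_FOR_TOPIC redirect=" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        OwnershipMismatchSketch perf8 = new OwnershipMismatchSketch();
        perf8.ownedTopics.add("topic-0"); // owned in topic manager only
        System.out.println(perf8.subscribe("topic-0"));
    }
}
```

In this model the two maps can only disagree if something updates one without the other, which is the inconsistent state observed on perf8.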

The client handles the redirect and finds that there is no host to redirect 
to. It tries the default server again, but the default server is already in 
the list of tried servers, so the client throws ServerRedirectLoopException.
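
The client-side behaviour can be sketched in the same spirit. Again, this is a simplified model and not the hedwig client code: RedirectLoopSketch, nextHost and the triedHosts set are hypothetical names, and the fallback to the default server on an empty redirect host is modeled as described in this comment.

```java
import java.util.HashSet;
import java.util.Set;

public class RedirectLoopSketch {
    /** Hypothetical stand-in for the client-side exception. */
    static class ServerRedirectLoopException extends RuntimeException {
        ServerRedirectLoopException(String msg) {
            super(msg);
        }
    }

    /**
     * Picks the next host after a NOT_RESPONSIBLE_FOR_TOPIC response.
     * An empty redirect host falls back to the default server; a host that
     * was already tried is treated as a redirect loop.
     */
    static String nextHost(String redirectHost, String defaultHost, Set<String> triedHosts) {
        String target = (redirectHost == null || redirectHost.isEmpty())
                ? defaultHost
                : redirectHost;
        // Set.add returns false when the element is already present.
        if (!triedHosts.add(target)) {
            throw new ServerRedirectLoopException(
                    "Already made the request before to redirected host: " + target);
        }
        return target;
    }

    public static void main(String[] args) {
        Set<String> tried = new HashSet<>();
        tried.add("perf8"); // the first request went to the default server
        try {
            nextHost("", "perf8", tried); // empty redirect host from the hub
        } catch (ServerRedirectLoopException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a non-empty redirect host such as perf9 the same call simply returns perf9, so the loop only appears when the hub replies with an empty host.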
                
> ServerRedirectLoopException when a machine (hosts bookie server & hub server) 
> reboot, which is caused by race condition of topic manager
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-69
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-69
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-client, hedwig-server
>    Affects Versions: 3.4.0
>         Environment: 3 machines (perf8, perf9, perf10), each machine hosts a 
> bookie server & a hub server.
> perf8 is used as default server for client 1. perf9 is used as default server 
> for client 2.
> bookkeeper is configured as below:
> ensemble size is 3, quorum size is 2.
>            Reporter: Sijie Guo
>            Priority: Critical
>
> 1) machine perf10 is rebooted. the bookie server & hub server are not 
> restarted automatically after reboot.
> 2) client 1 & client 2 are still running. the topics owned in perf10 will be 
> re-assigned to perf8/perf9. but they would fail because not enough bookie 
> servers are available.
> 3) after 2 hours, we found that perf10 is rebooted. we restarted bookie 
> server & hub server on perf10
> 4) then we got ServerRedirectLoopException in client.


        

Reply via email to