lengyuexuexuan opened a new issue, #1880:
URL: https://github.com/apache/incubator-pegasus/issues/1880
Suppose the Pegasus client is configured with the meta server list
"127.0.0.1:34602" and "127.0.0.1:34603", but the actual primary meta server of
the Pegasus cluster is "127.0.0.1:34601". In that case the client cannot
connect to the Pegasus server, and the request fails only once the timeout expires.
The reason is that when the Go client searches for the primary, it only
iterates through the configured meta server list, sending the RPC
RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX to each meta server and deciding
which one is the primary based on the response.
Unlike the Java client, the Go client does not follow the forward address
to add meta servers that are not specified in the configuration.
The relevant Go client code is:
```
// go-client/session/meta_call.go
func (c *metaCall) issueBackupMetas(ctx context.Context) {
	for i := range c.metas {
		if i == c.lead {
			continue
		}
		// concurrently issue RPC to the rest of meta servers.
		go func(idx int) {
			c.issueSingleMeta(ctx, idx)
		}(i)
	}
}

// issueSingleMeta returns false if we should try another meta
func (c *metaCall) issueSingleMeta(ctx context.Context, i int) bool {
	meta := c.metas[i]
	resp, err := c.callFunc(ctx, meta)
	if err != nil || resp.GetErr().Errno == base.ERR_FORWARD_TO_OTHERS.String() {
		return false
	}
	// the RPC succeeds, this meta becomes the new leader now.
	atomic.StoreUint32(&c.newLead, uint32(i))
	select {
	case <-ctx.Done():
	case c.respCh <- resp:
		// notify the caller
	}
	return true
}
```
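The failure mode can be reproduced in isolation. The following is a minimal sketch, not the go-client's code: `queryMetas`, `shortTimeout`, and the string-based "response" are hypothetical stand-ins. It assumes, as in the snippet above, that a reply from a non-primary meta is dropped and only a successful reply is delivered to the caller.

```go
package main

import (
	"context"
	"time"
)

// shortTimeout is an illustrative value for trying the sketch out.
const shortTimeout = 50 * time.Millisecond

// queryMetas asks every configured meta concurrently. A non-primary meta is
// treated like an ERR_FORWARD_TO_OTHERS reply: nothing is sent to the caller.
// If the primary is not in metas, the caller can only wait for the timeout.
func queryMetas(metas []string, primary string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	respCh := make(chan string)
	for _, m := range metas {
		go func(addr string) {
			if addr != primary {
				return // forwarded elsewhere: the reply is dropped
			}
			select {
			case <-ctx.Done():
			case respCh <- addr: // notify the caller
			}
		}(m)
	}

	select {
	case addr := <-respCh:
		return addr, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}
```

With the addresses from this issue, `queryMetas([]string{"127.0.0.1:34602", "127.0.0.1:34603"}, "127.0.0.1:34601", shortTimeout)` returns a non-nil error after the timeout, because no configured meta ever produces a usable response.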
Here is the relevant part of the Java client code for this:
```
// com/xiaomi/infra/pegasus/rpc/async/MetaSession.java onFinishQueryMeta()
synchronized (this) {
  if (needSwitchLeader) {
    if (forwardAddress != null && !forwardAddress.isInvalid()) {
      boolean found = false;
      for (int i = 0; i < metaList.size(); i++) {
        if (metaList.get(i).getAddress().equals(forwardAddress)) {
          curLeader = i;
          found = true;
          break;
        }
      }
      if (!found) {
        logger.info("add forward address {} as meta server", forwardAddress);
        metaList.add(clusterManager.getReplicaSession(forwardAddress));
        curLeader = metaList.size() - 1;
      }
    } else if (metaList.get(curLeader) == round.lastSession) {
      curLeader = (curLeader + 1) % metaList.size();
      if (curLeader == 0 && hostPort != null && round.maxResolveCount != 0) {
        resolveHost(hostPort);
        round.maxResolveCount--;
        round.maxExecuteCount = metaList.size();
      }
    }
  }
  round.lastSession = metaList.get(curLeader);
}
```
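For comparison, the Java behavior could be mirrored in Go roughly as follows. This is only a sketch under stated assumptions: `metaList` and `followForward` are hypothetical names, not part of the go-client, and the forward address is assumed to have already been extracted from the meta server's response (how to obtain it is not shown here). The fallback branch is also simplified to a plain round-robin step.

```go
package main

import "sync"

// metaList is a hypothetical, simplified stand-in for the client's meta
// server bookkeeping; it is not the go-client's actual type.
type metaList struct {
	mu    sync.Mutex
	addrs []string // known meta server addresses
	lead  int      // index of the meta currently treated as primary
}

// followForward makes forwardAddr the leader, appending it to the list first
// when it is not known yet, and returns the new leader index. An empty
// forwardAddr falls back to trying the next known meta. It returns -1 when
// there is nothing to switch to.
func (m *metaList) followForward(forwardAddr string) int {
	m.mu.Lock()
	defer m.mu.Unlock()

	if forwardAddr == "" {
		if len(m.addrs) == 0 {
			return -1
		}
		m.lead = (m.lead + 1) % len(m.addrs)
		return m.lead
	}
	for i, a := range m.addrs {
		if a == forwardAddr {
			m.lead = i
			return i
		}
	}
	// The forwarded meta was not configured: remember it and switch to it.
	m.addrs = append(m.addrs, forwardAddr)
	m.lead = len(m.addrs) - 1
	return m.lead
}
```

Starting from the list "127.0.0.1:34602", "127.0.0.1:34603", a call with the forward address "127.0.0.1:34601" appends that address and selects it as leader, which is the step the Go client currently lacks.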
### In summary
The main impact of this issue shows up in an online cluster: when a new meta
server is added and at some later point becomes the primary, users who have
not changed their configuration are unable to connect to the server.