ocean30 opened a new issue, #2249:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/2249
## Bug Report
With elastic-job-lite-core:2.0.4, when digest authentication is used to access
ZooKeeper, the server waits indefinitely (it appears dead).
### Which version of ElasticJob did you use?
elastic-job-lite-core:2.0.4
### Expected behavior
The LeaderLatch election completes.
### Actual behavior
LeaderLatch waits indefinitely for the election to complete.
### Root cause analysis
After digest authentication is added to ElasticJob, each election creates a child
node carrying a digest ACL under the latch node. Servers that have not yet been
upgraded have no digest credentials, so they have no permission to access the
newly created node, and the election ends up waiting indefinitely.
The ZooKeeper server returns type=GET_DATA, resultCode=-102 (NoAuth), but the
GET_DATA branch of processResult does not handle resultCode=-102.
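The permission failure described above can be modelled without a ZooKeeper cluster. The following is only a rough sketch: the class `DigestAclSketch`, the method names, and the `job`/`secret` credentials are hypothetical, and the ACL check is reduced to a single string comparison. The id format `user:base64(sha1(user:password))` is what ZooKeeper's digest scheme uses by default, as far as I know.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical model of a digest-protected latch node; not ElasticJob or Curator code. */
final class DigestAclSketch {
    // Result codes as quoted in this report.
    static final int OK = 0;
    static final int NOAUTH = -102;

    /** Builds a digest ACL id in the form user:base64(sha1(user:password)). */
    static String digestId(String user, String password) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /**
     * Simplified ACL check: a client may read the node only if it presents
     * the same digest id the node was created with. A null clientId stands
     * for a server that has not been upgraded and has no digest configured.
     */
    static int getData(String nodeAclId, String clientId) {
        return nodeAclId.equals(clientId) ? OK : NOAUTH;
    }

    public static void main(String[] args) {
        // Node created by an upgraded server (assumed credentials).
        String nodeAcl = digestId("job", "secret");

        // An upgraded server presents the same digest and can read the node.
        System.out.println(getData(nodeAcl, digestId("job", "secret"))); // prints 0

        // A server without digest credentials is rejected with NoAuth.
        System.out.println(getData(nodeAcl, null)); // prints -102
    }
}
```

During a rolling restart both kinds of server coexist, which matches the window in which the hang is observed.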
### Steps to reproduce the behavior.
1. Run 8 servers with 10 jobs and no digest configured.
2. Add digest to the code and restart the servers one by one. Some machines will
wait forever (appear dead).
### Log and code
1. ElasticJob log showing the ZooKeeper response with type=GET_DATA:
`2023-07-27 22:33:55.224 [DEBUG] [localhost-startStop-1-EventThread] [] []
[] - o.a.c.f.r.c.TreeCache : processResult: CuratorEventImpl{type=GET_DATA,
resultCode=-102,
path='/xxx/leader/election/latch/_c_bccccdcc-1134-4e0a-bb52-59a13836434a-latch-0000000047',
name='null', children=null, context=null, stat=null, data=null,
watchedEvent=null, aclList=null}`
2. When the ZooKeeper type=GET_DATA result is processed, only the OK (0) and
NONODE (-101) result codes are handled. The NoAuth (-102) result is ignored, so
the election keeps waiting and no exception is thrown. The relevant method is
org.apache.curator.framework.recipes.cache.TreeCache.TreeNode#processResult:
```
public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
{
    LOG.debug("processResult: {}", event);
    Stat newStat = event.getStat();
    switch ( event.getType() )
    {
    case EXISTS:
        {...}
        break;
    case CHILDREN:
        {...}
        break;
    case GET_DATA:
        if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
        {...}
        else if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
        {
            wasDeleted();
        }
        // TODO: need to process NoAuth (-102)
        break;
    default:
        // An unknown event, probably an error of some sort like connection loss.
        LOG.info(String.format("Unknown event %s", event));
        // Don't produce an initialized event on error; reconnect can fix this.
        outstandingOps.decrementAndGet();
        return;
    }
}
```
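To make the gap concrete, here is a minimal standalone sketch of how the GET_DATA result codes could be classified so that NoAuth is no longer dropped silently. It does not use Curator: the class `GetDataResultSketch`, the `Outcome` enum, and `classify` are hypothetical names, and the numeric codes are the ones quoted in this report. It is not a proposed patch, only an illustration of the missing branch.

```java
/** Hypothetical stand-in for the GET_DATA branch of TreeNode#processResult. */
final class GetDataResultSketch {
    // Result codes as quoted in this report (KeeperException.Code values).
    static final int OK = 0;
    static final int NONODE = -101;
    static final int NOAUTH = -102;

    /** Hypothetical outcomes; these names are not part of Curator. */
    enum Outcome { DATA_UPDATED, NODE_DELETED, AUTH_FAILED, UNHANDLED }

    /** Maps a GET_DATA result code to an explicit outcome. */
    static Outcome classify(int resultCode) {
        switch (resultCode) {
            case OK:
                return Outcome.DATA_UPDATED;
            case NONODE:
                return Outcome.NODE_DELETED;
            case NOAUTH:
                // The branch missing in the snippet above: surface the
                // permission failure instead of falling through.
                return Outcome.AUTH_FAILED;
            default:
                return Outcome.UNHANDLED;
        }
    }

    public static void main(String[] args) {
        // The result code from the log line in this report.
        System.out.println(classify(-102)); // prints AUTH_FAILED
    }
}
```

With an explicit outcome for -102, the caller could log the failure or raise an exception rather than leave the election waiting.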