ocean30 opened a new issue, #2249:
URL: https://github.com/apache/shardingsphere-elasticjob/issues/2249

   ## Bug Report
   In elastic-job-lite-core 2.0.4, when digest authentication is used to access ZooKeeper, the server hangs indefinitely ("fake death").
   
   ### Which version of ElasticJob did you use?
   elastic-job-lite-core:2.0.4
   ### Expected behavior
   The LeaderLatch election completes.
   ### Actual behavior
   LeaderLatch waits indefinitely for the election to complete.
   ### Reason analysis
   After digest authentication is added to ElasticJob, each election creates a child node with a digest ACL under the latch node. Servers that have not yet been upgraded have no digest credentials, so they have no permission to read the newly created node, and the election ends up waiting.
   The ZooKeeper server returns `type=GET_DATA, resultCode=-102` (NoAuth), but `processResult` does not handle `resultCode=-102` for `GET_DATA`.
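
To make the permission failure concrete, here is a minimal, stdlib-only Java sketch. It assumes ZooKeeper's default digest scheme, in which an identity is `user:base64(sha1("user:password"))`. The credentials `elasticjob`/`secret` and the one-line ACL check in `getData` are hypothetical simplifications, not ZooKeeper's actual ACL evaluation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

// Simplified model of why a server without digest gets NoAuth (-102).
// Hypothetical sketch: real ZooKeeper ACL checks are more involved.
class DigestAclModel {
    static final int OK = 0;         // value of KeeperException.Code.OK
    static final int NO_AUTH = -102; // value of KeeperException.Code.NOAUTH

    // Digest identity as "user:base64(sha1(user:password))",
    // assuming ZooKeeper's default SHA-1 digest and ASCII credentials.
    static String digestId(String user, String password) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    // Hypothetical read check: the node's ACL lists the digest ids allowed to read.
    // A client that added no digest credentials (null) has nothing to match.
    static int getData(List<String> nodeAcl, String clientDigestIdOrNull) {
        if (clientDigestIdOrNull != null && nodeAcl.contains(clientDigestIdOrNull)) {
            return OK;
        }
        return NO_AUTH;
    }

    public static void main(String[] args) {
        String upgraded = digestId("elasticjob", "secret"); // hypothetical credentials
        List<String> latchChildAcl = List.of(upgraded);     // child created by an upgraded server
        System.out.println("upgraded server:     " + getData(latchChildAcl, upgraded));
        System.out.println("non-upgraded server: " + getData(latchChildAcl, null));
    }
}
```

The non-upgraded server's read yields -102, the same result code that appears in the log below.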
   ### Steps to reproduce the behavior
   1. Run 8 servers with 10 jobs, without digest.
   2. Add digest to the code and restart the servers one by one; some machines wait forever (fake death).
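
The rolling restart in these steps can be sketched as a small hypothetical model. It assumes servers are restarted in order 1..8 and that any restarted server may create a digest-protected latch child. For each step it only lists which servers still lack digest credentials and would therefore get NoAuth (-102) when reading such a node; it does not model Curator or ElasticJob internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the rolling restart from the reproduction steps.
class RollingRestartModel {
    // Servers that still run without digest after `restartedSoFar` restarts,
    // and so cannot read a latch child created by an already-restarted server.
    static List<Integer> serversExposedToNoAuth(int totalServers, int restartedSoFar) {
        List<Integer> exposed = new ArrayList<>();
        if (restartedSoFar == 0) {
            return exposed; // no digest-protected latch children exist yet
        }
        for (int server = restartedSoFar + 1; server <= totalServers; server++) {
            exposed.add(server);
        }
        return exposed;
    }

    public static void main(String[] args) {
        int totalServers = 8; // from step 1 of the reproduction
        for (int step = 1; step <= totalServers; step++) {
            System.out.println("after restarting server " + step
                    + ", exposed to NoAuth: " + serversExposedToNoAuth(totalServers, step));
        }
    }
}
```

The window closes only once every server carries the digest, which is why the hang shows up on "some machines" during the upgrade rather than on all of them.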
   
   ### Log and code 
   1. ElasticJob log showing the ZooKeeper response with `type=GET_DATA`:
   `2023-07-27 22:33:55.224 [DEBUG] [localhost-startStop-1-EventThread] [] [] [] - o.a.c.f.r.c.TreeCache : processResult: CuratorEventImpl{type=GET_DATA, resultCode=-102, path='/xxx/leader/election/latch/_c_bccccdcc-1134-4e0a-bb52-59a13836434a-latch-0000000047', name='null', children=null, context=null, stat=null, data=null, watchedEvent=null, aclList=null}`
   2. How the `GET_DATA` result from ZooKeeper is processed: only the result codes OK (0) and NONODE (-101) are handled, while NoAuth (-102) is silently ignored. As a result, the election keeps waiting and no exception is thrown.
   `org.apache.curator.framework.recipes.cache.TreeCache.TreeNode#processResult`
   ```java
   public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
   {
       LOG.debug("processResult: {}", event);
       Stat newStat = event.getStat();
       switch ( event.getType() )
       {
       case EXISTS:
           {...}
           break;
       case CHILDREN:
           {...}
           break;
       case GET_DATA:
           if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
           {...}
           else if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
           {
               wasDeleted();
           }
           // TODO: NoAuth (-102) needs to be handled here
           break;
       default:
           // An unknown event, probably an error of some sort like connection loss.
           LOG.info(String.format("Unknown event %s", event));
           // Don't produce an initialized event on error; reconnect can fix this.
           outstandingOps.decrementAndGet();
           return;
       }

   }
   ```
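
The `TODO` in the excerpt can be illustrated with a stdlib-only sketch of the `GET_DATA` branch. `GetDataResultModel`, its `Outcome` enum, and `proposedBehavior` are hypothetical names; this is not Curator's code or an upstream fix. It only models the difference between silently ignoring NoAuth and reporting it as a distinct failure.

```java
// Hypothetical model of the GET_DATA branch of TreeCache.TreeNode#processResult.
class GetDataResultModel {
    static final int OK = 0;        // value of KeeperException.Code.OK
    static final int NONODE = -101; // value of KeeperException.Code.NONODE
    static final int NOAUTH = -102; // value of KeeperException.Code.NOAUTH

    enum Outcome { DATA_UPDATED, DELETED, IGNORED, AUTH_FAILED }

    // Mirrors the excerpt: only OK and NONODE are acted upon.
    static Outcome currentBehavior(int resultCode) {
        if (resultCode == OK) {
            return Outcome.DATA_UPDATED;
        } else if (resultCode == NONODE) {
            return Outcome.DELETED;
        }
        return Outcome.IGNORED; // NoAuth (-102) falls through with no signal
    }

    // Hypothetical handling: NoAuth becomes an explicit, visible failure.
    static Outcome proposedBehavior(int resultCode) {
        if (resultCode == NOAUTH) {
            return Outcome.AUTH_FAILED;
        }
        return currentBehavior(resultCode);
    }

    public static void main(String[] args) {
        int[] codes = {OK, NONODE, NOAUTH};
        for (int code : codes) {
            System.out.println(code + ": current=" + currentBehavior(code)
                    + ", proposed=" + proposedBehavior(code));
        }
    }
}
```

In a real fix, the `AUTH_FAILED` case would still have to reach the caller (for example through a log at a higher level or an error notification) so that the election fails visibly instead of waiting. Which mechanism fits Curator is left open here.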


-- 
This is an automated message from the Apache Git Service.