crazycarry opened a new issue #4226:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4226


   
   ### After upgrading DolphinScheduler to 1.3.3, I found a bug similar to an earlier one, in the class MasterSchedulerService
   
   ```
   
   public void run() {
       logger.info("master scheduler started");
       while (Stopper.isRunning()){
           InterProcessMutex mutex = null;
           try {
               boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMasterMaxCpuloadAvg(), masterConfig.getMasterReservedMemory());
               if(!runCheckFlag) {
                   Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                   continue;
               }
               if (zkMasterClient.getZkClient().getState() == CuratorFrameworkState.STARTED) {

                   mutex = zkMasterClient.blockAcquireMutex();

                   int activeCount = masterExecService.getActiveCount();
                   // make sure to scan and delete command  table in one transaction
                   Command command = processService.findOneCommand();
                   if (command != null) {
                       logger.info("find one command: id: {}, type: {}", command.getId(),command.getCommandType());

                       try{

                           ProcessInstance processInstance = processService.handleCommand(logger,
                                   getLocalAddress(),
                                   this.masterConfig.getMasterExecThreads() - activeCount, command);
                           if (processInstance != null) {
                               logger.info("start master exec thread , split DAG ...");
                               masterExecService.execute(new MasterExecThread(processInstance, processService, nettyRemotingClient));
                           }
                       }catch (Exception e){
                           logger.error("scan command error ", e);
                           processService.moveToErrorCommand(command, e.toString());
                       }
                   } else{
                       //indicate that no command ,sleep for 1s
                       Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                   }
               }
           } catch (Exception e){
               logger.error("master scheduler thread error",e);
           } finally{
               zkMasterClient.releaseMutex(mutex);
           }
       }
   }
   
   ```
   
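   When the database returns an error or some other exception is thrown, the loop has nothing to slow it down: the outer `catch` only logs the error, so the `while` loop retries immediately without sleeping.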
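
One possible mitigation, shown only as a minimal sketch and not as the project's actual fix, is to sleep in the outer `catch` before the next iteration. The class `BackoffLoopSketch`, the method `runLoop`, and the local `SLEEP_TIME_MILLIS` constant are hypothetical stand-ins introduced for illustration; the scan step is passed in as a `Callable` so the example runs without ZooKeeper or a database.

```java
import java.util.concurrent.Callable;

public class BackoffLoopSketch {

    // Hypothetical stand-in for Constants.SLEEP_TIME_MILLIS in the snippet above.
    static final long SLEEP_TIME_MILLIS = 50L;

    /**
     * Calls scanStep up to maxIterations times. When a call throws, the error is
     * reported and the thread sleeps for sleepMillis before the next iteration,
     * so a persistent failure (for example a database outage) cannot turn the
     * loop into a busy spin. Returns the number of failed iterations.
     */
    static int runLoop(Callable<Boolean> scanStep, int maxIterations, long sleepMillis)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                scanStep.call();
            } catch (Exception e) {
                failures++;
                System.err.println("master scheduler thread error: " + e);
                // The back-off that the original outer catch block lacks.
                Thread.sleep(sleepMillis);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a scan step that always fails, as it would while the database is down.
        int failures = runLoop(() -> {
            throw new IllegalStateException("simulated database error");
        }, 3, SLEEP_TIME_MILLIS);
        System.out.println("failed iterations: " + failures);
    }
}
```

In the real `run()` method the equivalent change would be a `Thread.sleep(Constants.SLEEP_TIME_MILLIS)` inside the outer `catch`. Note that `Thread.sleep` throws `InterruptedException`, so it would need its own handling there.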



