zuston commented on PR #19675:
URL: https://github.com/apache/flink/pull/19675#issuecomment-1121805622

   @wangyang0918 Thanks for your quick reply.
   
   Yes, you are right. With a two-level queue naming policy, both the leaf queue name (`-Dyarn.application.queue=default`) and the full queue name (`-Dyarn.application.queue=root.default`) work with the capacity scheduler when the queues are checked.
   
   But with a three-level naming policy, like `root.job.streaming` / `root.job.batch`, the check fails. When I remote-debugged one of our internal jobs, I found that `QueueInfo.getQueueName` only returns the simple queue name.
   
   So let's look at an example with the following YARN queue configuration.
   
   > Yarn Queue:
   > 1. root.job.streaming
   > 2. root.job.batch
   > 
   > When retrieving all queues with `YarnClient.getAllQueues`, it returns a list of `QueueInfo`. The names returned by `QueueInfo.getQueueName` for that list are [root, job, streaming, batch].
   
   So when the Flink job queue is specified as `root.job.streaming`, `checkYarnQueues` does not find a matching queue among the YARN queues above and prints an incorrect message.
   
   And why does the queue check work with a two-level naming policy? I think the code below just adds an extra special case to compensate for the API not returning the full queue path.
   ```
   for (QueueInfo queue : queues) {
       // Special case: also accept a queue whose name equals "root." + this.yarnQueue
       if (queue.getQueueName().equals(this.yarnQueue)
               || queue.getQueueName().equals("root." + this.yarnQueue)) {
           queueFound = true;
           break;
       }
   }
   ```
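
To make the mismatch concrete, here is a minimal sketch that replays the comparison above against the simple names from the example. It does not use the Hadoop API: the class `QueueCheckDemo`, the method `queueFound`, and the plain string list are hypothetical stand-ins for `checkYarnQueues` and the `QueueInfo` list.

```java
import java.util.List;

class QueueCheckDemo {
    // Same comparison as the loop quoted above, applied to plain strings
    // that stand in for the values returned by QueueInfo.getQueueName().
    static boolean queueFound(List<String> queueNames, String yarnQueue) {
        for (String name : queueNames) {
            if (name.equals(yarnQueue) || name.equals("root." + yarnQueue)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simple names as listed in the example for root.job.streaming / root.job.batch.
        List<String> names = List.of("root", "job", "streaming", "batch");

        // The full three-level path never equals any simple name.
        System.out.println(queueFound(names, "root.job.streaming"));
        // The leaf name alone does match.
        System.out.println(queueFound(names, "streaming"));
    }
}
```

Under these assumptions the full path `root.job.streaming` is reported as missing even though the queue exists, while the bare leaf name `streaming` is accepted.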
   
   Coming back to this issue: to solve the problem above we have to use the `getQueuePath` API, but it was only introduced in the latest Hadoop version.
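
One possible way to cope with `getQueuePath` being available only on newer Hadoop versions is to look the method up reflectively and fall back to `getQueueName` when it is missing. The sketch below is only an illustration of that idea, not what this PR does: `OldQueueInfo` and `NewQueueInfo` are hypothetical stand-ins for the two generations of `QueueInfo`, and `pathOrName` is an assumed helper name.

```java
import java.lang.reflect.Method;

class QueuePathLookup {
    // Hypothetical stand-in for a QueueInfo that only exposes the simple name.
    static class OldQueueInfo {
        private final String name;
        OldQueueInfo(String name) { this.name = name; }
        public String getQueueName() { return name; }
    }

    // Hypothetical stand-in for a QueueInfo that also exposes the full path.
    static class NewQueueInfo extends OldQueueInfo {
        private final String path;
        NewQueueInfo(String name, String path) { super(name); this.path = path; }
        public String getQueuePath() { return path; }
    }

    // Prefer getQueuePath() when the class has it, otherwise use getQueueName().
    static String pathOrName(Object queueInfo) {
        try {
            Method getPath = queueInfo.getClass().getMethod("getQueuePath");
            Object path = getPath.invoke(queueInfo);
            if (path != null) {
                return (String) path;
            }
        } catch (ReflectiveOperationException e) {
            // Method not present or not callable: fall through to the simple name.
        }
        try {
            Method getName = queueInfo.getClass().getMethod("getQueueName");
            return (String) getName.invoke(queueInfo);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Neither getQueuePath nor getQueueName is usable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathOrName(new NewQueueInfo("streaming", "root.job.streaming")));
        System.out.println(pathOrName(new OldQueueInfo("streaming")));
    }
}
```

With a lookup like this, the comparison could use the full path where the running Hadoop version provides it and keep the current simple-name behavior elsewhere.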
   
   So I thought more about why `checkYarnQueues` was introduced. Is it just to give the user a hint, without exiting directly, when the queue doesn't exist in the cluster's YARN queues? If that is all, I think there is no need for this method, since the error message will be shown in the Flink YARN application master diagnostics anyway.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
