[ https://issues.apache.org/jira/browse/HDFS-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anatoly updated HDFS-17421:
---------------------------
    Description: 
I wanted to calculate the load on the KDC in the Hadoop cluster after enabling Kerberos.
There are two parameters in the HDFS metrics:

{code:java}
RpcAuthenticationSuccesses - Total number of authentication successes
RpcAuthenticationFailures - Total number of authentication failures
{code}

I expect that any request to the cluster will generate a request to the KDC to get a ticket, and that each request will add +1 to one metric if it succeeds or +1 to the other metric if it fails.
However, on the test cluster, where I have 4 DataNodes and 2 NameNodes (HA), I see completely different values for these metrics.
I noticed that the RpcAuthenticationSuccesses reading increases gradually, by +1 every 30 seconds.

For example, before a test on the cluster:
# only the HDFS-\{NN,DN,JN,ZKFC} and YARN-\{RM,NM} services are running
# All other components that were enabled (Hive, Spark HistoryServer) are disabled
# There are no YARN jobs running and no user requests to HDFS

At the time of the test, the values of the metrics were:
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208322

*TEST 1*
To check the load, I ran a test spark-submit of spark-examples_2.12-3.5.0.jar with num-executors 1.
The job ran for 1 min 20 sec.
RpcAuthenticationSuccesses = 208338
In total, 16 points were added during the execution time.
+2 can be attributed to the +1 every 30 seconds.
But what do the other +14 points mean?
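The TEST 1 arithmetic can be sketched as follows. This is only an illustration, not part of the original measurements: the function names are hypothetical, the two JSON snapshots are hand-written and abbreviated stand-ins for what the NameNode's JMX JSON output might look like (assuming the RPC port is 8020), and the background rate is assumed to be exactly one increment per 30 seconds. Summing over all RpcActivityForPort* beans is a choice made for the sketch.

```python
import json


def auth_successes(jmx_json: str) -> int:
    """Sum RpcAuthenticationSuccesses over all RpcActivityForPort* beans."""
    beans = json.loads(jmx_json).get("beans", [])
    return sum(
        bean["RpcAuthenticationSuccesses"]
        for bean in beans
        if "RpcActivityForPort" in bean.get("name", "")
        and "RpcAuthenticationSuccesses" in bean
    )


def split_delta(before: int, after: int, elapsed_s: float, period_s: float = 30.0):
    """Split a counter delta into (total, background ticks, remainder).

    Assumes the background activity adds exactly one increment per period_s.
    """
    total = after - before
    background = int(elapsed_s // period_s)
    return total, background, total - background


def snapshot(successes: int) -> str:
    """Build a hand-written, abbreviated JMX-style payload (illustrative only)."""
    return json.dumps({
        "beans": [{
            "name": "Hadoop:service=NameNode,name=RpcActivityForPort8020",
            "RpcAuthenticationSuccesses": successes,
            "RpcAuthenticationFailures": 0,
        }]
    })


# Counter values taken from TEST 1; the job ran for 1 min 20 sec = 80 s.
BEFORE = snapshot(208322)
AFTER = snapshot(208338)

total, background, unexplained = split_delta(
    auth_successes(BEFORE), auth_successes(AFTER), elapsed_s=80
)
print(total, background, unexplained)  # 16 2 14
```

With the TEST 1 numbers this reproduces the split described above: 16 increments in total, 2 explained by the periodic +1, and 14 left over for the job itself.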

*TEST 2*
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208388
hdfs dfs -ls /
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208389

*TEST 3*
Turned off:
all DNs
the Standby NN
all YARN services
Still running:
three JNs, ZKFC
one Active NN
The counter continues to add +1 to the RpcAuthenticationSuccesses metric every 30 seconds.

Either I don't understand the meaning of these metrics correctly, or something is not being counted correctly.


> Check the correctness of the calculation RpcAuthentication*
> -----------------------------------------------------------
>
>                 Key: HDFS-17421
>                 URL: https://issues.apache.org/jira/browse/HDFS-17421
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: hdfs, metrics
>            Reporter: Anatoly
>            Priority: Major