[ https://issues.apache.org/jira/browse/HDFS-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chingachgook updated HDFS-17421:
--------------------------------
    Description: 
This question about two HDFS metrics arose while I was trying to estimate the load on the KDC for a production cluster.

There are two parameters in the HDFS metrics:
RpcAuthenticationSuccesses - Total number of successful authentication attempts
RpcAuthenticationFailures - Total number of authentication failures

I expected that every data request in the Hadoop cluster would trigger a request to the KDC -> get ticket, after which one of the counters would increase: +1 to RpcAuthenticationSuccesses on success, or +1 to RpcAuthenticationFailures on failure.
However, in a test cluster with 4 DataNodes and 2 NameNodes (HA), the values of these metrics make no sense to me.
I also noticed that RpcAuthenticationSuccesses gradually increases by +1 every 30 seconds.

*TEST 1*
I made sure that
1. Only the HDFS-\{NN,DN,JN,ZKFC} and YARN-\{RM,NM} services are running
2. All other components that were enabled (hive, spark HistoryServer) are disabled
3. There are no YARN jobs running and no user requests to hdfs

At the time of testing, the values were
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208322

To check the load, I ran a spark-submit test with spark-examples_2.12-3.5.0.jar and num-executors 1.
The job completed in 1 minute and 20 seconds.
RpcAuthenticationSuccesses = 208338
In total, +16 was added to the original value during the run.
Say +2 can be attributed to the +1 every 30 seconds mentioned above. But what do the other +14 authentications mean?

*TEST 2*
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208388
hdfs dfs -ls /
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208389
Added +1. Why?
I ran kinit long before the ls request, so I think the metric should not have changed, but maybe I'm wrong.

*TEST 3*
Disabled:
- All DNs
- Standby NN
- All YARN services (RM, NM)
Still running:
- Three JNs, ZKFC
- One active NN
RpcAuthenticationSuccesses continues to increase by +1 every 30 seconds.

Either I misunderstand the meaning of these metrics, or something is being counted incorrectly.

  was:
I wanted to calculate the load on the KDC in the hadoop cluster after enabling kerberos.

There are two parameters in hdfs metrics
{code:java}
RpcAuthenticationSuccesses - Total number of authentication successes
RpcAuthenticationFailures - Total number of authentication failures
{code}
I expect that any request to the cluster will generate a request to KDC -> get ticket, and the request counter should trigger either +1 to one metric if successful or +1 to the other metric if failed.

However, on the test cluster, where I have 4 DataNodes and 2 NameNodes (HA), I see completely different indicators for these metrics.
I noticed that the RpcAuthenticationSuccesses readings are gradually increasing by +1 every 30 seconds.

For example, before a test in a cluster
# only HDFS-\{NN,DN,JN,ZKFC} and YARN-\{RM,NM} services work
# All other components that were enabled (hive, spark HistoryServer) are disabled
# There are no YARN jobs running and no user requests to hdfs

At the time of the test, the values of the metrics were
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208322

*TEST 1*
To check the load, I run the test spark submit spark-examples_2.12-3.5.0.jar with num-executors 1
The request was executed for 1 min 20 sec
RpcAuthenticationSuccesses = 208338
In total, 16 points were added during the execution time.
+2 can be attributed to the +1 every 30 seconds. But what do the other +14 points mean?
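
The TEST 1 arithmetic above can be written out as a small sketch. The function name and the assumption that the idle increase is exactly one increment per 30 seconds are illustrative only; they come from the observations in this report, not from anything Hadoop defines.

```python
def unexplained_increments(before, after, elapsed_seconds, background_period_seconds=30):
    """Split a counter delta into expected background ticks and the remainder.

    Assumes (hypothetically) that the counter grows by exactly +1 per
    background period while the cluster is otherwise idle.
    """
    delta = after - before
    # Whole background periods that fit into the elapsed time.
    background = elapsed_seconds // background_period_seconds
    return delta, background, delta - background


# Figures from TEST 1: 208322 -> 208338 over 1 min 20 sec (80 seconds).
print(unexplained_increments(208322, 208338, 80))  # (16, 2, 14)
```

The last value of the tuple is the part of the increase that the +1 every 30 seconds does not account for.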

*TEST 2*
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208388
hdfs dfs -ls /
RpcAuthenticationFailures = 0
RpcAuthenticationSuccesses = 208389

*TEST 3*
Turned off:
- all DN
- Standby NN
- All YARN services
I still have:
- Three JN, ZKFC
- One NN Active
The counter continues to add +1 to the RpcAuthenticationSuccesses metric every 30 seconds

Either I don't understand the meaning of these metrics correctly or something is not being counted correctly


> Check the correctness of the calculation RpcAuthentication*
> -----------------------------------------------------------
>
>                 Key: HDFS-17421
>                 URL: https://issues.apache.org/jira/browse/HDFS-17421
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: hdfs, metrics
>            Reporter: Chingachgook
>            Priority: Major
>
> This question about two HDFS metrics arose while I was trying to estimate
> the load on the KDC for a production cluster.
> There are two parameters in the HDFS metrics:
> RpcAuthenticationSuccesses - Total number of successful authentication
> attempts
> RpcAuthenticationFailures - Total number of authentication failures
> I expected that every data request in the Hadoop cluster would trigger
> a request to the KDC -> get ticket,
> after which one of the counters would increase: +1 to
> RpcAuthenticationSuccesses on success, or +1 to RpcAuthenticationFailures
> on failure.
> However, in a test cluster with 4 DataNodes and 2 NameNodes (HA), the
> values of these metrics make no sense to me.
> I also noticed that RpcAuthenticationSuccesses gradually increases by +1
> every 30 seconds.
>
> *TEST 1*
> I made sure that
> 1. Only the HDFS-\{NN,DN,JN,ZKFC} and YARN-\{RM,NM} services are running
> 2. All other components that were enabled (hive, spark HistoryServer) are
> disabled
> 3. There are no YARN jobs running and no user requests to hdfs
> At the time of testing, the values were
> RpcAuthenticationFailures = 0
> RpcAuthenticationSuccesses = 208322
> To check the load, I ran a spark-submit test with
> spark-examples_2.12-3.5.0.jar and num-executors 1.
> The job completed in 1 minute and 20 seconds.
> RpcAuthenticationSuccesses = 208338
> In total, +16 was added to the original value during the run.
> Say +2 can be attributed to the +1 every 30 seconds mentioned above.
> But what do the other +14 authentications mean?
> *TEST 2*
> RpcAuthenticationFailures = 0
> RpcAuthenticationSuccesses = 208388
> hdfs dfs -ls /
> RpcAuthenticationFailures = 0
> RpcAuthenticationSuccesses = 208389
> Added +1. Why?
> I ran kinit long before the ls request, so I think the metric should not
> have changed, but maybe I'm wrong.
> *TEST 3*
> Disabled:
> - All DNs
> - Standby NN
> - All YARN services (RM, NM)
> Still running:
> - Three JNs, ZKFC
> - One active NN
> RpcAuthenticationSuccesses continues to increase by +1 every 30 seconds.
> Either I misunderstand the meaning of these metrics, or something is
> being counted incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org