[ https://issues.apache.org/jira/browse/HDFS-17673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909856#comment-17909856 ]

ASF GitHub Bot commented on HDFS-17673:
---------------------------------------

hadoop-yetus commented on PR #7270:
URL: https://github.com/apache/hadoop/pull/7270#issuecomment-2571566044

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   6m 41s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ branch-3.4.1 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  1s |  |  branch-3.4.1 passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  branch-3.4.1 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  branch-3.4.1 passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  branch-3.4.1 passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  branch-3.4.1 passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  branch-3.4.1 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  branch-3.4.1 passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  |  branch-3.4.1 passed  |
   | +1 :green_heart: |  shadedclient  |  20m  0s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 28s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7270/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13)  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga  |
   | +1 :green_heart: |  spotbugs  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 29s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 183m  2s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 272m 21s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7270/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/7270 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 884fcbef640d 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.4.1 / 4715da6b9d450b64d2aefcd8870ac4baeab2b5d4 |
   | Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~20.04-ga |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7270/1/testReport/ |
   | Max. process+thread count | 4489 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7270/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Unable to run HDFS balancer as a service
> ----------------------------------------
>
>                 Key: HDFS-17673
>                 URL: https://issues.apache.org/jira/browse/HDFS-17673
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 3.4.0
>            Reporter: Dirk Daems
>            Priority: Major
>              Labels: pull-request-available
>
> When running the HDFS balancer as a service using the following command
>  
> {code:java}
> hdfs balancer -asService {code}
> the first balancing round succeeds:
> {code:java}
> $ hdfs balancer -asService
> 2024-11-25 14:58:14,712 INFO balancer.Balancer: Balancer will run as a long 
> running service
> 2024-11-25 14:58:14,764 INFO impl.MetricsConfig: Loaded properties from 
> hadoop-metrics2.properties
> 2024-11-25 14:58:14,824 INFO impl.MetricsSystemImpl: Scheduled Metric 
> snapshot period at 10 second(s).
> 2024-11-25 14:58:14,824 INFO impl.MetricsSystemImpl: Balancer metrics system 
> started
> 2024-11-25 14:58:14,840 INFO balancer.Balancer: namenodes  = [hdfs://prd]
> 2024-11-25 14:58:14,841 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, 
> #blockpools = 0, run during upgrade = false, sort top nodes = false, hot 
> block time interval = 0]
> 2024-11-25 14:58:14,841 INFO balancer.Balancer: included nodes = []
> 2024-11-25 14:58:14,841 INFO balancer.Balancer: excluded nodes = []
> 2024-11-25 14:58:14,841 INFO balancer.Balancer: source nodes = []
> 2024-11-25 14:58:14,841 INFO balancer.Balancer: Keytab is configured, will 
> login using keytab.
> 2024-11-25 14:58:14,955 INFO security.UserGroupInformation: Login successful 
> for user h...@vgt.vito.be using keytab file hdfs.headless.keytab. Keytab auto 
> renewal enabled : false
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved  NameNode
> 2024-11-25 14:58:14,956 INFO balancer.NameNodeConnector: getBlocks calls for 
> hdfs://prd will be rate-limited to 20 per second
> 2024-11-25 14:58:15,499 INFO balancer.KeyManager: Block token params received 
> from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 2024-11-25 14:58:15,501 INFO block.BlockTokenSecretManager: Block token key 
> range: [0, 2147483647)
> 2024-11-25 14:58:15,501 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> 2024-11-25 14:58:15,501 INFO balancer.KeyManager: Update block keys every 
> 2hrs, 30mins, 0sec
> 2024-11-25 14:58:15,702 INFO balancer.Balancer: 
> dfs.namenode.get-blocks.max-qps = 20 (default=20)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 5400000 (default=5400000)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: dfs.balancer.moverThreads = 
> 1000 (default=1000)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: 
> dfs.balancer.dispatcherThreads = 200 (default=200)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 100 (default=100)
> 2024-11-25 14:58:15,703 INFO balancer.Balancer: 
> dfs.datanode.balance.bandwidthPerSec = 104857600 (default=104857600)
> 2024-11-25 14:58:15,705 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> 2024-11-25 14:58:15,708 INFO balancer.Balancer: dfs.balancer.max-size-to-move 
> = 10737418240 (default=10737418240)
> 2024-11-25 14:58:15,708 INFO balancer.Balancer: dfs.blocksize = 268435456 
> (default=134217728)
> 2024-11-25 14:58:15,732 INFO net.NetworkTopology: Adding a new node: 
> /LeibnizRack17/192.168.146.15:10004
> 2024-11-25 14:58:15,733 INFO net.NetworkTopology: Adding a new node: 
> /LeibnizRack17/192.168.146.16:10004
> 2024-11-25 14:58:15,733 INFO net.NetworkTopology: Adding a new node: 
> /LeibnizRack17/192.168.146.14:10004
> 2024-11-25 14:58:15,733 INFO net.NetworkTopology: Adding a new node: 
> /LeibnizRack17/192.168.146.13:10004
> 2024-11-25 14:58:15,733 INFO net.NetworkTopology: Adding a new node: 
> /LeibnizRack17/192.168.146.248:10004
> 2024-11-25 14:58:15,734 INFO balancer.Balancer: 0 over-utilized: []
> 2024-11-25 14:58:15,734 INFO balancer.Balancer: 0 underutilized: []
> Nov 25, 2024, 2:58:15 PM          0                  0 B                 0 B  
>               0 B                  0  hdfs://prd
> The cluster is balanced. Exiting...
> 2024-11-25 14:58:15,751 INFO balancer.Balancer: Balance succeed!
> 2024-11-25 14:58:15,751 INFO balancer.Balancer: Finished one round, will wait 
> for 5.0 minutes for next round{code}
> but subsequent rounds always fail with:
> {code:java}
> 2024-11-25 15:03:15,751 INFO balancer.Balancer: namenodes  = [hdfs://prd]
> 2024-11-25 15:03:15,751 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, 
> #blockpools = 0, run during upgrade = false, sort top nodes = false, hot 
> block time interval = 0]
> 2024-11-25 15:03:15,751 INFO balancer.Balancer: included nodes = []
> 2024-11-25 15:03:15,751 INFO balancer.Balancer: excluded nodes = []
> 2024-11-25 15:03:15,752 INFO balancer.Balancer: source nodes = []
> 2024-11-25 15:03:15,752 INFO balancer.Balancer: Keytab is configured, will 
> login using keytab.
> 2024-11-25 15:03:15,792 INFO security.UserGroupInformation: Login successful 
> for user h...@vgt.vito.be using keytab file hdfs.headless.keytab. Keytab auto 
> renewal enabled : false
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved  NameNode
> 2024-11-25 15:03:15,792 INFO balancer.NameNodeConnector: getBlocks calls for 
> hdfs://prd will be rate-limited to 20 per second
> 2024-11-25 15:03:15,909 INFO balancer.KeyManager: Block token params received 
> from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 2024-11-25 15:03:15,909 INFO block.BlockTokenSecretManager: Block token key 
> range: [0, 2147483647)
> 2024-11-25 15:03:15,909 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> 2024-11-25 15:03:15,909 INFO balancer.KeyManager: Update block keys every 
> 2hrs, 30mins, 0sec
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: 
> dfs.namenode.get-blocks.max-qps = 20 (default=20)
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 5400000 (default=5400000)
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: dfs.balancer.moverThreads = 
> 1000 (default=1000)
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: 
> dfs.balancer.dispatcherThreads = 200 (default=200)
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 2024-11-25 15:03:15,934 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 2024-11-25 15:03:15,935 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 100 (default=100)
> 2024-11-25 15:03:15,935 INFO balancer.Balancer: 
> dfs.datanode.balance.bandwidthPerSec = 104857600 (default=104857600)
> 2024-11-25 15:03:15,936 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> 2024-11-25 15:03:15,936 INFO balancer.Balancer: dfs.balancer.max-size-to-move 
> = 10737418240 (default=10737418240)
> 2024-11-25 15:03:15,936 INFO balancer.Balancer: dfs.blocksize = 268435456 
> (default=134217728)
> 2024-11-25 15:03:15,945 WARN balancer.Balancer: Encounter exception while do 
> balance work. Already tried 1 times
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> Balancer-BP-1957577775-192.168.146.236-1720512277934 already exists!
>     at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>     at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>     at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>     at 
> org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:887)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)
> 2024-11-25 15:03:15,946 INFO balancer.Balancer: Finished one round, will wait 
> for 5.0 minutes for next round {code}
> and finally the process stops after retrying 5 times:
> {code:java}
> 2024-11-25 15:28:17,816 INFO balancer.Balancer: namenodes  = [hdfs://prd]
> 2024-11-25 15:28:17,817 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, 
> #blockpools = 0, run during upgrade = false, sort top nodes = false, hot 
> block time interval = 0]
> 2024-11-25 15:28:17,817 INFO balancer.Balancer: included nodes = []
> 2024-11-25 15:28:17,817 INFO balancer.Balancer: excluded nodes = []
> 2024-11-25 15:28:17,817 INFO balancer.Balancer: source nodes = []
> 2024-11-25 15:28:17,817 INFO balancer.Balancer: Keytab is configured, will 
> login using keytab.
> 2024-11-25 15:28:17,853 INFO security.UserGroupInformation: Login successful 
> for user h...@vgt.vito.be using keytab file hdfs.headless.keytab. Keytab auto 
> renewal enabled : false
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved  NameNode
> 2024-11-25 15:28:17,853 INFO balancer.NameNodeConnector: getBlocks calls for 
> hdfs://prd will be rate-limited to 20 per second
> 2024-11-25 15:28:17,917 INFO balancer.KeyManager: Block token params received 
> from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 2024-11-25 15:28:17,917 INFO block.BlockTokenSecretManager: Block token key 
> range: [0, 2147483647)
> 2024-11-25 15:28:17,917 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> 2024-11-25 15:28:17,917 INFO balancer.KeyManager: Update block keys every 
> 2hrs, 30mins, 0sec
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: 
> dfs.namenode.get-blocks.max-qps = 20 (default=20)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 5400000 (default=5400000)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: dfs.balancer.moverThreads = 
> 1000 (default=1000)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: 
> dfs.balancer.dispatcherThreads = 200 (default=200)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 100 (default=100)
> 2024-11-25 15:28:17,938 INFO balancer.Balancer: 
> dfs.datanode.balance.bandwidthPerSec = 104857600 (default=104857600)
> 2024-11-25 15:28:17,940 INFO balancer.Balancer: dfs.balancer.max-size-to-move 
> = 10737418240 (default=10737418240)
> 2024-11-25 15:28:17,940 INFO balancer.Balancer: dfs.blocksize = 268435456 
> (default=134217728)
> 2024-11-25 15:28:17,940 INFO block.BlockTokenSecretManager: Setting block 
> keys. BlockPool = BP-1957577775-192.168.146.236-1720512277934 .
> Nov 25, 2024, 3:28:17 PM Balancing took 30.0581 minutes
> 2024-11-25 15:28:17,948 ERROR balancer.Balancer: Exiting balancer due an 
> exception
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> Balancer-BP-1957577775-192.168.146.236-1720512277934 already exists!
>     at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>     at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>     at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>     at 
> org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
>     at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:887)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>     at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)
> {code}
> Looking at the HDFS balancer code, I don't see a way to prevent this error, which is strange. A metrics source is created with the name "Balancer-" + blockpoolID, where blockpoolID is 'BP-1957577775-192.168.146.236-1720512277934' in our case.
> When running in service mode, a new Balancer (and therefore a new metrics source) is created in the doBalance method for every round, while the service keeps running.
> Looking at the metrics implementation, duplicate source names are only tolerated while the metrics system is starting up. Once the metrics system has been initialized (its monitoring flag is set to true), registering the same name again fails with the exception shown above.
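>
> As a rough illustration of the failure mode, the following standalone sketch (hypothetical: the class, the "Balancer-BP-demo" source name and the clean-up shown are my own assumptions, not the actual BalancerMetrics code or a committed fix) registers a metrics source with the same name twice, the way each doBalance round effectively does, and shows that unregistering the source between registrations avoids the MetricsException:
> {code:java}
> // Hypothetical sketch: class names and the "Balancer-BP-demo" source name are
> // illustrative only; this is not the Balancer/BalancerMetrics code itself.
> import org.apache.hadoop.metrics2.MetricsCollector;
> import org.apache.hadoop.metrics2.MetricsSource;
> import org.apache.hadoop.metrics2.MetricsSystem;
> import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
>
> public class DuplicateSourceDemo {
>
>   /** Trivial source standing in for BalancerMetrics. */
>   static class DemoSource implements MetricsSource {
>     @Override
>     public void getMetrics(MetricsCollector collector, boolean all) {
>       collector.addRecord("Balancer").setContext("dfs");
>     }
>   }
>
>   public static void main(String[] args) {
>     String name = "Balancer-BP-demo"; // stands in for "Balancer-" + blockpoolID
>     MetricsSystem ms = DefaultMetricsSystem.initialize("Balancer");
>
>     // First round: the name is new, so registration succeeds.
>     ms.register(name, "balancer metrics, round 1", new DemoSource());
>
>     // Registering the same name again on the already-initialized metrics system
>     // throws MetricsException: "Metrics source Balancer-BP-demo already exists!"
>     // Unregistering the source (and, defensively, its cached unique name)
>     // between rounds avoids the duplicate registration:
>     ms.unregisterSource(name);
>     DefaultMetricsSystem.removeSourceName(name);
>     ms.register(name, "balancer metrics, round 2", new DemoSource());
>   }
> }
> {code}
> In other words, the per-round metrics source would either have to be unregistered when a round finishes, or be created once and reused across rounds while the service is running.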


