[ 
https://issues.apache.org/jira/browse/HDFS-16867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhiWei Shi updated HDFS-16867:
------------------------------
    Description: 
After the Mover process has been running for some time, it exits 
unexpectedly and reports an error in the log.

 
{code:java}
[hdfs@${hostname} hadoop-3.3.2-nn]$ nohup bin/hdfs mover -p /test-mover-jira9534 > mover.log.jira9534.20221209.2 &

[hdfs@${hostname} hadoop-3.3.2-nn]$ tail -f mover.log.jira9534.20221209.2
...
22/12/09 14:22:32 INFO balancer.Dispatcher: Start moving blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
22/12/09 14:22:32 INFO balancer.Dispatcher: Successfully moved blk_1073911285_170466 with size=134217728 from 10.108.182.205:800:DISK to ${ip1}:800:ARCHIVE through ${ip2}:800
22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Stopping Mover metrics system...
22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system stopped.
22/12/09 14:22:42 INFO impl.MetricsSystemImpl: Mover metrics system shutdown complete.
Dec 9, 2022, 2:22:42 PM  Mover took 13mins, 19sec
22/12/09 14:22:42 ERROR mover.Mover: Exiting Mover due to an exception
org.apache.hadoop.metrics2.MetricsException: Metrics source Mover-${BlockpoolID} already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
        at org.apache.hadoop.hdfs.server.mover.MoverMetrics.create(MoverMetrics.java:49)
        at org.apache.hadoop.hdfs.server.mover.Mover.<init>(Mover.java:162)
        at org.apache.hadoop.hdfs.server.mover.Mover.run(Mover.java:684)
        at org.apache.hadoop.hdfs.server.mover.Mover$Cli.run(Mover.java:826)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
        at org.apache.hadoop.hdfs.server.mover.Mover.main(Mover.java:908)
{code}
1. "final ExitStatus r = m.run()" returns after only one replica move has been scheduled.

2. When "r == ExitStatus.IN_PROGRESS", iter.remove() is not executed, so the nnc stays in connectors.

3. "new Mover" and "this.metrics = MoverMetrics.create(this)" are therefore executed 
multiple times for the same nnc, which leads to the error.

 

 
{code:java}
// Mover.java
for (final StorageType t : diff.existing) {
  for (final MLocation ml : locations) {
    final Source source = storages.getSource(ml);
    if (ml.storageType == t && source != null) {
      // try to schedule one replica move.
      // 1. Returns as soon as one replica move has been scheduled.
      if (scheduleMoveReplica(db, source, diff.expected)) {
        return true;
      }
    }
  }
}

while (connectors.size() > 0) {
  Collections.shuffle(connectors);
  Iterator<NameNodeConnector> iter = connectors.iterator();
  while (iter.hasNext()) {
    NameNodeConnector nnc = iter.next();
    // 3. "new Mover" and "this.metrics = MoverMetrics.create(this)" are
    //    executed multiple times for the same nnc, which leads to the error.
    final Mover m = new Mover(nnc, conf, retryCount,
        excludedPinnedBlocks);
    final ExitStatus r = m.run();

    // 2. When r == ExitStatus.IN_PROGRESS, iter.remove() is not executed.
    if (r == ExitStatus.SUCCESS) {
      IOUtils.cleanupWithLogger(LOG, nnc);
      iter.remove();
    }
    // ...
  }
}
{code}
 

A possible fix is to initialize MoverMetrics once, when the nnc is initialized, instead of in every Mover constructor.
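
The failure mode and the suggested direction can be illustrated with a small standalone sketch. It does not use any Hadoop classes: ToyMetricsSystem, runRegisteringEveryPass, runRegisteringOnce and the "BP-1" block pool ID are all hypothetical names invented for this illustration, and the toy registry only imitates the "already exists" behaviour seen in the stack trace above. It is not the actual patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MoverMetricsSketch {

  /** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
  public static final class ToyMetricsSystem {
    private final Set<String> sources = new HashSet<>();

    public void register(String name) {
      if (!sources.add(name)) {
        throw new IllegalStateException("Metrics source " + name + " already exists!");
      }
    }
  }

  /**
   * Pattern from the report: each pass over the same connector builds a new
   * "Mover", and each constructor registers the same source name again.
   */
  public static int runRegisteringEveryPass(ToyMetricsSystem ms, String blockpoolId, int passes) {
    int completed = 0;
    for (int i = 0; i < passes; i++) {
      ms.register("Mover-" + blockpoolId); // second pass throws
      completed++;
    }
    return completed;
  }

  /**
   * One possible direction: register once per connector (keyed here by block
   * pool ID) and reuse the result on later passes.
   */
  public static int runRegisteringOnce(ToyMetricsSystem ms, Map<String, String> metricsByPool,
      String blockpoolId, int passes) {
    int completed = 0;
    for (int i = 0; i < passes; i++) {
      metricsByPool.computeIfAbsent(blockpoolId, id -> {
        String name = "Mover-" + id;
        ms.register(name); // only reached on the first pass
        return name;
      });
      completed++;
    }
    return completed;
  }

  public static void main(String[] args) {
    try {
      runRegisteringEveryPass(new ToyMetricsSystem(), "BP-1", 3);
    } catch (IllegalStateException e) {
      System.out.println("registering every pass failed: " + e.getMessage());
    }
    int done = runRegisteringOnce(new ToyMetricsSystem(), new HashMap<>(), "BP-1", 3);
    System.out.println("registering once completed passes: " + done);
  }
}
```

In the real Mover, the equivalent change would presumably tie the metrics object to the lifetime of the nnc, or unregister the source before the next pass; which of these is appropriate is left open here.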

> Exiting Mover due to an exception in MoverMetrics.create
> --------------------------------------------------------
>
>                 Key: HDFS-16867
>                 URL: https://issues.apache.org/jira/browse/HDFS-16867
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZhiWei Shi
>            Priority: Major


