[jira] [Commented] (HDFS-13892) Disk Balancer : Invalid exit code for disk balancer execute command

Anu Engineer (JIRA) Tue, 04 Sep 2018 09:02:38 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-13892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603237#comment-16603237
 ]


Anu Engineer commented on HDFS-13892:
-------------------------------------

Hi [~Harsha1206], thank you for taking the time to file this JIRA; I 
appreciate it. At a high level, I agree with your reasoning: in an ideal 
world, execute would have returned an error code. However, disk balancer works 
slightly differently. Understanding what happens under the covers should make 
the behavior clearer, even if you do not agree with it.

When DiskBalancer executes a plan, that is the beginning of an asynchronous 
process that can take a long time. The execute command simply hands the disk 
balancer plan file over to the data node, and the data node executes the plan.

Hence the return code from execute indicates merely that the command 
succeeded in handing the plan file over to the data node, which performs the 
balancing at a later point in time.

DiskBalancer also supports a command called query, which reports the current 
status of the plan. A failure at runtime is reflected in the output of the 
query command and, as you rightly discovered, in the data node logs.
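
To make the hand-off semantics concrete, here is a minimal Python sketch. It is 
a toy model and not Disk Balancer code: ToyDataNode, its status strings and the 
100 MiB free-space figure are all made up for illustration. The only value 
taken from this issue is the 268435456-byte block size from the data node log. 
The sketch shows why execute can return 0 while the balancing itself fails 
later and is only visible through a status query.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a data node; not the Hadoop implementation."""
    free_bytes: int                 # free space on the destination volume
    status: str = "NO_PLAN"         # hypothetical status label
    error: str = ""
    _pending: list = field(default_factory=list)

    def submit_plan(self, block_sizes):
        # Hand-off only: the plan is accepted and queued, nothing is copied yet.
        self._pending = list(block_sizes)
        self.status = "IN_PROGRESS"

    def run_background_work(self):
        # Stands in for the asynchronous copy that runs later on the node.
        for size in self._pending:
            if size > self.free_bytes:
                self.status = "FAILED"
                self.error = "destination volume does not have enough space"
                return
            self.free_bytes -= size
        self.status = "DONE"

    def query(self):
        # Stands in for a status query issued after the hand-off.
        return self.status, self.error


def execute(node, block_sizes):
    """Return an exit code that reflects only whether the hand-off succeeded."""
    try:
        node.submit_plan(block_sizes)
    except Exception:
        return 1
    return 0


node = ToyDataNode(free_bytes=100 * 1024 * 1024)  # 100 MiB free (made-up figure)
exit_code = execute(node, [268435456])            # one block, size from the log
status_after_execute = node.query()
node.run_background_work()                        # the copy runs later and fails
status_after_run = node.query()
print(exit_code, status_after_execute, status_after_run)
```

In this model the exit code is fixed at hand-off time, so the later failure can 
only be seen through query() (or, on a real cluster, the data node logs).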

If you believe that the Apache Hadoop user documentation does not adequately 
explain this process, please feel free to convert this Jira to a doc Jira, and 
one of us can get this fixed.

If you would like to dig into the architecture of disk balancer, there are two 
documents in this Jira, the proposal and the architecture and test plan, that 
explain how disk balancer works in great detail: 
https://issues.apache.org/jira/browse/HDFS-1312


Thank you very much for taking time out to file the Jira and bringing this 
issue to my attention.

> Disk Balancer : Invalid exit code for disk balancer execute command
> -------------------------------------------------------------------
>
>                 Key: HDFS-13892
>                 URL: https://issues.apache.org/jira/browse/HDFS-13892
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: diskbalancer
>            Reporter: Harshakiran Reddy
>            Priority: Major
>
> {{Scenario:}}
> 1. Write about 5 GB of data to a DataNode with one disk.
> 2. Add one more non-empty disk to that DataNode.
> 3. Run the plan command for that DataNode.
> 4. Run the execute command with the resulting plan file.
>  According to the DataNode log, the execute did not take place:
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.datanode.DiskBalancer: Destination 
> volume: file:/Test_Disk/DISK2/ does not have enough space to accommodate a 
> block. Block Size: 268435456 Exiting from copyBlocks.
> {noformat}
> 5. Check the exit code of the execute command; it is 0.
> {{Expected Result:}}
> 1. The exit code should be 1, because the execution did not happen.
> 2. In this scenario, the error message should be printed on the console so 
> that the customer/user knows the execution did not happen.



