[ https://issues.apache.org/jira/browse/HDFS-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-10904:
--------------------------------------
    Description: 
* A DiskBalancer {{NodePlan}} may consist of a single {{MoveStep}} or a list of 
{{MoveStep}}s that together perform the requested disk balancing operation.

* {{DiskBalancerWorkStatus}} tracks the status of the current disk balancing 
operation for the most recently submitted {{Plan}}.

* {{DiskBalancerWorkStatus#Result}} has the following states. The transitions of 
the {{currentResult}} state do not appear to be driven entirely by the disk 
balancing operation itself. In particular, the transition to {{PLAN_DONE}} 
happens only when the status is queried ({{queryWorkStatus()}}), which can be 
improved. {code}
  /** Various result values. **/
  public enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3);

    // Backing int value; a constructor is required for the (int) arguments.
    private final int result;

    Result(int result) {
      this.result = result;
    }
  }

// Places in DiskBalancer that assign currentResult:
// DiskBalancer(String, Configuration, BlockMover)
//     this.currentResult = Result.NO_PLAN;
// submitPlan(String, long, String, String, boolean)
//     this.currentResult = Result.PLAN_UNDER_PROGRESS;
// queryWorkStatus()
//     this.currentResult = Result.PLAN_DONE;
// cancelPlan(String)
//     this.currentResult = Result.PLAN_CANCELLED;
// shutdown()
//     this.currentResult = Result.NO_PLAN;
//     this.currentResult = Result.PLAN_CANCELLED;
{code}
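To make the problem concrete, here is a minimal, hypothetical model of today's transitions. It is not the Hadoop code: the class {{CurrentWorkStatusModel}}, the {{runPlan}} method and the {{workerReportedDone}} flag are illustrative names, and the model assumes (as reported in this issue) that a failed final {{MoveStep}} never produces a completion signal. It shows that the worker never writes {{currentResult}} and that {{PLAN_DONE}} is assigned only on query.

{code:java}
import java.util.List;

/**
 * Hypothetical, simplified model of today's currentResult transitions.
 * Only the methods listed above write currentResult; the worker never does.
 */
class CurrentWorkStatusModel {
  enum Result { NO_PLAN, PLAN_UNDER_PROGRESS, PLAN_DONE, PLAN_CANCELLED }

  private Result currentResult = Result.NO_PLAN; // as in the constructor
  // Hypothetical completion signal from the worker. Assumption: it is raised
  // only when the final MoveStep succeeds, matching the reported behaviour.
  private boolean workerReportedDone = false;

  void submitPlan() {
    currentResult = Result.PLAN_UNDER_PROGRESS;
    workerReportedDone = false;
  }

  void cancelPlan() {
    currentResult = Result.PLAN_CANCELLED;
  }

  /** Simulates the worker; note that it never writes currentResult. */
  void runPlan(List<Boolean> stepOutcomes) {
    // Each Boolean stands in for "this MoveStep succeeded".
    workerReportedDone =
        !stepOutcomes.isEmpty() && stepOutcomes.get(stepOutcomes.size() - 1);
  }

  /** The only place PLAN_DONE is ever assigned. */
  Result queryWorkStatus() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS && workerReportedDone) {
      currentResult = Result.PLAN_DONE;
    }
    return currentResult;
  }
}
{code}

With step outcomes {{[true, false]}}, {{queryWorkStatus()}} keeps returning {{PLAN_UNDER_PROGRESS}} no matter how often it is called.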


* More importantly, when the final {{MoveStep}} of the {{NodePlan}} fails, the 
{{currentResult}} state is stuck in {{PLAN_UNDER_PROGRESS}} forever. A user 
querying the status will assume the operation is in progress when in reality it 
is not making any progress. The user can also run the {{Query}} command with the 
_verbose_ option, which displays more details about the operation, including 
any errors encountered.
** Query output: {code}
Plan File:  <_file_path_>
Plan ID: <_plan_hash_>
Result: PLAN_UNDER_PROGRESS
{code}

** Verbose query output (excerpt): {code}
"sourcePath" : "/data/disk2/hdfs/dn",
  "destPath" : "/data/disk3/hdfs/dn",
  "workItem" :
    .. .. ..
    "errorCount" : 0,
    "errMsg" : null,
    .. .. 
    "maxDiskErrors" : 5,
    .. .. ..
{code}
** However, the user has to decipher these details to work out that the disk 
balancing operation is stuck, because the top-level Result still says 
{{PLAN_UNDER_PROGRESS}}. We want the DiskBalancer to differentiate between an 
in-progress operation and one that is stuck or has ended with a final error.
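One possible direction, sketched under stated assumptions: add a terminal error value to {{Result}} and let the worker, rather than the query path, set the terminal state when the last {{MoveStep}} finishes. The {{PLAN_FAILED(4)}} name and value, the {{ProposedWorkStatus}} class, and the {{onStepFinished}}/{{runPlan}} hooks are hypothetical and not part of the existing DiskBalancer API.

{code:java}
import java.util.List;

/**
 * Hypothetical sketch: the worker drives the terminal Result, and a new
 * error state distinguishes a failed final step from work in progress.
 */
class ProposedWorkStatus {
  enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3),
    PLAN_FAILED(4); // hypothetical new state for final-step errors

    private final int intResult;

    Result(int intResult) {
      this.intResult = intResult;
    }

    int getIntResult() {
      return intResult;
    }
  }

  private Result currentResult = Result.NO_PLAN;

  synchronized void submitPlan() {
    currentResult = Result.PLAN_UNDER_PROGRESS;
  }

  synchronized void cancelPlan() {
    currentResult = Result.PLAN_CANCELLED;
  }

  /** Called by the worker after each MoveStep completes or gives up. */
  synchronized void onStepFinished(boolean succeeded, boolean isFinalStep) {
    if (currentResult != Result.PLAN_UNDER_PROGRESS) {
      return; // cancelled or already terminal: ignore late callbacks
    }
    if (isFinalStep) {
      currentResult = succeeded ? Result.PLAN_DONE : Result.PLAN_FAILED;
    }
  }

  /** Simulates the worker; each Boolean stands in for one MoveStep outcome. */
  void runPlan(List<Boolean> stepOutcomes) {
    for (int i = 0; i < stepOutcomes.size(); i++) {
      onStepFinished(stepOutcomes.get(i), i == stepOutcomes.size() - 1);
    }
  }

  /** A pure read: querying no longer changes the state. */
  synchronized Result queryWorkStatus() {
    return currentResult;
  }
}
{code}

Because the worker decides between {{PLAN_DONE}} and the new error state, {{queryWorkStatus()}} becomes a pure read, and a plan whose final step fails is reported as failed instead of appearing to be in progress. Ignoring callbacks once the state has left {{PLAN_UNDER_PROGRESS}} keeps a late worker callback from overwriting {{PLAN_CANCELLED}}.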




> Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan 
> step errors and stuck rebalancing
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10904
>                 URL: https://issues.apache.org/jira/browse/HDFS-10904
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 2.9.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
