Manoj Govindassamy created HDFS-10904:
-----------------------------------------

             Summary: Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan step errors and stuck rebalancing
                 Key: HDFS-10904
                 URL: https://issues.apache.org/jira/browse/HDFS-10904
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: balancer & mover
    Affects Versions: 3.0.0-alpha2
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


* A DiskBalancer {{NodePlan}} might include a single {{MoveStep}} or a list of MoveSteps to perform the requested disk balancing operation.
* {{DiskBalancerWorkStatus}} tracks the current disk balancing operation status for the {{Plan}} just submitted.
* {{DiskBalancerWorkStatus#Result}} has the following states, and the state machine movement for the {{currentResult}} state does not seem to be driven entirely by the disk balancing operation. In particular, the move to DONE happens only upon QueryResult, which can be improved.
{code}
  /** Various result values. **/
  public enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3);

DiskBalancer
  cancelPlan(String)
    this.currentResult = Result.PLAN_CANCELLED;
  DiskBalancer(String, Configuration, BlockMover)
    this.currentResult = Result.NO_PLAN;
  queryWorkStatus()
    this.currentResult = Result.PLAN_DONE;
  shutdown()
    this.currentResult = Result.NO_PLAN;
    this.currentResult = Result.PLAN_CANCELLED;
  submitPlan(String, long, String, String, boolean)
    this.currentResult = Result.PLAN_UNDER_PROGRESS;
{code}
* More importantly, when the final {{MoveStep}} of the {{NodePlan}} fails, the {{currentResult}} state is stuck in {{PLAN_UNDER_PROGRESS}} forever. A user querying the status will assume the operation is in progress when in reality it is not making any progress. The user can also run the {{Query}} command with the _verbose_ option, which displays more details about the operation, including any errors encountered.
** Query Output:
{code}
Plan File: <_file_path_>
Plan ID: <_plan_hash_>
Result: PLAN_UNDER_PROGRESS
{code}
** Query Output (verbose):
{code}
"sourcePath" : "/data/disk2/hdfs/dn",
"destPath" : "/data/disk3/hdfs/dn",
"workItem" :
..
..
..
"errorCount" : 0,
"errMsg" : null,
..
..
"maxDiskErrors" : 5,
..
..
..
{code}
** But the user has to decipher these details to work out that the disk balancing operation is stuck, because the top-level Result still says {{PLAN_UNDER_PROGRESS}}. So, we want the DiskBalancer to differentiate between an in-progress operation and a stuck or finally failed operation.
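
One way to picture the request is the minimal Java sketch below. It is not the actual DiskBalancer code and not a proposed patch. The class {{PlanStatusSketch}}, the extra state {{PLAN_ERROR(4)}}, and the callback {{onPlanFinished(boolean)}} are all hypothetical names chosen for illustration; only the four existing states come from the description above. The sketch assumes the component that executes the plan reports the outcome of the final step, so that the terminal state is set by the operation itself and a status query only reads it.

```java
/** Hypothetical sketch only; not the actual Hadoop DiskBalancer classes. */
final class PlanStatusSketch {

  /** The four states quoted in the issue, plus one assumed addition. */
  enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3),
    PLAN_ERROR(4); // hypothetical: both the name and the code are assumptions

    private final int code;

    Result(int code) {
      this.code = code;
    }

    int getCode() {
      return code;
    }
  }

  private Result currentResult = Result.NO_PLAN;

  /** A newly submitted plan starts in the in-progress state. */
  void submitPlan() {
    currentResult = Result.PLAN_UNDER_PROGRESS;
  }

  /** Cancelling overrides whatever state the plan was in. */
  void cancelPlan() {
    currentResult = Result.PLAN_CANCELLED;
  }

  /**
   * Hypothetical callback invoked by the executing side once the final
   * step has been attempted. The terminal state is decided here, not in
   * the query path.
   */
  void onPlanFinished(boolean finalStepFailed) {
    if (currentResult != Result.PLAN_UNDER_PROGRESS) {
      return; // e.g. the plan was cancelled; keep that state
    }
    currentResult = finalStepFailed ? Result.PLAN_ERROR : Result.PLAN_DONE;
  }

  /** Read-only query: reports the state without changing it. */
  Result queryWorkStatus() {
    return currentResult;
  }
}
```

With a shape like this, a plan whose final step fails would report the error state at the top level instead of {{PLAN_UNDER_PROGRESS}}, and the user would not need the verbose output to notice it. How failures in earlier steps, or a stalled step that never finishes, should map to states is left open here.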