[
https://issues.apache.org/jira/browse/HDFS-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623069#comment-15623069
]
Manoj Govindassamy commented on HDFS-10904:
-------------------------------------------
[~anu],
I dug a little deeper into the code and found that the _Future_ object in
{{DiskBalancer}}, and the way it is used in
{{DiskBalancerWorkStatus#queryWorkStatus}}, mitigates the need for an extra
result state to indicate explicit errors.
DiskBalancerWorkStatus#queryWorkStatus():
{code}
// if we had a plan in progress, check if it is finished.
if (this.currentResult == Result.PLAN_UNDER_PROGRESS &&
    this.future != null &&
    this.future.isDone()) {
  this.currentResult = Result.PLAN_DONE;
}
{code}
So, even when the last MoveStep hits a serious error, the future still moves
to the done state, and the Result state is therefore set to PLAN_DONE rather
than staying at PLAN_UNDER_PROGRESS as I had assumed.
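To illustrate why the check in the snippet above flips the state even on failure, here is a minimal, self-contained sketch. It is not the DiskBalancer code: the class {{FutureDoneSketch}}, the helper names and the failing task are hypothetical, and the task throws an exception as the harshest case (the real mover may simply record the error and return). In both cases {{Future#isDone}} reports true once the task has ended, whether it ended normally or with an error.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative, not taken from Hadoop.
class FutureDoneSketch {
  /** Mirrors the Result values quoted in this issue. */
  enum Result { NO_PLAN, PLAN_UNDER_PROGRESS, PLAN_DONE, PLAN_CANCELLED }

  /** The same condition as in the snippet above, lifted into a pure helper. */
  static Result refresh(Result current, Future<?> future) {
    if (current == Result.PLAN_UNDER_PROGRESS
        && future != null
        && future.isDone()) {
      return Result.PLAN_DONE;
    }
    return current;
  }

  /** Runs a stand-in "plan" whose only step fails, then refreshes the state. */
  static Result statusAfterFailedStep() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Runnable failingStep = () -> {
      throw new IllegalStateException("Unable to find dest volume");
    };
    Future<?> future = executor.submit(failingStep);
    executor.shutdown();
    try {
      // Wait for the task to end so that isDone() is deterministic here.
      if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("task did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    // The task failed, yet the future is done, so the state becomes PLAN_DONE.
    return refresh(Result.PLAN_UNDER_PROGRESS, future);
  }

  public static void main(String[] args) {
    System.out.println(statusAfterFailedStep());
  }
}
```

The failure is only visible by calling {{Future#get}} (which would throw an {{ExecutionException}} in this sketch) or, in the DiskBalancer case, by reading the error details of the work item.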
{noformat}
1266 2016-10-31 11:58:02,625 [main] INFO diskbalancer.TestDiskBalancer (TestDiskBalancer.java:get(569)) - Work Status:
{"currentState":[{"sourcePath":"/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/","destPath":"/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/","workItem":{"startTime":0,"secondsElapsed":0,"bytesToCopy":51469,"bytesCopied":0,"errorCount":0,"errMsg":"Disk Balancer - Unable to find dest volume: /Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/","blocksCopied":0,"maxDiskErrors":0,"tolerancePercent":10,"bandwidth":0}}],"result":"PLAN_DONE","planID":"147408705a52443d183b4415e318bc6283fe5fe6","planFile":"/system/current.plan.json"}
{noformat}
So, a user looking at the detailed query result can find out what went wrong
from the errMsg field in the result, and introducing a new state is not very
important. Please feel free to move this bug out of the parent jira or close
it, as its priority now looks very low to me. Your thoughts, please?
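Since the top-level result can read PLAN_DONE while a work item still carries an errMsg, a caller who cares about errors has to look at both. A rough sketch of such a client-side check follows. Everything in it is an assumption for illustration: the class {{WorkStatusErrorSketch}}, its method names, and the shortened sample JSON (with a made-up path) are hypothetical, and the regex is used only to keep the sketch dependency-free. It does not handle escaped quotes, and a real client would use a proper JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of the DiskBalancer API.
class WorkStatusErrorSketch {
  // Matches string-valued errMsg fields; "errMsg":null is not matched.
  private static final Pattern ERR_MSG =
      Pattern.compile("\"errMsg\"\\s*:\\s*\"([^\"]*)\"");
  private static final Pattern PLAN_DONE =
      Pattern.compile("\"result\"\\s*:\\s*\"PLAN_DONE\"");

  /** Collects every non-empty errMsg value found in the status JSON. */
  static List<String> errorMessages(String statusJson) {
    List<String> messages = new ArrayList<>();
    Matcher m = ERR_MSG.matcher(statusJson);
    while (m.find()) {
      if (!m.group(1).isEmpty()) {
        messages.add(m.group(1));
      }
    }
    return messages;
  }

  /** True when the plan reports PLAN_DONE but a work item has an errMsg. */
  static boolean doneWithErrors(String statusJson) {
    return PLAN_DONE.matcher(statusJson).find()
        && !errorMessages(statusJson).isEmpty();
  }

  public static void main(String[] args) {
    // Shortened, made-up sample in the shape of the work status shown above.
    String sample = "{\"currentState\":[{\"workItem\":{\"errorCount\":0,"
        + "\"errMsg\":\"Disk Balancer - Unable to find dest volume: /data2/\"}}],"
        + "\"result\":\"PLAN_DONE\"}";
    System.out.println(doneWithErrors(sample));
    System.out.println(errorMessages(sample));
  }
}
```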
> Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan
> step errors and stuck rebalancing
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10904
> URL: https://issues.apache.org/jira/browse/HDFS-10904
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer & mover
> Affects Versions: 3.0.0-alpha2
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Fix For: 2.9.0
>
>
> * A DiskBalancer {{NodePlan}} might include a single {{MoveStep}} or a list
> of MoveSteps to perform the requested disk balancing operation.
> * {{DiskBalancerWorkStatus}} tracks the current disk balancing operation
> status for the {{Plan}} just submitted.
> * {{DiskBalancerWorkStatus#Result}} has the following states, and the state
> machine movement for the {{currentResult}} state doesn't seem to be driven
> entirely by the disk balancing operation. In particular, the state movement
> to DONE happens only upon QueryResult, which can be improved. {code}
> /** Various result values. **/
> public enum Result {
>   NO_PLAN(0),
>   PLAN_UNDER_PROGRESS(1),
>   PLAN_DONE(2),
>   PLAN_CANCELLED(3);
> DiskBalancer
>   cancelPlan(String)
>     this.currentResult = Result.PLAN_CANCELLED;
>   DiskBalancer(String, Configuration, BlockMover)
>     this.currentResult = Result.NO_PLAN;
>   queryWorkStatus()
>     this.currentResult = Result.PLAN_DONE;
>   shutdown()
>     this.currentResult = Result.NO_PLAN;
>     this.currentResult = Result.PLAN_CANCELLED;
>   submitPlan(String, long, String, String, boolean)
>     this.currentResult = Result.PLAN_UNDER_PROGRESS;
> {code}
> * More importantly, when the final {{MoveStep}} of the {{NodePlan}} fails,
> the currentResult state is stuck in {{PLAN_UNDER_PROGRESS}} forever. A user
> querying the status will assume the operation is in progress when in reality
> it's not making any progress. The user can also run the {{Query}} command
> with the _verbose_ option, which displays more details about the operation,
> including details about the errors encountered.
> ** Query Output: {code}
> Plan File: <_file_path_>
> Plan ID: <_plan_hash_>
> Result: PLAN_UNDER_PROGRESS
> {code}
> ** Verbose Query Output: {code}
> "sourcePath" : "/data/disk2/hdfs/dn",
> "destPath" : "/data/disk3/hdfs/dn",
> "workItem" :
> .. .. ..
> "errorCount" : 0,
> "errMsg" : null,
> .. ..
> "maxDiskErrors" : 5,
> .. .. ..
> {code}
> ** But the user has to decipher these details to work out that the disk
> balancing operation is stuck, as the top-level Result still says
> {{PLAN_UNDER_PROGRESS}}. So, we want the DiskBalancer to differentiate
> between an in-progress operation and one that is stuck or ended in error.