Manoj Govindassamy created HDFS-10904:
-----------------------------------------
Summary: Need a new Result state for DiskBalancerWorkStatus to
indicate the final Plan step errors and stuck rebalancing
Key: HDFS-10904
URL: https://issues.apache.org/jira/browse/HDFS-10904
Project: Hadoop HDFS
Issue Type: Sub-task
Components: balancer & mover
Affects Versions: 3.0.0-alpha2
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
* A DiskBalancer {{NodePlan}} might include a single {{MoveStep}} or a list of
{{MoveStep}}s to perform the requested disk balancing operation.
* {{DiskBalancerWorkStatus}} tracks the current disk balancing operation status
for the {{Plan}} just submitted.
* {{DiskBalancerWorkStatus#Result}} has the following states, and the state machine
movement for the {{currentResult}} state doesn't seem to be driven entirely
by the disk balancing operation. In particular, the move to {{PLAN_DONE}}
happens only when the status is queried ({{queryWorkStatus()}}), which can be improved.
{code}
/** Various result values. **/
public enum Result {
  NO_PLAN(0),
  PLAN_UNDER_PROGRESS(1),
  PLAN_DONE(2),
  PLAN_CANCELLED(3);
  // ... (rest of the enum omitted)
}

// Assignments to currentResult in DiskBalancer, grouped by method:
DiskBalancer
  cancelPlan(String)
    this.currentResult = Result.PLAN_CANCELLED;
  DiskBalancer(String, Configuration, BlockMover)
    this.currentResult = Result.NO_PLAN;
  queryWorkStatus()
    this.currentResult = Result.PLAN_DONE;
  shutdown()
    this.currentResult = Result.NO_PLAN;
    this.currentResult = Result.PLAN_CANCELLED;
  submitPlan(String, long, String, String, boolean)
    this.currentResult = Result.PLAN_UNDER_PROGRESS;
{code}
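To make the listing above easier to follow, here is a minimal, self-contained Java model of the transitions it shows. It is only a sketch and not the DataNode code: the class {{StatusModel}}, the {{workFinished}} flag, and {{finishWork()}} are hypothetical names, and the exact condition under which {{queryWorkStatus()}} moves to {{PLAN_DONE}} is an assumption.

```java
/** Hypothetical, simplified model of the currentResult transitions listed above. */
class StatusModel {
  enum Result { NO_PLAN, PLAN_UNDER_PROGRESS, PLAN_DONE, PLAN_CANCELLED }

  // Initial state, as set in the DiskBalancer constructor in the listing.
  private Result currentResult = Result.NO_PLAN;

  // Assumption: stands in for "the background balancing work has completed".
  private boolean workFinished = false;

  void submitPlan() {
    workFinished = false;
    currentResult = Result.PLAN_UNDER_PROGRESS;
  }

  void cancelPlan() {
    currentResult = Result.PLAN_CANCELLED;
  }

  /** Called by the (hypothetical) worker. Note that it does not touch currentResult. */
  void finishWork() {
    workFinished = true;
  }

  /** In this model, the only place where the state moves to PLAN_DONE. */
  Result queryWorkStatus() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS && workFinished) {
      currentResult = Result.PLAN_DONE;
    }
    return currentResult;
  }

  /** Reads the state without the side effect of queryWorkStatus(). */
  Result peek() {
    return currentResult;
  }
}
```

In this model the state stays at {{PLAN_UNDER_PROGRESS}} after the work completes and only changes once someone calls {{queryWorkStatus()}}, which is the query-driven behaviour described above.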
* More importantly, when the final {{MoveStep}} of the {{NodePlan}} fails, the
{{currentResult}} state is stuck in {{PLAN_UNDER_PROGRESS}} forever. A user querying
the status will assume the operation is in progress when in reality it is not
making any progress. The user can also run the {{Query}} command with the _verbose_ option,
which displays more details about the operation, including details
about the errors encountered.
** Query Output: {code}
Plan File: <_file_path_>
Plan ID: <_plan_hash_>
Result: PLAN_UNDER_PROGRESS
{code}
** Verbose Query Output: {code}
"sourcePath" : "/data/disk2/hdfs/dn",
"destPath" : "/data/disk3/hdfs/dn",
"workItem" :
.. .. ..
"errorCount" : 0,
"errMsg" : null,
.. ..
"maxDiskErrors" : 5,
.. .. ..
{code}
** However, the user has to decipher these details to work out that the disk balancing
operation is stuck, because the top-level Result still says {{PLAN_UNDER_PROGRESS}}.
So, we want the DiskBalancer to differentiate between an in-progress operation
and an operation that is stuck or whose final Plan step ended in error.
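One possible shape for such a change is sketched below in Java. It is an illustration only: the state name {{PLAN_ERROR}}, its value 4, and the {{StatusSketch}} class with its {{onPlanFinished()}} and {{onFinalStepFailed()}} callbacks are hypothetical and not part of the existing DiskBalancer code. The idea is that the worker itself, rather than the query path, moves the state when the plan ends.

```java
/** Hypothetical sketch: a result state driven by the worker, with an extra error state. */
class StatusSketch {
  enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3),
    PLAN_ERROR(4); // hypothetical new state; name and value are assumptions

    private final int value;

    Result(int value) {
      this.value = value;
    }

    int getValue() {
      return value;
    }
  }

  private Result currentResult = Result.NO_PLAN;

  synchronized void submitPlan() {
    currentResult = Result.PLAN_UNDER_PROGRESS;
  }

  /** Hypothetical callback: the worker reports that every step completed. */
  synchronized void onPlanFinished() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS) {
      currentResult = Result.PLAN_DONE;
    }
  }

  /** Hypothetical callback: the worker reports that the final step failed. */
  synchronized void onFinalStepFailed() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS) {
      currentResult = Result.PLAN_ERROR;
    }
  }

  /** Querying is now a pure read; it no longer changes the state. */
  synchronized Result queryWorkStatus() {
    return currentResult;
  }
}
```

With a state like this, the top-level Result in the query output would no longer read {{PLAN_UNDER_PROGRESS}} once the final step has failed, so the user would not need the verbose output to notice that the operation stopped.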