[
https://issues.apache.org/jira/browse/HDFS-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526914#comment-15526914
]
Manoj Govindassamy commented on HDFS-9850:
------------------------------------------
With the fix proposed in this jira (HDFS-9850), both
{{DiskBalancerMover#copyBlocks}} and {{DiskBalancer#queryWorkStatus}} handle
the case where any of the storage volumes involved in the plan is no longer
available.
* {{copyBlocks}} logs an error about the missing volume and returns without
crashing or throwing an exception
** That particular {{DiskBalancerWorkItem}} is therefore skipped, and the mover
moves on to the next one
** But if the step happens to be the last one, {{currentResult}} stays stuck at
{{PLAN_UNDER_PROGRESS}} even though the step returned with an error
* {{queryWorkStatus}} also logs an error about the missing volume and throws
{{DiskBalancerException}} with INTERNAL_ERROR.
** {{QueryCommand}}, which invokes {{queryWorkStatus}}, gets the exception and
logs a "query plan failed" error
** Maybe we should also print something to the user so that they see a
meaningful error message
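The two behaviours above can be modelled in a few lines. This is only a rough sketch and not the Hadoop code: {{MissingVolumeSketch}}, {{Step}}, {{Mover}}, the {{Result}} values other than {{PLAN_UNDER_PROGRESS}}, the log strings, and the use of {{IllegalStateException}} in place of {{DiskBalancerException}} are all hypothetical stand-ins. In particular, the rule that only a successful last step marks the plan as done is an assumed mechanism chosen to reproduce the stuck state; the real code may set the state differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only; none of these types are the real Hadoop classes. */
public class MissingVolumeSketch {

  /** Hypothetical states; only PLAN_UNDER_PROGRESS is named in the comment. */
  enum Result { NO_PLAN, PLAN_UNDER_PROGRESS, PLAN_DONE }

  /** Hypothetical stand-in for one plan step (a DiskBalancerWorkItem). */
  static final class Step {
    final String source;
    final String dest;

    Step(String source, String dest) {
      this.source = source;
      this.dest = dest;
    }
  }

  /** Hypothetical stand-in for the mover together with its status query. */
  static final class Mover {
    private final Set<String> liveVolumes;
    final List<String> errors = new ArrayList<>();
    Result currentResult = Result.NO_PLAN;

    Mover(Set<String> liveVolumes) {
      this.liveVolumes = liveVolumes;
    }

    /** Logs and returns false instead of throwing when a volume is gone. */
    boolean copyBlocks(Step step) {
      for (String id : Arrays.asList(step.source, step.dest)) {
        if (!liveVolumes.contains(id)) {
          errors.add("volume not found: " + id);
          return false;
        }
      }
      return true; // actual block movement elided
    }

    /** Runs every step; a skipped step does not stop the plan. */
    void runPlan(List<Step> plan) {
      currentResult = Result.PLAN_UNDER_PROGRESS;
      for (int i = 0; i < plan.size(); i++) {
        boolean copied = copyBlocks(plan.get(i));
        // Assumed mechanism: only a successful last step marks the plan done,
        // so a skipped last step leaves the state at PLAN_UNDER_PROGRESS.
        if (copied && i == plan.size() - 1) {
          currentResult = Result.PLAN_DONE;
        }
      }
    }

    /** Logs and throws when a planned volume is missing (INTERNAL_ERROR stand-in). */
    Result queryWorkStatus(List<Step> plan) {
      for (Step step : plan) {
        if (!liveVolumes.contains(step.source)
            || !liveVolumes.contains(step.dest)) {
          errors.add("query failed, volume missing");
          throw new IllegalStateException("INTERNAL_ERROR: volume missing");
        }
      }
      return currentResult;
    }
  }

  public static void main(String[] args) {
    // vol-C is in the plan but is not among the live volumes.
    Set<String> live = new HashSet<>(Arrays.asList("vol-A", "vol-B"));
    Mover mover = new Mover(live);
    List<Step> plan = Arrays.asList(
        new Step("vol-A", "vol-B"), new Step("vol-B", "vol-C"));
    mover.runPlan(plan);
    System.out.println("state after plan: " + mover.currentResult);
    try {
      mover.queryWorkStatus(plan);
    } catch (IllegalStateException e) {
      System.out.println("query failed: " + e.getMessage());
    }
  }
}
```

In this model, a plan whose last step references the removed volume finishes with the state still at {{PLAN_UNDER_PROGRESS}}, and the query only fails with an exception, which is the gap a user-facing error message would close.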
As part of fixing this jira, I have also added a unit test case
{{TestDiskBalancer#testDiskBalancerWhenRemovingVolumes}} for exactly the case
above. For now, the test code only verifies that DiskBalancer does not crash or
throw unexpected exceptions.
To address both cases above, HDFS-10904 has been filed so that the user sees a
proper error message and the proper balancing operation state even when the
involved volumes are removed. By the way, this corner case already existed and
was not introduced by this jira. This jira addresses the issue of unnecessary
{{FsVolumeSpi}} object references. Please let me know if you need more
clarification.
> DiskBalancer : Explore removing references to FsVolumeSpi
> ----------------------------------------------------------
>
> Key: HDFS-9850
> URL: https://issues.apache.org/jira/browse/HDFS-9850
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer & mover
> Affects Versions: 3.0.0-alpha2
> Reporter: Anu Engineer
> Assignee: Manoj Govindassamy
> Attachments: HDFS-9850.001.patch, HDFS-9850.002.patch,
> HDFS-9850.003.patch
>
>
> In HDFS-9671, [~arpitagarwal] commented that we should explore the
> possibility of removing references to FsVolumeSpi at any point and only deal
> with storage ID. We are not sure if this is possible; this JIRA is to explore
> if that can be done without issues.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)