[
https://issues.apache.org/jira/browse/HDFS-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450677#comment-15450677
]
Anu Engineer commented on HDFS-10808:
-------------------------------------
[~eddyxu] Thank you very much for taking time out to look at the code and ask
me these pertinent questions.
bq. Wouldn't break is the simplest way to exit the while loop?
if you had one or two breaks may be. But as the number of breaks increase, the
control flow graph becomes quite complex. So it becomes harder to reason about
the resources and other states from each exit point. An easy way of thinking
about copyBlocks would be to think of it as a simple state machine. if you were
implementing a state machine, you would probably move to state "exit" and use
the default mechanisms of the state machine to handle the exit state instead of
breaking out at each error condition. copyBlocks is following that pattern. So
while in generic case I agree with you, in this specific case I think this
pattern produces code that is easier to reason.
bq. on the comment.
{noformat}
// check if someone told us exit, treat this as an interruption
// point for the thread, since both getNextBlock and moveBlocAcrossVolume
// can take some time.
{noformat}
I was under the impression that comment is quite clear, may be I am mistaken.
There are several conditions under which we would like to exit in the copy
blocks thread. Some of them are states. Some are actions with clear side
effects. What we are trying to do is minimize the effects of both. So we
introduce the notion of "interruption points" in our copy thread. That is when
we invoke a function and if we encounter a failure condition, we flag that
information so that at the next safe point to bail, we will. In other words, we
don't exit at the point of error, but simply set the state so that thread can
proceed to a point where it considers that it is safe for it to exit.
Examples of action with side effects are, copy of data block but metadata is
not still copied or getting a bunch of disk errors (we wait till 5) before we
can get out etc, or finding a block and before we can get to it, it disappears
underneath us. Since we have all these kinds of external conditions to take
care of, we simply set up a flag telling the system to exit cleanly. This
paradigm gives us a centralized exit handler, so if the thread had to do some
specific cleanup based on certain error, it is still possible to chain those
error handlers at the exit point.
Yes, the Atomic nature of shouldRun flag is confusing and perhaps not needed.
It is an artifact of playing around with copying multiple blocks when I was
developing code. It had a different structure, but then I found that enforcing
bandwidth was harder and decided to do single block copy at a time.
I really appreciate you taking time out to ask these questions and helping to
make sure that I am in the right path.
> DiskBalancer does not execute multi-steps plan-redux
> ----------------------------------------------------
>
> Key: HDFS-10808
> URL: https://issues.apache.org/jira/browse/HDFS-10808
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer & mover
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Attachments: HDFS-10808.001.patch, HDFS-10808.002.patch
>
>
> This is to redo of the fix in HDFS-10598
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]