[
https://issues.apache.org/jira/browse/HADOOP-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794088#action_12794088
]
Matt Ahrens commented on HADOOP-6345:
-------------------------------------
Regarding the two general concerns, here are my comments:
"It seems like the essential, convoluted logic of the moveToTrash method
remains. I had thought that this would be a good chance to clean up some of the
nested try-catch blocks. And the existing for loops seems particularly
confusing."
The convoluted logic exists because of the potential race condition that can
occur between the mkdir and the rename. If a checkpoint() call gets executed
between the mkdir and rename call, it will have moved the newly made dir and
the rename will fail. So the author must have handled it by running the
commands twice, figuring the odds of two checkpoints happening that closely
together is very small. One possible alternative (to eliminate the race
condition) would be to move the file to a temporary workspace before moving it
to the trash to avoid conflicts with the checkpoint. For example:
CURRENT BEHAVIOR: file foo/bar gets moved to trash directly
a) File foo/bar is designated to be moved to trash
b) moveToTrash() creates .Trash/Current/foo
c) moveToTrash() renames foo/bar to .Trash/Current/foo
d) if rename fails (because of checkpoint() race condition), repeat (b) and (c)
once
e) result is .Trash/Current/foo/bar
PROPOSED BEHAVIOR (to eliminate race condition):
a) File foo/bar is designated to be moved to trash
b) moveToTrash() creates .Trash/.tmpspace/foo
c) moveToTrash() renames foo/bar to .Trash/.tmpspace/foo/bar
d) moveToTrash() renames .Trash/.tmpspace/foo/bar to ./Trash/Current/foo/bar
e) if (d) rename fails because of dir name conflict, rename
.Trash/.tmpspace/foo/bar to .Trash/.tmpspace/foo-<timestamp>/bar and repeat (d)
f) e) result is .Trash/Current/foo/bar (or .Trash/Current/foo-<timestamp>/bar
if name conflict on top-level dir)
So the proposed behavior definitely would be possible, but it would be a change
to a function that has been stable for while (even though is has convoluted
logic). Any thoughts???
"Of slight concern to me is the change that prevents the user from deleting a
file that's in the trash already from the trash. I guess now that we have
-skipTrash there is a method for a use to selectively purge items from the
trash, but this will still be an incompatible change."
Yes, I understand that the incompatible change may not be necessary. I am not
again leaving it such that a user can delete a file from the trash to leave
compatibility there. So it definitely is possible to leave Path 3 (as
referenced above) to just return false and not throw an exception as my patch
does.
> Refactor Trash::moveToTrash() and its accompanying poor tests
> -------------------------------------------------------------
>
> Key: HADOOP-6345
> URL: https://issues.apache.org/jira/browse/HADOOP-6345
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.20.1
> Reporter: Jakob Homan
> Assignee: Matt Ahrens
> Attachments: HADOOP-6345.2.patch, HADOOP-6345.patch
>
>
> We've had several issues relating to the Trash and its access via the shell,
> none of which were picked up by the unit tests. The current moveToTrash
> method has 8 different ways it can terminate and sometimes uses a false value
> and sometimes uses an exception to indicate failure. This method should be
> refactored to improve readability and testability, and new tests written to
> exercise all possible code paths.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.