[
https://issues.apache.org/jira/browse/HADOOP-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733865#comment-14733865
]
Daniel Templeton commented on HADOOP-12374:
-------------------------------------------
> User can go to reference link to understand what is checkpoint and what does
> this command do
The provided link is to the HDFS architecture guide. In that guide, if you
search for "trash", it tells you that deleting a file will actually move it
into the trash. True, but not helpful. If you search for "checkpoint", it
tells you about the NN's edit log checkpointing. Also true, but not helpful.
The only thing I found that explains in specific terms what a trash
checkpoint actually is was the source code. Googling around a bit, there are a couple
of forum and blog posts here and there that do explain bits of how the trash
works in HDFS, and taken together you get a pretty clear picture, but that's a
handful of hits in a sea of "it empties the trash."
My point is that this is an excellent opportunity to create a useful source of
documentation on what happens when you use -expunge. I think these docs would
be much more helpful if they said something like:
* The trash folder is divided into "checkpoints" that contain the files deleted
during a given time window
* Every fs.trash.checkpoint.interval minutes, HDFS will create a new
checkpoint, and all files subsequently deleted will go there
* Every fs.trash.interval minutes, HDFS will delete all checkpoints older than
fs.trash.interval and then create a new checkpoint
* hadoop fs -expunge will cause HDFS to delete all checkpoints older than
fs.trash.interval
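To make the retention rule in the last two bullets concrete, here is a minimal Python sketch of the behavior (the real logic lives in Hadoop's trash policy code; the function name and data model here are illustrative, not the actual implementation):

```python
from datetime import datetime, timedelta

def expunge(checkpoints, now, trash_interval_minutes):
    """Return the checkpoints that survive an expunge.

    Models the documented rule: HDFS deletes every trash checkpoint
    older than fs.trash.interval minutes and keeps the rest.
    """
    cutoff = now - timedelta(minutes=trash_interval_minutes)
    return [ts for ts in checkpoints if ts >= cutoff]

now = datetime(2015, 9, 7, 12, 0)
# Three checkpoints, created 30, 90, and 200 minutes ago.
checkpoints = [now - timedelta(minutes=m) for m in (30, 90, 200)]

# With fs.trash.interval = 120, only the 30- and 90-minute-old
# checkpoints survive; the 200-minute-old one is removed.
survivors = expunge(checkpoints, now, 120)
```

This is also why the original complaint holds: with a large fs.trash.interval, an expunge right after deleting files removes nothing, because no checkpoint is old enough yet.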
I didn't think too hard about the phrasing, but you get my point. Provide
enough information that a user can understand what a checkpoint is and why
they'd want to expunge one without having to go on a Googlequest or read source
code.
> Description of hdfs expunge command is confusing
> ------------------------------------------------
>
> Key: HADOOP-12374
> URL: https://issues.apache.org/jira/browse/HADOOP-12374
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation, trash
> Affects Versions: 2.7.0, 2.7.1
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Trivial
> Labels: docuentation, newbie, suggestions, trash
> Attachments: HADOOP-12374.001.patch
>
>
> Usage: hadoop fs -expunge
> Empty the Trash. Refer to the HDFS Architecture Guide for more information on
> the Trash feature.
> this description is confusing. It gives user the impression that this command
> will empty trash, but actually it only removes old checkpoints. If user sets
> a pretty long value for fs.trash.interval, this command will not remove
> anything until checkpoints exist longer than this value.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)