[ 
https://issues.apache.org/jira/browse/HADOOP-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733865#comment-14733865
 ] 

Daniel Templeton commented on HADOOP-12374:
-------------------------------------------

> User can go to reference link to understand what is checkpoint and what does 
> this command do

The provided link is to the HDFS architecture guide.  In that guide, if you 
search for "trash", it tells you that deleting a file will actually move it 
into the trash.  True, but not helpful.  If you search for "checkpoint", it 
tells you about the NN's edit log checkpointing.  Also true, but not helpful.  
The only resource I found that explains in specific terms what a trash 
checkpoint is was the source code.  Googling around a bit, there are a couple 
of forum and blog posts here and there that explain pieces of how the trash 
works in HDFS, and taken together they paint a pretty clear picture, but that's 
a handful of hits in a sea of "it empties the trash."

My point is that this is an excellent opportunity to create a useful source of 
documentation on what happens when you use -expunge.  I think these docs would 
be much more helpful if they said something like:

* The trash folder is divided into "checkpoints" that contain the files deleted 
during given time windows
* Every fs.trash.checkpoint.interval minutes, HDFS will create a new 
checkpoint, and all files subsequently deleted will go there
* Every fs.trash.interval minutes, HDFS will delete all checkpoints older than 
fs.trash.interval and then create a new checkpoint
* hadoop fs -expunge will cause HDFS to delete all checkpoints older than 
fs.trash.interval
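For illustration only, the retention rule in the bullets above could be 
sketched like this (the property value, timestamps, and function names are 
made up for the example, not Hadoop's actual implementation):

```python
# Hypothetical sketch of the checkpoint-retention rule described above.
from datetime import datetime, timedelta

TRASH_INTERVAL_MIN = 360  # assumed fs.trash.interval value, in minutes


def checkpoints_to_delete(checkpoints, now, trash_interval_min=TRASH_INTERVAL_MIN):
    """Return the checkpoint timestamps that -expunge would remove:
    those older than fs.trash.interval minutes."""
    cutoff = now - timedelta(minutes=trash_interval_min)
    return [ts for ts in checkpoints if ts < cutoff]


now = datetime(2015, 9, 7, 12, 0)
checkpoints = [
    datetime(2015, 9, 7, 2, 0),   # 10 hours old -> older than 360 min, deleted
    datetime(2015, 9, 7, 8, 0),   # 4 hours old  -> kept
    datetime(2015, 9, 7, 11, 0),  # 1 hour old   -> kept
]
print(checkpoints_to_delete(checkpoints, now))
```

The point the sketch makes is the one users miss: with a long 
fs.trash.interval, -expunge deletes nothing until checkpoints age past that 
threshold.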

I didn't think too hard about the phrasing, but you get my point.  Provide 
enough information that a user can understand what a checkpoint is and why 
they'd want to expunge one without having to go on a Googlequest or read source 
code.

> Description of hdfs expunge command is confusing
> ------------------------------------------------
>
>                 Key: HADOOP-12374
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12374
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation, trash
>    Affects Versions: 2.7.0, 2.7.1
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Trivial
>              Labels: docuentation, newbie, suggestions, trash
>         Attachments: HADOOP-12374.001.patch
>
>
> Usage: hadoop fs -expunge
> Empty the Trash. Refer to the HDFS Architecture Guide for more information on 
> the Trash feature.
> This description is confusing. It gives the user the impression that this 
> command will empty the trash, but actually it only removes old checkpoints. 
> If the user sets a pretty long value for fs.trash.interval, this command 
> will not remove anything until checkpoints have existed longer than that 
> value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
