[ 
https://issues.apache.org/jira/browse/HADOOP-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776937#comment-16776937
 ] 

Stephen O'Donnell commented on HADOOP-16140:
--------------------------------------------

Thanks all for looking into this.

The idea behind this jira is that in every support case I have seen related 
to emptying the trash, the users expect expunge to empty it immediately.

Expunge means "obliterate or remove completely", but the command does not do 
that, which is why it's so confusing. We can fix this in a few ways:

1. Make expunge actually empty the trash by default, which is what the command 
name suggests - I suspect we don't want to do this for compatibility reasons.

2. Add a flag to expunge (-immediate or -immediately), to override the current 
behaviour and clear the trash now. Having thought about this, I am coming 
around to this being the best idea.

3. Have a new emptyTrash command, which makes the purpose of expunge even more 
confusing.

Adam has suggested a dry-run option, and earlier in this thread Inigo suggested 
a confirmation message if you are emptying the trash now. I can see some merit 
in these, but even with the trash we see a remarkable number of cases where 
people accidentally delete data with -skipTrash. I fear we will see 
'accidental emptying of the trash' no matter what safety checks we add.

If the data gets into the trash, the default expunge behaviour stays as before 
(i.e. trash is retained for 24 hours by default), and we require an 
"-immediate" flag to be passed to delete it now, then we have already offered 
two lines of defence against accidental deletion. If we go that way, I think a 
confirmation message is unnecessary. I am not sure about the -dry-run option, 
or how often it would be used instead of simply listing the trash that is 
about to be deleted.

Steve also wants the ability to pass a filesystem as raised in HADOOP-13656 - I 
wonder if we should solve and commit this jira and then add in the filesystem 
switch afterwards (I am happy to work on it if we can get this one done).

I would also like the ability to pass the trash folder you wish to empty, so 
you can empty your own trash in an EZ (encryption zone) or a super user can 
clear any trash - that could be done here or in a follow-up Jira too.

Can others chime in on the best direction here? I.e.:

1. Can we agree the best approach is adding "-immediate" to expunge and forget 
about the emptyTrash command?

2. Can we keep HADOOP-13656 separate and resolve it after this one?

3. Should we allow a specific trash directory to be specified, and do that in 
a separate Jira?

4. Should we add a dry-run option or not when -immediate is passed?


> Add emptyTrash option to purge trash immediately
> ------------------------------------------------
>
>                 Key: HADOOP-16140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16140
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14200.001.patch
>
>
> I have always felt the HDFS trash is missing a simple way to empty the 
> current user's trash immediately. We have "expunge" but in my experience 
> supporting clusters, end users find this confusing. When most end users run 
> expunge, they really want to empty their trash immediately and get confused 
> when expunge does not do this.
> This can result in users performing somewhat dangerous "skipTrash" operations 
> on the trash to free up space. The alternative, which most users will not 
> figure out on their own is:
> # Run the expunge command once - this will move the current folder to a 
> checkpoint and remove any old checkpoints older than the retention interval
> # Wait over 1 minute and then run expunge again, overriding fs.trash.interval 
> to 1 minute using the following command hadoop fs -Dfs.trash.interval=1 
> -expunge.
> With this Jira I am proposing to add an extra command, "hdfs dfs -emptyTrash" 
> that purges everything in the logged-in user's Trash directories immediately.
> How would the community feel about adding this new option? I will upload a 
> patch for comments.
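
For reference, the two-step workaround from the description above could be 
scripted roughly as follows. This is only a sketch: the two hadoop commands 
are taken from the description, while the function name and the HADOOP_CMD and 
WAIT_SECS variables are made up here for illustration, and the 90 second wait 
is an assumed value that simply satisfies "over 1 minute".

```shell
#!/bin/sh
# Sketch of the manual workaround for emptying the trash right away.
# HADOOP_CMD and WAIT_SECS are hypothetical knobs, not Hadoop settings.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"
WAIT_SECS="${WAIT_SECS:-90}"

empty_trash_workaround() {
    # First expunge: moves the current folder to a checkpoint and removes
    # checkpoints older than the configured retention interval.
    $HADOOP_CMD fs -expunge || return 1

    # Wait until the new checkpoint is more than one minute old.
    sleep "$WAIT_SECS"

    # Second expunge: override fs.trash.interval to 1 minute so the
    # checkpoint created above is now past the retention interval.
    $HADOOP_CMD fs -Dfs.trash.interval=1 -expunge
}
```

Calling empty_trash_workaround as the affected user would then run both 
expunge steps in order. Having to script this at all is the usability gap the 
jira is trying to close.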



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
