[ https://issues.apache.org/jira/browse/HADOOP-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706594#action_12706594 ]

Allen Wittenauer commented on HADOOP-4412:
------------------------------------------

It would be good to get some traction on this old issue.  Ops teams 
increasingly need to run operations across the whole file system (such as 
quota reporting and find).  While tools such as the offline image viewer are 
nice, some tasks really do require near-real-time data. Handing out the image 
file also raises security and practicality concerns.

> hadoop dfs -find feature
> ------------------------
>
>                 Key: HADOOP-4412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4412
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Marco Nicosia
>
> Both sysadmins and users make frequent use of the unix 'find' command, but 
> Hadoop has no equivalent. Without one, users are writing scripts which make 
> heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hadoop 
> dfs -lsr is somewhat taxing on the NameNode, and a really slow experience on 
> the client side. Possibly an in-NameNode find operation would be only a bit 
> more taxing on the NameNode, but significantly faster from the client's point 
> of view?
> The minimum set of options I can think of which would make a Hadoop find 
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell pattern, or even regex?)
> * -perm
> * -size
> One possible special case, which could be really useful if it ran from 
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> At lower priority, some people do use operators, mostly to run -or searches 
> such as:
> * find / \( -nouser -or -nogroup \)
> Finally, I thought I'd include a link to the [POSIX spec for 
> find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]
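
For illustration only, the kind of client-side "find one-off" the description mentions might look like the sketch below. It filters saved `hadoop dfs -lsr` output by a few of the proposed predicates (-type, -owner, -name, -size). The column layout (permissions, replication, owner, group, size, date, time, path) is an assumption about the -lsr output, and `Entry`, `parse_lsr_line`, and `find` are hypothetical names, not part of Hadoop. This is not the proposed in-NameNode implementation: it still pays the full cost of -lsr, which is exactly the problem the issue describes.

```python
import fnmatch
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    """One parsed line of (assumed) `hadoop dfs -lsr` output."""
    is_dir: bool
    perms: str
    owner: str
    group: str
    size: int
    mtime: str  # "YYYY-MM-DD HH:MM", kept as text in this sketch
    path: str


def parse_lsr_line(line: str) -> Optional[Entry]:
    # Assumed columns: perms, replication, owner, group, size, date, time, path.
    # maxsplit=7 keeps any spaces inside the path intact.
    parts = line.strip().split(None, 7)
    if len(parts) != 8 or not parts[4].isdigit():
        return None  # header line or unexpected format: skip it
    perms, _replication, owner, group, size, date, time, path = parts
    return Entry(perms.startswith("d"), perms, owner, group,
                 int(size), f"{date} {time}", path)


def find(lines: Iterable[str], type_: Optional[str] = None,
         owner: Optional[str] = None, name: Optional[str] = None,
         min_size: Optional[int] = None) -> Iterator[Entry]:
    """Yield entries matching every predicate that was supplied."""
    for line in lines:
        entry = parse_lsr_line(line)
        if entry is None:
            continue
        if type_ == "f" and entry.is_dir:
            continue
        if type_ == "d" and not entry.is_dir:
            continue
        if owner is not None and entry.owner != owner:
            continue
        # -name matches a shell pattern against the last path component.
        basename = entry.path.rsplit("/", 1)[-1]
        if name is not None and not fnmatch.fnmatchcase(basename, name):
            continue
        if min_size is not None and entry.size < min_size:
            continue
        yield entry


# Made-up sample listing in the assumed format.
SAMPLE = """\
drwxr-xr-x   - alice users          0 2009-05-01 10:00 /data
-rw-r--r--   3 alice users       1024 2009-05-01 10:05 /data/part-00000
-rw-r--r--   3 bob   staff    5000000 2009-05-02 11:30 /data/big file.log
"""

if __name__ == "__main__":
    matches = find(SAMPLE.splitlines(), type_="f", name="*.log")
    # NUL-separated output, in the spirit of -print0 for `xargs -0`.
    print("\0".join(e.path for e in matches), end="\0")
```

Joining paths with NUL rather than newline is what makes -print0 safe for paths containing spaces, such as the third sample entry.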

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.