[jira] [Commented] (HDFS-227) hadoop dfs -find feature

Daryn Sharp (Commented) (JIRA) Thu, 22 Mar 2012 08:18:44 -0700

    [ https://issues.apache.org/jira/browse/HDFS-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235629#comment-13235629 ]


Daryn Sharp commented on HDFS-227:
----------------------------------

I see a few problems with relying on an adjunct Ruby script for finds. First,
it requires Ruby to be installed on every system. Second, it requires Hadoop
to certify which versions of Ruby we support and/or have tested. Maybe these
aren't big issues, but they're something to keep in mind.

I'm more worried about the need to keep it in sync with Hadoop releases.
That could be solved by adopting the script into the core, but it's still
another moving piece that needs to be kept in sync with FsShell semantics
and output.

The redesigned shell in 0.23 will make adding find a fairly easy task. I've
long been meaning to add find but haven't had the cycles. In fact, just
yesterday I was really wishing I had find.
                
> hadoop dfs -find feature
> ------------------------
>
>                 Key: HDFS-227
>                 URL: https://issues.apache.org/jira/browse/HDFS-227
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Marco Nicosia
>
> Both sysadmins and users make frequent use of the Unix 'find' command, but
> Hadoop has no equivalent. Without it, users write scripts that make heavy
> use of hadoop dfs -lsr and implement one-off finds. I think hadoop dfs -lsr
> is somewhat taxing on the NameNode and a really slow experience on the
> client side. Possibly an in-NameNode find operation would be only slightly
> more taxing on the NameNode, but significantly faster from the client's
> point of view?
> The minimum set of options I can think of which would make a Hadoop find 
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell patterns, or even regex?)
> * -perm
> * -size
> One possible special case, which could be really cool if it ran from
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> Lower priority: some people do use operators, mostly for -or searches
> such as:
> * find / \( -nouser -or -nogroup \)
> Finally, I thought I'd include a link to the [POSIX spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]
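
The options requested in the issue can be illustrated with a minimal, hypothetical Python sketch. It shows how a client-side find might evaluate a few primaries (-type, -name, -mtime, -nouser/-nogroup, -or, -print0) as composable predicates over the entries of a recursive listing. The names Entry, find, print0 and the predicate helpers are invented for this example. They are not part of FsShell or Hadoop's FileSystem API. The known-user and known-group sets are assumptions that stand in for however a real implementation would decide that an owner is unknown. -mtime follows the POSIX rule of converting age to whole days with the remainder discarded.

```python
import posixpath
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Set

SECONDS_PER_DAY = 86400


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one path from a recursive listing (not a Hadoop class)."""
    path: str
    is_dir: bool
    mtime: int  # modification time in epoch seconds
    owner: str
    group: str


Predicate = Callable[[Entry], bool]


def type_is(kind: str) -> Predicate:
    """-type f / -type d (only files and directories are modeled)."""
    if kind not in ("f", "d"):
        raise ValueError("only 'f' and 'd' are modeled")
    want_dir = kind == "d"
    return lambda e: e.is_dir == want_dir


def name_matches(pattern: str) -> Predicate:
    """-name: shell pattern matched against the last path component only."""
    return lambda e: fnmatchcase(posixpath.basename(e.path), pattern)


def mtime(arg: str, now: int) -> Predicate:
    """-mtime n, +n, -n: age in whole days (remainder discarded), as in POSIX find."""
    sign = arg[0] if arg[0] in "+-" else ""
    n = int(arg[len(sign):])

    def pred(e: Entry) -> bool:
        days = (now - e.mtime) // SECONDS_PER_DAY
        if sign == "+":
            return days > n
        if sign == "-":
            return days < n
        return days == n

    return pred


def nouser(known_users: Set[str]) -> Predicate:
    """-nouser: owner not in the (hypothetical) set of known users."""
    return lambda e: e.owner not in known_users


def nogroup(known_groups: Set[str]) -> Predicate:
    """-nogroup: group not in the (hypothetical) set of known groups."""
    return lambda e: e.group not in known_groups


def and_(*preds: Predicate) -> Predicate:
    """Implicit AND between primaries, as in `find -type f -name '*.txt'`."""
    return lambda e: all(p(e) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    """The -or operator."""
    return lambda e: any(p(e) for p in preds)


def find(entries: Iterable[Entry], pred: Predicate) -> List[str]:
    """Return, in listing order, the paths of entries that satisfy pred."""
    return [e.path for e in entries if pred(e)]


def print0(paths: Iterable[str]) -> str:
    """-print0: terminate every path with NUL so `xargs -0` can split safely."""
    return "".join(p + "\0" for p in paths)


if __name__ == "__main__":
    now = 10 * SECONDS_PER_DAY
    listing = [
        Entry("/user/a", True, now, "alice", "users"),
        Entry("/user/a/log.txt", False, now - 3 * SECONDS_PER_DAY, "alice", "users"),
        Entry("/user/a/old.txt", False, now - 9 * SECONDS_PER_DAY, "ghost", "users"),
    ]
    # Roughly: find / -type f -name '*.txt' -mtime +2 -print0
    hits = find(listing, and_(type_is("f"), name_matches("*.txt"), mtime("+2", now)))
    print(repr(print0(hits)))
```

In this sketch the filtering happens after the listing has been fetched. That spares users from writing one-off scripts, but it does nothing to reduce NameNode load. Reducing that load is what the in-NameNode find suggested in the issue would address.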

