[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969272#comment-13969272 ]
Akira AJISAKA commented on HADOOP-8989:
---------------------------------------

Thanks [~jonallen] for updating the patch.

bq. How about splitting this to small patches?
+1 (non-binding) for splitting. If you don't have time to do this, I'm willing to help.

{code}
  /**
   * Construct a Print {@Expression} with an operational ASCII NULL
   * suffix.
   */
  private Print(boolean appendNull) {
{code}
{{@Expression}} should be {{@link Expression}}.

{code}
      public int apply(int current, int shift, int value) {
        return current | (value << shift);
      };
{code}
In {{Perm.java}} some semicolons can be removed.

> hadoop dfs -find feature
> ------------------------
>
>                 Key: HADOOP-8989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8989
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Marco Nicosia
>            Assignee: Jonathan Allen
>         Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch,
> HADOOP-8989.patch
>
>
> Both sysadmins and users make frequent use of the unix 'find' command, but
> Hadoop has no correlate. Without this, users are writing scripts which make
> heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs
> -lsr is somewhat taxing on the NameNode, and a really slow experience on the
> client side. Possibly an in-NameNode find operation would be only a bit more
> taxing on the NameNode, but significantly faster from the client's point of
> view?
> The minimum set of options I can think of which would make a Hadoop find
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell pattern, or even regex?)
> * -perm
> * -size
> One possible special case, but could possibly be really cool if it ran from
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> Lower priority, some people do use operators, mostly to execute -or searches
> such as:
> * find / \(-nouser -or -nogroup\)
> Finally, I thought I'd include a link to the [Posix spec for
> find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]



--
This message was sent by Atlassian JIRA
(v6.2#6252)