[ https://issues.apache.org/jira/browse/HADOOP-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706699#action_12706699 ]
Philip Zeyliger commented on HADOOP-4412:
-----------------------------------------

If you're willing to deal with XML, http://namenode/listPaths?recursive=yes will give you an XML file with all the files on that namenode. You could probably convert your find query into XQuery.

> hadoop dfs -find feature
> ------------------------
>
>                 Key: HADOOP-4412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4412
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Marco Nicosia
>
> Both sysadmins and users make frequent use of the unix 'find' command, but
> Hadoop has no correlate. Without this, users are writing scripts which make
> heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs
> -lsr is somewhat taxing on the NameNode, and a really slow experience on the
> client side. Possibly an in-NameNode find operation would be only a bit more
> taxing on the NameNode, but significantly faster from the client's point of
> view?
> The minimum set of options I can think of which would make a Hadoop find
> command generally useful is (in priority order):
> * -type (file or directory, for now)
> * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
> * -print0 (for piping to xargs -0)
> * -depth
> * -owner/-group (and -nouser/-nogroup)
> * -name (allowing for shell pattern, or even regex?)
> * -perm
> * -size
> One possible special case, but could possibly be really cool if it ran from
> within the NameNode:
> * -delete
> The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
> Lower priority, some people do use operators, mostly to execute -or searches
> such as:
> * find / \( -nouser -or -nogroup \)
> Finally, I thought I'd include a link to the [Posix spec for
> find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
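
As a rough illustration of the XML workaround suggested in the comment, here is a minimal Python sketch that filters a recursive listing client-side, covering a few of the requested predicates (-type, -name, -owner, a simplified -size, and -print0). It is not part of Hadoop. The `find` and `print0` helpers and the sample data are hypothetical, and the element and attribute names (`file`, `directory`, `path`, `size`, `owner`, `group`) are assumptions about what listPaths returns; check them against the real output of your NameNode before relying on this.

```python
import posixpath
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical sample of a recursive listing. The element and attribute
# names are assumptions, not a documented schema.
SAMPLE_LISTING = """\
<listing path="/" recursive="yes">
  <directory path="/user" owner="hdfs" group="supergroup"/>
  <directory path="/user/alice" owner="alice" group="users"/>
  <file path="/user/alice/part-00000" size="1048576" owner="alice" group="users"/>
  <file path="/user/alice/notes.txt" size="120" owner="alice" group="users"/>
  <file path="/user/bob.log" size="4096" owner="bob" group="users"/>
</listing>
"""


def find(xml_text, type_=None, name=None, owner=None, min_size=None):
    """Return paths from the listing that satisfy every given predicate.

    type_    -- "f" for files, "d" for directories (like find -type)
    name     -- shell pattern matched against the basename (like find -name)
    owner    -- exact owner name (like find -owner)
    min_size -- minimum size in bytes; only files can match (simplified -size)
    """
    root = ET.fromstring(xml_text)
    matches = []
    for entry in root.iter():
        # Skip the enclosing <listing> element and anything unexpected.
        if entry.tag not in ("file", "directory"):
            continue
        path = entry.get("path", "")
        if type_ == "f" and entry.tag != "file":
            continue
        if type_ == "d" and entry.tag != "directory":
            continue
        if name is not None and not fnmatchcase(posixpath.basename(path), name):
            continue
        if owner is not None and entry.get("owner") != owner:
            continue
        if min_size is not None:
            if entry.tag != "file" or int(entry.get("size", "0")) < min_size:
                continue
        matches.append(path)
    return matches


def print0(paths):
    """Join paths with NUL terminators, suitable for piping to xargs -0."""
    return "".join(p + "\0" for p in paths)


if __name__ == "__main__":
    for p in find(SAMPLE_LISTING, type_="f", name="part-*"):
        print(p)
```

Against a live cluster, the XML text would come from fetching the listPaths URL mentioned in the comment instead of the embedded sample. Note that this still transfers the whole listing to the client, so it does not address the NameNode load concern raised in the issue; it only replaces parsing of -lsr output with structured filtering.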