[
https://issues.apache.org/jira/browse/HADOOP-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600672#action_12600672
]
Chris Douglas commented on HADOOP-3173:
---------------------------------------
I've been asked to clarify the implications of this proposal. There are 6 Path
constructors:
# Path(String, String)
# Path(Path, String)
# Path(String, Path)
# Path(Path, Path)
# Path(String)
# Path(String, String, String)
Constructors 5 and 6 would preserve the path component of the URI as a String
(the "rawPath") used only for globbing; all other Path operations would
continue to work as they always have. For the following, let {{p}}, {{q}} be
Paths, where {{q}} is initialized as:
{noformat}
Path q = new Path(p.toString());
{noformat}
Given the following initializations for {{p}}:
{noformat}
1. p = new Path("/foo/\\*/bar");
2. p = new Path("hdfs://foobar:8020/foo/p-1{\?}");
{noformat}
{{globStatus\(p)}} and {{globStatus\(q)}} would return different results. In the
first instance, globbing {{p}} would return the directory "\*", as _expected_ in
this JIRA, while globbing {{q}} would have the result as _observed_ in this
JIRA. In the second, {{p}} would be a legal glob (the escape prior to '?'
wouldn't be converted to a path separator), so given:
{noformat}
$ bin/hadoop dfs -ls 'foo/bar/'
Found 5 items:
1 0 2008-05-28 20:00 -rw-r--r-- chrisdo supergroup
/user/chrisdo/foo/bar/p-00
1 0 2008-05-28 20:01 -rw-r--r-- chrisdo supergroup
/user/chrisdo/foo/bar/p-01
1 0 2008-05-28 20:01 -rw-r--r-- chrisdo supergroup
/user/chrisdo/foo/bar/p-10
1 0 2008-05-28 20:01 -rw-r--r-- chrisdo supergroup
/user/chrisdo/foo/bar/p-11
1 0 2008-05-28 20:03 -rw-r--r-- chrisdo supergroup
/user/chrisdo/foo/bar/p-1?
{noformat}
One could specify both '{{foo/bar/p-1{\?}}}' (file 5) and '{{foo/bar/p-1?}}'
(files 3-5).
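To make the difference between the two patterns concrete, here is a minimal sketch of a glob matcher run against the five file names above. It is not Hadoop's glob implementation: {{GlobSketch}}, {{compile}} and {{match}} are hypothetical names, and only '?', '\*', backslash escapes and '{a,b}' groups are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified glob matcher for illustration only.
// It is NOT the globbing code in Hadoop's FileSystem.
final class GlobSketch {

    /** Translates a glob into a regex; a backslash makes the next char literal. */
    static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        int depth = 0; // current nesting level of {...} groups
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            switch (c) {
                case '\\':
                    if (++i >= glob.length()) {
                        throw new IllegalArgumentException("dangling escape");
                    }
                    // Escaped character: match it literally.
                    re.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                    break;
                case '?':
                    re.append("[^/]");   // any single non-separator character
                    break;
                case '*':
                    re.append("[^/]*");  // any run of non-separator characters
                    break;
                case '{':
                    re.append("(?:");
                    depth++;
                    break;
                case '}':
                    if (depth > 0) {
                        re.append(')');
                        depth--;
                    } else {
                        re.append("\\}");
                    }
                    break;
                case ',':
                    // A comma separates alternatives only inside a group.
                    re.append(depth > 0 ? "|" : ",");
                    break;
                default:
                    re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unclosed {");
        }
        return Pattern.compile(re.toString());
    }

    /** Returns the names that the glob matches in full, in input order. */
    static List<String> match(String glob, List<String> names) {
        Pattern p = compile(glob);
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (p.matcher(n).matches()) {
                out.add(n);
            }
        }
        return out;
    }
}
```

Under these assumptions, matching {{p-1?}} against the listing selects p-10, p-11 and p-1?, while {{p-1{\?}}} selects only p-1?, provided the backslash reaches the matcher intact.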
There are two primary "globbers" in the codebase, FsShell and FileInputFormats.
In the current proposal, the latter would continue to be in the "{{q}} case",
i.e. there would be no change to its behavior. FsShell, however, would be in
the "{{p}} case", i.e. the user string would be used for globbing without first
passing through Path and URI normalization. This has the advantage of resolving
this JIRA, but the significant disadvantage of making globbing in FsShell and
map/reduce inconsistent. If a user were to test out a pattern in the shell and
try to use it as a pattern for their FileInputFormat derivative, they could get
different results.
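The "{{p}} case" versus "{{q}} case" distinction can be sketched roughly as follows. This is an illustration of the rawPath idea, not the attached patch: {{SketchPath}} and {{globPattern}} are hypothetical names, and {{normalize}} is only a stand-in for Path and URI normalization, assumed here to turn backslashes into separators and collapse repeated separators on a scheme-less path.

```java
// Hypothetical model of the proposal; not Hadoop's Path class.
final class SketchPath {
    private final String rawPath;    // user string, kept only for globbing
    private final String normalized; // what every other operation sees

    SketchPath(String userString) {
        this.rawPath = userString;
        this.normalized = normalize(userString);
    }

    // ASSUMPTION: simplified stand-in for Path/URI normalization.
    // Only meant for scheme-less paths such as "/foo/\*/bar".
    static String normalize(String s) {
        return s.replace('\\', '/').replaceAll("/{2,}", "/");
    }

    /** Pattern a globber would use under the proposal. */
    String globPattern() {
        return rawPath;
    }

    /** Normalized form; the escape is already gone here. */
    @Override
    public String toString() {
        return normalized;
    }
}
```

With {{p = new SketchPath("/foo/\\*/bar")}} and {{q = new SketchPath(p.toString())}}, both print the same normalized path, but only {{p.globPattern()}} still carries the escaped asterisk. A shell that globs on {{p}} and a FileInputFormat that globs on {{q}} would therefore see different patterns for the same user input.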
> inconsistent globbing support for dfs commands
> ----------------------------------------------
>
> Key: HADOOP-3173
> URL: https://issues.apache.org/jira/browse/HADOOP-3173
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Environment: Hadoop 0.16.1
> Reporter: Rajiv Chittajallu
> Fix For: 0.18.0
>
> Attachments: 3173-0.patch
>
>
> hadoop dfs -mkdir /user/*/bar creates a directory "/user/*/bar" and you can't
> delete /user/* as -rmr expands the glob
> $ hadoop dfs -mkdir /user/rajive/a/*/foo
> $ hadoop dfs -ls /user/rajive/a
> Found 4 items
> /user/rajive/a/* <dir> 2008-04-04 16:09 rwx------
> rajive users
> /user/rajive/a/b <dir> 2008-04-04 16:08 rwx------
> rajive users
> /user/rajive/a/c <dir> 2008-04-04 16:08 rwx------
> rajive users
> /user/rajive/a/d <dir> 2008-04-04 16:08 rwx------
> rajive users
> $ hadoop dfs -ls /user/rajive/a/*
> /user/rajive/a/*/foo <dir> 2008-04-04 16:09 rwx------
> rajive users
> $ hadoop dfs -rmr /user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> I am not able to escape '*' from being expanded.
> $ hadoop dfs -rmr '/user/rajive/a/*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr '/user/rajive/a/\*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr /user/rajive/a/\*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d