[jira] Commented: (HADOOP-3173) inconsistent globbing support for dfs commands

Chris Douglas (JIRA) Wed, 28 May 2008 20:17:10 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600672#action_12600672
 ]


Chris Douglas commented on HADOOP-3173:
---------------------------------------

I've been asked to clarify the implications of this proposal. There are 6 Path 
constructors:
# Path(String, String)
# Path(Path, String)
# Path(String, Path)
# Path(Path, Path)
# Path(String)
# Path(String, String, String)

Constructors 5 and 6 would preserve the path component of the URI as a String 
(the "rawPath") used only for globbing; all other Path operations would 
continue to work as they always have. For the following, let {{p}}, {{q}} be 
Paths, where {{q}} is initialized as:
{noformat}
Path q = new Path(p.toString());
{noformat}
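
To make the {{p}}/{{q}} distinction concrete, here is a minimal sketch of the idea. It is not the attached patch and not Hadoop's Path class: {{RawPathSketch}}, {{globPattern()}}, and {{normalize()}} are hypothetical names. The normalization shown (backslashes become path separators, duplicate slashes collapse) is an assumption that only approximates the real behavior, and it handles only the path component, not a scheme or authority.

```java
// Hypothetical stand-in for a Path that remembers the string it was built from.
final class RawPathSketch {
    private final String rawPath;     // the user's string, kept only for globbing
    private final String normalized;  // what all other operations would see

    /** Plays the role of constructor 5, Path(String). */
    RawPathSketch(String userString) {
        this.rawPath = userString;
        this.normalized = normalize(userString);
    }

    /** Assumed normalization: '\' becomes '/', then runs of '/' collapse to one. */
    static String normalize(String s) {
        return s.replace('\\', '/').replaceAll("/{2,}", "/");
    }

    /** The string a globber would consume under the proposal. */
    String globPattern() {
        return rawPath;
    }

    @Override
    public String toString() {
        return normalized;
    }

    public static void main(String[] args) {
        RawPathSketch p = new RawPathSketch("/foo/\\*/bar");
        RawPathSketch q = new RawPathSketch(p.toString());
        System.out.println(p.globPattern()); // /foo/\*/bar  (escape survives)
        System.out.println(q.globPattern()); // /foo/*/bar   (escape lost in the round trip)
    }
}
```

Under these assumptions {{p}} and {{q}} print identically via {{toString()}}, yet only {{p}} still carries the escape that a globber needs.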

Given the following initializations for {{p}}:
{noformat}
1. p = new Path("/foo/\\*/bar");
2. p = new Path("hdfs://foobar:8020/foo/p-1{\?}");
{noformat}

{{globStatus\(p)}} and {{globStatus\(q)}} would return different results. In 
the first instance, globbing {{p}} would return the directory "\*", as 
_expected_ in this JIRA, while globbing {{q}} would return the result 
_observed_ in this JIRA. In the second, {{p}} would be a legal glob (the escape 
before '?' wouldn't be converted to a path separator), so given:
{noformat}
[EMAIL PROTECTED] bin/hadoop dfs -ls 'foo/bar/'
Found 5 items:
1    0           2008-05-28 20:00  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-00
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-01
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-10
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-11
1    0           2008-05-28 20:03  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-1?
{noformat}

One could specify both '{{foo/bar/p-1{\?}}}' (matching only file 5) and 
'{{foo/bar/p-1?}}' (matching files 3-5).
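
The difference between the two patterns can be checked with a small glob-to-regex translator. This is only a sketch under stated assumptions: {{GlobSketch}} is a hypothetical helper, not Hadoop's glob implementation. It matches a single path component (the file name, without the {{foo/bar/}} prefix) and treats a backslash as "take the next character literally".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: supports ?, *, {a,b} and backslash escapes in one path component.
final class GlobSketch {

    /** Translates a glob into an equivalent java.util.regex.Pattern. */
    static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        int braceDepth = 0;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '\\') {
                // Escape: the following character is matched literally.
                if (i + 1 == glob.length()) {
                    throw new IllegalArgumentException("dangling escape");
                }
                re.append(Pattern.quote(String.valueOf(glob.charAt(++i))));
            } else if (c == '?') {
                re.append("[^/]");       // exactly one character, never a separator
            } else if (c == '*') {
                re.append("[^/]*");      // any run of non-separator characters
            } else if (c == '{') {
                braceDepth++;
                re.append("(?:");
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
                re.append(')');
            } else if (c == ',' && braceDepth > 0) {
                re.append('|');          // alternative inside braces
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        if (braceDepth != 0) {
            throw new IllegalArgumentException("unclosed '{'");
        }
        return Pattern.compile(re.toString());
    }

    /** Returns the names that the glob matches in full, in input order. */
    static List<String> glob(String pattern, List<String> names) {
        Pattern p = compile(pattern);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (p.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("p-00", "p-01", "p-10", "p-11", "p-1?");
        System.out.println(glob("p-1{\\?}", names)); // [p-1?]
        System.out.println(glob("p-1?", names));     // [p-10, p-11, p-1?]
    }
}
```

The escaped form only works if the backslash reaches the globber intact, which is exactly what preserving the raw string is meant to guarantee.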

There are two primary "globbers" in the codebase, FsShell and FileInputFormats. 
In the current proposal, the latter would continue to be in the "{{q}} case", 
i.e. there would be no change to its behavior. FsShell, however, would be in 
the "{{p}} case", i.e. the user string would be used for globbing without first 
passing through Path and URI normalization. This has the advantage of resolving 
this JIRA, but the significant disadvantage of making globbing in FsShell and 
map/reduce inconsistent. If a user were to test out a pattern in the shell and 
try to use it as a pattern for their FileInputFormat derivative, they could get 
different results.

> inconsistent globbing support for dfs commands
> ----------------------------------------------
>
>                 Key: HADOOP-3173
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3173
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>         Environment: Hadoop 0.16.1
>            Reporter: Rajiv Chittajallu
>             Fix For: 0.18.0
>
>         Attachments: 3173-0.patch
>
>
> hadoop dfs -mkdir /user/*/bar creates a directory "/user/*/bar" and you can't 
> delete /user/* as -rmr expands the glob
> $ hadoop dfs -mkdir /user/rajive/a/*/foo
> $ hadoop dfs -ls /user/rajive/a
> Found 4 items
> /user/rajive/a/*      <dir>           2008-04-04 16:09        rwx------       rajive  users
> /user/rajive/a/b      <dir>           2008-04-04 16:08        rwx------       rajive  users
> /user/rajive/a/c      <dir>           2008-04-04 16:08        rwx------       rajive  users
> /user/rajive/a/d      <dir>           2008-04-04 16:08        rwx------       rajive  users
> $ hadoop dfs -ls /user/rajive/a/*
> /user/rajive/a/*/foo  <dir>           2008-04-04 16:09        rwx------       rajive  users
> $ hadoop dfs -rmr /user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> I am not able to escape '*' from being expanded.
> $ hadoop dfs -rmr '/user/rajive/a/*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr  '/user/rajive/a/\*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr  /user/rajive/a/\* 
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
