[ 
https://issues.apache.org/jira/browse/HADOOP-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467110#comment-13467110
 ] 

Harsh J commented on HADOOP-8845:
---------------------------------

bq. Since * can match the empty string, in other contexts it could be 
appropriate to return ""/tmp/testdir/testfile" for "/tmp/testdir/*/testfile".

Nice catch. I will add a test for this to see if we aren't handling it already.

bq. Ie is there a place where we know we should just be checking directory path 
elements? The comment in globStatusInternal ("// list parent directories and 
then glob the results") by one of the cases indicates is the intent but it's 
valid to pass both files and directories to listStatus.

The parts I've changed this under, try to fetch "parents", which can't mean 
anything but directories AFAICT.
                
> When looking for parent paths info, globStatus must filter out non-directory 
> elements to avoid an AccessControlException
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8845
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Harsh J
>              Labels: glob
>         Attachments: HADOOP-8845.patch, HADOOP-8845.patch, HADOOP-8845.patch
>
>
> A brief description from my colleague Stephen Fritz who helped discover it:
> {code}
> [root@node1 ~]# su - hdfs
> -bash-4.1$ echo "My Test String">testfile <-- just a text file, for testing 
> below
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir <-- create a directory
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir/1 <-- create a subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/1/testfile <-- put the test 
> file in the subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/testfile <-- put the test 
> file in the directory
> -bash-4.1$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
> All files are where we expect them...OK, let's try reading
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/*/testfile
> My Test String <-- success!  
> Note that we used an '*' in the cat command, and it correctly found the 
> subdirectory '/tmp/testdir/1', and ignore the regular file 
> '/tmp/testdir/testfile'
> -bash-4.1$ exit
> logout
> [root@node1 ~]# su - testuser <-- lets try it as a different user:
> [testuser@node1 ~]$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- so far so good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/*/testfile
> cat: org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=testuser, access=EXECUTE, 
> inode="/tmp/testdir/testfile":hdfs:hadoop:-rw-r--r--
> {code}
> Essentially, we hit a ACE with access=EXECUTE on file /tmp/testdir/testfile 
> cause we tried to access the /tmp/testdir/testfile/testfile as a path. This 
> shouldn't happen, as the testfile is a file and not a path parent to be 
> looked up upon.
> {code}
> 2012-09-25 07:24:27,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call getFileInfo(/tmp/testdir/testfile/testfile)
> {code}
> Surprisingly the superuser avoids hitting into the error, as a result of 
> bypassing permissions, but that can be looked up on another JIRA - if it is 
> fine to let it be like that or not.
> This JIRA targets a client-sided fix to not cause such /path/file/dir or 
> /path/file/file kinda lookups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to