[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:
--------------------------------

    Attachment: HADOOP-9774-4.patch

Thanks Ivan for your suggestions. You are right, the Path(String parent, String 
child) constructor is not designed to accept ONLY relative path in child. There 
are some unit test cases where child is absolution path like "file:/mydata". My 
previous patch failed those cases.

So basically we cannot support file names with colon in child parameter in 
constructor Path(String parent, String child). This is because a string like 
"file:/" can be interpret either as a scheme or a file named "file:".

That being said, we can still use the trick Ivan mentioned - the Path(String 
scheme, String authority, String path) constructor. If we expect the path to 
contain colon, then we should use this explicit constructor to avoid ambiguity. 
So we can try adding "./" to the path string in this constructor to deal with 
colon character.

Attached is a new patch to implement this idea.
                
> RawLocalFileSystem.listStatus() return absolute paths when input path is 
> relative on Windows
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9774
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: shanyu zhao
>         Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
> HADOOP-9774-4.patch, HADOOP-9774.patch
>
>
> On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
> relative path (without drive spec), e.g., "file:///mydata", the resulting 
> paths become absolute paths, e.g., ["file://E:/mydata/t1.txt", 
> "file://E:/mydata/t2.txt"...].
> Note that if we use it to enumerate an absolute path, e.g., 
> "file://E:/mydata" then the we get the same results as above.
> This breaks some hive unit tests which uses local file system to simulate 
> HDFS when testing, therefore the drive spec is removed. Then after 
> listStatus() the path is changed to absolute path, hive failed to find the 
> path in its map reduce job.
> You'll see the following exception:
> [junit] java.io.IOException: cannot find dir = 
> pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
> pathToPartitionInfo: 
> [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
> [junit]       at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
> This problem is introduced by this JIRA:
> HADOOP-8962
> Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
> relative paths if the parent paths are relative, e.g., 
> ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...]
> This behavior change is a side effect of the fix in HADOOP-8962, not an 
> intended change. The resulting behavior, even though is legitimate from a 
> function point of view, break consistency from the caller's point of view. 
> When the caller use a relative path (without drive spec) to do listStatus() 
> the resulting path should be relative. Therefore, I think this should be 
> fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to