[
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shanyu zhao updated HADOOP-9774:
--------------------------------
Attachment: HADOOP-9774-2.patch
Attached is a patch that fixes the Hadoop path resolution issues (HADOOP-8962
and this JIRA) at their root.
The fundamental cause of issue HADOOP-8962 is that the constructor:
{code}
public Path(String parent, String child)
{code}
does not work as intended in all scenarios. A simple example: the relative
path represented by "child" can contain a colon in one of its names, e.g.
"a:b/t1.txt", which causes the Path constructor to misinterpret the path.
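To illustrate why a colon in a relative child string is ambiguous, here is a minimal sketch. It uses java.net.URI rather than Hadoop's own Path parsing, so it only shows the general effect of a generic URI parser reading "a:b/t1.txt"; the class and method names are hypothetical.

```java
import java.net.URI;

// Illustration only: java.net.URI stands in for a generic URI parser.
// This is not Hadoop's Path code; ColonInChildDemo and schemeOf are made-up names.
class ColonInChildDemo {
    // Returns the scheme the URI parser assigns to the string, or null if none.
    static String schemeOf(String s) {
        return URI.create(s).getScheme();
    }

    public static void main(String[] args) {
        // The text before the first colon is read as a scheme, not as part of a name.
        System.out.println(schemeOf("a:b/t1.txt"));            // prints: a
        // The rest is treated as an opaque scheme-specific part, not a path.
        System.out.println(URI.create("a:b/t1.txt").getPath()); // prints: null
    }
}
```

The intended directory name "a:b" is lost: the parser sees scheme "a" and no hierarchical path at all.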
One way to fix this problem is to add "./" to the beginning of the child string
if it is not an absolute path (i.e., it does not start with "/"). Also, on
Windows, we need to add a slash to the beginning if the child string starts
with a drive spec, e.g.,
"E:\data" -> "/E:\data".
I added a few Path-related test cases, plus a new test case to make sure that,
on Windows, RawLocalFileSystem.listStatus() returns consistent paths.
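The child-string adjustment described above could be sketched roughly as follows. This is not the code from HADOOP-9774-2.patch: the helper names, the onWindows flag, and the exact definition of a drive spec (an ASCII letter, a colon, then a slash, a backslash, or the end of the string) are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical sketch of the rule in this comment; not the actual patch.
class ChildNormalizerSketch {
    // Assumption: a drive spec is an ASCII letter, ':', then '/', '\', or end of string.
    static boolean startsWithDriveSpec(String s) {
        if (s.length() < 2 || s.charAt(1) != ':') {
            return false;
        }
        char c = s.charAt(0);
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        return letter && (s.length() == 2 || s.charAt(2) == '/' || s.charAt(2) == '\\');
    }

    static String normalizeChild(String child, boolean onWindows) {
        if (onWindows && startsWithDriveSpec(child)) {
            return "/" + child;   // "E:\data" -> "/E:\data"
        }
        if (!child.startsWith("/")) {
            return "./" + child;  // relative child: keep any colon out of the first segment
        }
        return child;             // already absolute
    }

    public static void main(String[] args) {
        String fixed = normalizeChild("a:b/t1.txt", false);
        System.out.println(fixed);                         // prints: ./a:b/t1.txt
        // With the "./" prefix, java.net.URI no longer sees a scheme.
        System.out.println(URI.create(fixed).getScheme()); // prints: null
        System.out.println(normalizeChild("E:\\data", true)); // prints: /E:\data
    }
}
```

With the "./" prefix the first path segment no longer contains the colon, so a URI parser has no scheme to pick up and the whole string is read as a relative path.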
> RawLocalFileSystem.listStatus() return absolute paths when input path is
> relative on Windows
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-9774
> URL: https://issues.apache.org/jira/browse/HADOOP-9774
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: shanyu zhao
> Attachments: HADOOP-9774-2.patch, HADOOP-9774.patch
>
>
> On Windows, when using RawLocalFileSystem.listStatus() to enumerate a
> relative path (without drive spec), e.g., "file:///mydata", the resulting
> paths become absolute paths, e.g., ["file://E:/mydata/t1.txt",
> "file://E:/mydata/t2.txt"...].
> Note that if we use it to enumerate an absolute path, e.g.,
> "file://E:/mydata", then we get the same results as above.
> This breaks some Hive unit tests, which use the local file system to simulate
> HDFS when testing and therefore remove the drive spec. After listStatus()
> changes the path to an absolute path, Hive fails to find the path in its
> MapReduce job.
> You'll see the following exception:
> [junit] java.io.IOException: cannot find dir =
> pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in
> pathToPartitionInfo:
> [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
> [junit] at
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
> This problem was introduced by HADOOP-8962.
> Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths were
> relative paths if the parent paths were relative, e.g.,
> ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...]
> This behavior change is a side effect of the fix in HADOOP-8962, not an
> intended change. The resulting behavior, even though it is legitimate from a
> functional point of view, breaks consistency from the caller's point of view.
> When the caller uses a relative path (without drive spec) to do listStatus(),
> the resulting paths should be relative. Therefore, I think this should be
> fixed.