[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724193#comment-13724193
 ] 

Ivan Mitic commented on HADOOP-9774:
------------------------------------

Thanks Shanyu for additional comments. I also spent some time looking at the 
problem and it seems to be more complex than I initially thought.

I'll try to respond later this evening with my proposal for the fix. 
My current line of thought is the following:
 - If we revert things to before the HADOOP-8962 fix, we have a problem where 
file names with a colon are not interpreted the way we want them to be. 
Basically, "a:b" will result in a URI with scheme "a". This behavior is by 
design and we shouldn't try to change it.
 - The question is now how to address the problem from HADOOP-8962. One 
approach is to encode URI special characters in the given file name before it 
is passed to Path. This "seems" like the right thing to do, but I'll have to 
look into it a bit more. It is also not obvious what encoding would mean for 
the rest of the Hadoop codebase (i.e., do we just do a targeted fix for the 
RLFS scenario?).
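
To make the two points above concrete, here is a minimal sketch using only 
java.net.URI from the JDK, not Hadoop's Path. It shows that "a:b" parses with 
scheme "a", and that percent-encoding the colon first avoids that. The class 
name ColonPathSketch and the helper encodeColons are hypothetical and exist 
only for illustration; how Path itself would treat a pre-encoded name is not 
covered here and would need to be checked separately.

```java
import java.net.URI;

// Illustration only: plain java.net.URI behavior, not Hadoop's Path.
class ColonPathSketch {

    // Hypothetical helper: percent-encode only the colon in a file name.
    // A real fix would have to consider other URI special characters too.
    static String encodeColons(String fileName) {
        return fileName.replace(":", "%3A");
    }

    // Scheme that java.net.URI extracts from the string, or null if none.
    static String schemeOf(String s) {
        return URI.create(s).getScheme();
    }

    // Decoded path component; null for opaque URIs such as "a:b".
    static String decodedPathOf(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        // Unencoded: the text before the colon is taken as the scheme.
        System.out.println(schemeOf("a:b"));

        // Encoded: no scheme is detected and the decoded path keeps the colon.
        String encoded = encodeColons("a:b");
        System.out.println(schemeOf(encoded));
        System.out.println(decodedPathOf(encoded));
    }
}
```

The sketch encodes only the colon to keep the example small; whether such 
encoding should be applied broadly or only for the RLFS case is exactly the 
open question in the second bullet.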


Another possibly relevant factor is that HDFS does not allow the colon 
character in file names. This means that supporting colons in RLFS does not 
help in scenarios where we copy HDFS content to RLFS and vice versa. 

In parallel, feel free to share your thoughts.

                
> RawLocalFileSystem.listStatus() return absolute paths when input path is 
> relative on Windows
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9774
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: shanyu zhao
>         Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
> HADOOP-9774.patch
>
>
> On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
> relative path (without drive spec), e.g., "file:///mydata", the resulting 
> paths become absolute paths, e.g., ["file://E:/mydata/t1.txt", 
> "file://E:/mydata/t2.txt"...].
> Note that if we use it to enumerate an absolute path, e.g., 
> "file://E:/mydata", then we get the same results as above.
> This breaks some Hive unit tests, which use the local file system to 
> simulate HDFS when testing, so the drive spec is removed. After 
> listStatus(), the path is changed to an absolute path, and Hive fails to 
> find the path in its MapReduce job.
> You'll see the following exception:
> [junit] java.io.IOException: cannot find dir = 
> pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
> pathToPartitionInfo: 
> [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
> [junit]       at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
> This problem is introduced by this JIRA:
> HADOOP-8962
> Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths 
> were relative paths if the parent paths were relative, e.g., 
> ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...]
> This behavior change is a side effect of the fix in HADOOP-8962, not an 
> intended change. The resulting behavior, even though it is legitimate from a 
> functional point of view, breaks consistency from the caller's point of view. 
> When the caller uses a relative path (without drive spec) to call 
> listStatus(), the resulting paths should be relative. Therefore, I think 
> this should be fixed.
