[ 
https://issues.apache.org/jira/browse/HADOOP-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589154#action_12589154
 ] 

Doug Cutting commented on HADOOP-3257:
--------------------------------------

> Currently Path is limited by URI semantics in the sense that one cannot 
> create files whose names include characters such as ":" etc.

Path is a convenience class that wraps a URI.  URIs are the underlying 
mechanism Hadoop uses to name files.  Hadoop only supports a subset of possible 
URIs (hierarchical URIs, normalized to remove double-slashes and so that 
non-root paths don't end in a slash).  Path enforces this subset.  Path also 
handles some compatibility issues, mostly to make it easier to include Windows 
drive letters in "file:" URIs when running on Windows.
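As a rough illustration of the normalization described above, here is a minimal sketch. It is not Hadoop's actual Path code: the class and method names are hypothetical, and it assumes it is given only the path component of a URI (applied to a whole URI string it would also collapse the "//" after the scheme).

```java
final class PathNormalizeSketch {
    // Hypothetical helper, not Hadoop's implementation: collapse runs of
    // slashes and drop a trailing slash unless the path is the root.
    static String normalize(String path) {
        String collapsed = path.replaceAll("/{2,}", "/");
        if (collapsed.length() > 1 && collapsed.endsWith("/")) {
            collapsed = collapsed.substring(0, collapsed.length() - 1);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("/user//doug/data/")); // /user/doug/data
        System.out.println(normalize("/"));                 // /
    }
}
```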

So a path is not limited by "URI semantics"; it is implemented with URI syntax. 
URIs permit escapes, so one can include arbitrary Unicode characters in a 
URI.  One *can* create URIs that include colons.  However, our 
Windows-compatibility code may make it awkward to get colons through the Path 
wrapper into a URI, and perhaps we can improve that.
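To make the point about escapes concrete, here is a small sketch using only java.net.URI, without going through Path. The class name, helper name, and file names are made up for illustration. The multi-argument URI constructor quotes characters that are illegal in a path, and toASCIIString() percent-encodes non-ASCII characters as UTF-8.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriEscapeSketch {
    // Hypothetical helper: build a scheme-less hierarchical URI from an
    // absolute path, letting java.net.URI quote illegal characters.
    static URI fromPath(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI spaced = fromPath("/logs/job 1");
        System.out.println(spaced);           // /logs/job%201
        System.out.println(spaced.getPath()); // /logs/job 1

        // A colon after the leading slash is legal in a path as-is.
        URI colon = fromPath("/logs/12:30:00.log");
        System.out.println(colon.getPath());  // /logs/12:30:00.log

        // Non-ASCII characters are escaped in the ASCII form.
        URI accented = fromPath("/logs/caf\u00e9");
        System.out.println(accented.toASCIIString()); // /logs/caf%C3%A9
    }
}
```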

> It would be nice if Path handled all characters correctly...

What does "correctly" mean?  I think we need more specific issues before we can 
have a real discussion.  Escaping here is tricky, since we have code that takes 
file names from different filesystems, which require different escapes, and uses 
them to form paths.  I've commented on this previously:

https://issues.apache.org/jira/browse/HADOOP-2066?focusedCommentId=12558701#action_12558701

Two approaches are possible:
 - limit Paths to an interoperability subset, a common denominator.  That's 
where we are today.
 - permit simpler and more automated escaping of certain characters.  That's a 
laudable goal.

I don't think we should simply say that Path must accept any string verbatim as 
a file name.  I think it is reasonable for Path to reject clearly malformed 
paths with a syntax error.  It is also reasonable to permit colons in directory 
and file names.  But if a colon is left unescaped in a relative path, it can be 
mistaken for the end of a URI scheme, and I think that interpretation takes 
precedence.
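The ambiguity is easy to see with java.net.URI alone. In this sketch (the class name and the file name "data:2008" are made up for illustration), the same name is parsed three ways: unescaped, prefixed with "./", and with the colon percent-escaped. Only the first is read as having a scheme.

```java
import java.net.URI;

final class ColonAmbiguitySketch {
    // Report how java.net.URI splits the given text into scheme and path.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Unescaped colon in the first segment: parsed as an opaque URI
        // with scheme "data", so there is no path at all.
        System.out.println(describe("data:2008"));   // scheme=data path=null

        // A leading "./" puts a slash before the colon, so no scheme is seen.
        System.out.println(describe("./data:2008")); // scheme=null path=./data:2008

        // Escaping the colon also avoids the scheme reading; getPath() decodes it.
        System.out.println(describe("data%3A2008")); // scheme=null path=data:2008
    }
}
```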

> Path should handle all characters
> ---------------------------------
>
>                 Key: HADOOP-3257
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3257
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.17.0
>            Reporter: Arun C Murthy
>
> Currently Path is limited by URI semantics in the sense that one cannot 
> create files whose names include characters such as ":" etc.
> HADOOP-2066 & HADOOP-3256 are manifestations of this problem. It would be 
> nice if Path handled all characters correctly...
