[
https://issues.apache.org/jira/browse/HADOOP-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589154#action_12589154
]
Doug Cutting commented on HADOOP-3257:
--------------------------------------
> Currently Path is limited by URI semantics in the sense that one cannot
> create files whose names include characters such as ":" etc.
Path is a convenience class that wraps a URI. URIs are the underlying
mechanism Hadoop uses to name files. Hadoop only supports a subset of possible
URIs (hierarchical URIs, normalized to remove double-slashes and so that
non-root paths don't end in a slash). Path enforces this subset. Path also
handles some compatibility issues, mostly to make it easier to include Windows
drive letters in "file:" URIs when running on Windows.
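The normalization described above can be sketched roughly as follows. This is a minimal illustration, not Hadoop's actual Path code: the class `PathNormalizeSketch` and the method `normalize` are hypothetical names, and the helper is assumed to receive only the path component of a URI (no scheme or authority).

```java
class PathNormalizeSketch {
    // Hypothetical helper, not Hadoop's implementation. It expects only the
    // path component of a URI; applying it to a full URI string would also
    // collapse the "//" that introduces the authority.
    static String normalize(String path) {
        // Collapse any run of slashes into a single slash.
        String p = path.replaceAll("/+", "/");
        // Drop a trailing slash unless the path is the root itself.
        if (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(normalize("/user//doug/data/")); // /user/doug/data
        System.out.println(normalize("/"));                 // /
    }
}
```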
So a path is not limited by "URI semantics"; it is implemented with URI syntax.
URIs permit escapes, so one can include arbitrary Unicode characters in a
URI. One *can* create URIs that include colons. However, our
Windows-compatibility code may make it awkward to get colons through the Path
wrapper into a URI, and perhaps we can improve that.
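To illustrate that the URI layer itself accepts colons, here is a small sketch using plain java.net.URI rather than Path. The host name `namenode`, the port, and the file names are made up for the example; the class name `ColonInUri` is hypothetical.

```java
import java.net.URI;

class ColonInUri {
    // Returns the decoded path component of a URI string.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // In an absolute URI, a literal colon inside the path is legal.
        System.out.println(decodedPath("hdfs://namenode:8020/logs/job:42")); // /logs/job:42

        // The same colon can also be written as the escape %3A;
        // getPath() decodes it, getRawPath() keeps it escaped.
        URI escaped = URI.create("hdfs://namenode:8020/logs/job%3A42");
        System.out.println(escaped.getRawPath()); // /logs/job%3A42
        System.out.println(escaped.getPath());    // /logs/job:42
    }
}
```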
> It would be nice if Path handled all characters correctly...
What does "correctly" mean? I think we need more specific issues before we can
have a real discussion. Escaping here is tricky, since we have code that takes
file names from different filesystems, each requiring different escapes, and
uses them to form paths. I've commented on this previously:
https://issues.apache.org/jira/browse/HADOOP-2066?focusedCommentId=12558701#action_12558701
Two approaches are possible:
- Limit Paths to an interoperability subset, a common denominator. That's
where we are today.
- Permit simpler and more automated escaping of certain characters. That's a
laudable goal.
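As a rough idea of what the second approach might look like, the sketch below percent-encodes a colon in a single file name before it is parsed as a URI. The helper `escapeName` and the class `EscapeSketch` are hypothetical and not part of Hadoop; the sketch handles only ':' and '%' and ignores every other character that may need escaping.

```java
import java.net.URI;

class EscapeSketch {
    // Hypothetical helper: escape '%' first so that existing percent signs
    // stay literal, then escape ':' so it cannot be read as a scheme delimiter.
    static String escapeName(String name) {
        return name.replace("%", "%25").replace(":", "%3A");
    }

    public static void main(String[] args) {
        String escaped = escapeName("part:0001");
        URI u = URI.create(escaped);
        System.out.println(escaped);       // part%3A0001
        System.out.println(u.getScheme()); // null
        System.out.println(u.getPath());   // part:0001
    }
}
```

The decoded path round-trips to the original name, while the raw form stays unambiguous as a relative URI.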
I don't think we should simply say that Path must accept any string verbatim as
a file name. I think it is reasonable to report syntax errors for clearly
malformed paths. It is also reasonable to permit colons in directory and file
names. If a colon is unescaped in a relative path, the text before it can be
mistaken for a URI scheme, and I think that interpretation takes precedence.
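The scheme ambiguity is easy to reproduce with plain java.net.URI. In this sketch (the class `RelativeColon` and the file name are illustrative only), the same name is read as a scheme-qualified URI when written bare, and as a relative path once a leading "./" keeps the colon out of the first segment.

```java
import java.net.URI;

class RelativeColon {
    // Returns the scheme the URI parser sees, or null for a relative reference.
    static String schemeOf(String s) {
        return URI.create(s).getScheme();
    }

    public static void main(String[] args) {
        // Bare name: everything before the colon is taken as the scheme.
        System.out.println(schemeOf("part:0001"));   // part

        // A "./" prefix means a slash precedes the colon, so no scheme is parsed.
        System.out.println(schemeOf("./part:0001")); // null
        System.out.println(URI.create("./part:0001").getPath()); // ./part:0001
    }
}
```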
> Path should handle all characters
> ---------------------------------
>
> Key: HADOOP-3257
> URL: https://issues.apache.org/jira/browse/HADOOP-3257
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.17.0
> Reporter: Arun C Murthy
>
> Currently Path is limited by URI semantics in the sense that one cannot
> create files whose names include characters such as ":" etc.
> HADOOP-2066 & HADOOP-3256 are manifestations of this problem. It would be
> nice if Path handled all characters correctly...