[
https://issues.apache.org/jira/browse/HADOOP-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065002#comment-13065002
]
Aaron T. Myers commented on HADOOP-7418:
----------------------------------------
Hey Andrew, I think the regex needs to be changed. In particular, I don't think it
will actually cover the multiple back slash case, since the double back slash in
your regex is just string-escaping a single back slash, which then
regex-escapes the "+" character. If you want to include a literal back slash
in the regex, you need to use 4 back slashes. (Silly, I know.)
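To illustrate the escaping point, here is a minimal sketch. The class and method names are made up for the example, and the pattern "\\+" is only an assumption about the general shape of the regex under review, since the patch itself is not quoted in this comment.

```java
public class BackslashEscapeDemo {
    // The source literal "\\+" reaches the regex engine as \+ ,
    // which is an escaped, literal plus sign (no back slash is matched).
    static String replaceLiteralPlus(String s) {
        return s.replaceAll("\\+", "/");
    }

    // The source literal "\\\\+" reaches the regex engine as \\+ ,
    // which matches one or more literal back slashes.
    static String replaceBackslashRuns(String s) {
        return s.replaceAll("\\\\+", "/");
    }

    public static void main(String[] args) {
        String input = "a\\\\b++c"; // runtime value: a\\b++c
        // Back slashes are left alone; each "+" is replaced.
        System.out.println(replaceLiteralPlus(input));
        // The run of back slashes is collapsed to a single "/".
        System.out.println(replaceBackslashRuns(input));
    }
}
```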
Furthermore, I think that doing the replace in two stages (first forward
slashes, then back slashes) won't cover the case when forward slashes are
separated by back slashes (e.g. "/foo/\/bar".) To cover that case, you have two
options:
# Replace back slashes first, before forward slashes. The back slash
replacement could even be a 1-for-1 replacement, leaving you with a bunch of
consecutive forward slashes, which then get replaced by a single forward slash
in the next regex.
# Use a single call along the lines of {{.replaceAll("(/|\\\\)+", "/")}}, which
replaces each run of consecutive "/" or "\" characters with a single "/".
It would also be worthwhile to add test cases that cover these inputs.
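As a rough sketch of the difference between the orderings, the snippet below compares three variants on the mixed input "/foo/\/bar". The class and method names are hypothetical, and forwardThenBack is only my guess at the shape of the two-stage approach, not the code from the patch. The same inputs could serve as a starting point for the test cases.

```java
public class SlashNormalizationDemo {
    // Assumed shape of the two-stage approach: collapse forward slashes first,
    // then turn each run of back slashes into "/". Mixed runs are left uncollapsed.
    static String forwardThenBack(String path) {
        return path.replaceAll("/+", "/").replaceAll("\\\\+", "/");
    }

    // Option 1: map every back slash to a forward slash (1-for-1),
    // then collapse runs of forward slashes.
    static String backThenForward(String path) {
        return path.replace('\\', '/').replaceAll("/+", "/");
    }

    // Option 2: collapse any run of "/" or "\" in a single pass.
    static String singlePass(String path) {
        return path.replaceAll("(/|\\\\)+", "/");
    }

    public static void main(String[] args) {
        String mixed = "/foo/\\/bar"; // runtime value: /foo/\/bar
        System.out.println(forwardThenBack(mixed)); // prints /foo///bar
        System.out.println(backThenForward(mixed)); // prints /foo/bar
        System.out.println(singlePass(mixed));      // prints /foo/bar
    }
}
```

Note that both options also collapse a leading "//", so this sketch says nothing about how a scheme or authority prefix should be handled.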
> support for multiple slashes in the path separator
> --------------------------------------------------
>
> Key: HADOOP-7418
> URL: https://issues.apache.org/jira/browse/HADOOP-7418
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 0.23.0
> Environment: Linux running JDK 1.6
> Reporter: Sudharsan Sampath
> Assignee: Andrew Look
> Priority: Minor
> Labels: newbie
> Fix For: 0.23.0
>
> Attachments: HADOOP-7418.txt, HADOOP-7418.txt, HDFS-1460.txt,
> HDFS-1460.txt
>
>
> The parsing of the input path string to identify the URI authority conflicts
> with file system paths. For instance, the following is a valid path in
> both the Linux file system and HDFS:
> //user/directory1//directory2.
> While this works perfectly fine on the command line for manipulating HDFS,
> the same path fails when specified as the input path for a mapper class, with the
> following exception:
> Exception in thread "main" java.net.UnknownHostException: unknown host: user
> at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
> This happens because the org.apache.hadoop.fs.Path class assumes the string that follows the
> '//' to be a URI authority.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira