[
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499746#comment-16499746
]
Xiao Chen commented on HADOOP-15217:
------------------------------------
Thanks Zsolt for revving the patch, and for the discussion about testing.
In general I agree that test code should be as explicit as possible, to make
it easier to debug when something fails. Looking at the specific code,
however:
{code:java}
try {
  myUrl2.openStream();
} catch (IOException ex) {
  Assert.fail("Unable to open stream to file: " + ex.getMessage());
}{code}
IMO this is unnecessary and may make test failures harder to debug. :)
* This wraps code that's supposed to pass. What's the boundary of wrapping
this line vs. wrapping some other lines?
* Wrapping here in the test doesn't add information. My assumption is that a
developer seeing the IOE from {{myUrl2.openStream()}} would know that the
openStream failed. :)
* By calling {{Assert.fail}} with {{ex.getMessage()}}, we actually lose
information: the stacktrace of the IOE. Without the assert, the IOE will fail
the test case, presenting us with both its message and its stacktrace (see
the sketch after this list).
* Let's assume this throws an IOE at some point in the future. It feels to me
it won't be too hard to figure out what's being tested here (there are code
comments, and one can also git blame).
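To make the passing case concrete, a minimal sketch (assuming JUnit 4 and the
{{myUrl2}} fixture from the patch; the method name is made up):
{code:java}
// Let the IOE propagate: if openStream() throws, JUnit fails the test and
// prints both the exception's message and its full stacktrace.
@Test
public void testOpenStreamSucceeds() throws IOException {
  try (InputStream is = myUrl2.openStream()) {
    // ... assertions on the stream contents ...
  }
}{code}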
If this were exercising a failure case, it would be completely reasonable to
do something like:
{code:java}
try {
  myUrl2.openStream();
  Assert.fail("Open stream should fail because xxx");
} catch (IOException expected) {
  // Check that 'expected' is exactly what the test wanted, maybe by
  // GenericTestUtils.assertExceptionContains or equivalent.
}{code}
> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> ----------------------------------------------------------------------
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.0.0
> Reporter: Joseph Fourny
> Assignee: Zsolt Venczel
> Priority: Major
> Attachments: HADOOP-15217.01.patch, HADOOP-15217.02.patch,
> HADOOP-15217.03.patch, TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when
> Spark is initialized), it breaks URLs with spaces (even though they are
> properly URI-encoded). I traced the problem down to the
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL,
> which contains encoded spaces, and passes it to the
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because
> the docs clearly say that the string must NOT be encoded. Doing so causes
> double encoding within the Path class (i.e. %20 becomes %2520).
> See attached JUnit test.
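> A minimal sketch of the double encoding (illustrative path, not the attached
> test case):
> {code:java}
> // Passing an already percent-encoded string violates Path's contract:
> // Path treats '%' as a literal and re-encodes it, so %20 becomes %2520.
> Path p = new Path("/tmp/some%20file.txt");
> System.out.println(p.toUri()); // prints /tmp/some%2520file.txt
> {code}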
> This test case mimics an issue I ran into when trying to use Commons
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the
> URL class to load configuration files, but Spark installs
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an
> AspectJ aspect to "patch" the bytecode at load time to work around the issue.
> The real fix is quite simple. All you need to do is replace this line in
> _org.apache.hadoop.fs.FsUrlConnection.connect()_:
> is = fs.open(new Path(url.getPath()));
> with this line:
> is = fs.open(new Path(url.*toURI()*.getPath()));
> URI.getPath() will correctly decode the path, which is what the
> _org.apache.hadoop.fs.Path(String)_ constructor expects.
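> A quick standalone sketch of the difference (hypothetical file name, not the
> actual Hadoop code):
> {code:java}
> // Note that toURI() declares URISyntaxException, hence the throws clause.
> public static void main(String[] args) throws Exception {
>   URL url = new URL("file:///tmp/some%20file.txt");
>   System.out.println(url.getPath());         // /tmp/some%20file.txt (encoded)
>   System.out.println(url.toURI().getPath()); // /tmp/some file.txt (decoded)
> }{code}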
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]