[ https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499923#comment-16499923 ]

Zsolt Venczel commented on HADOOP-15217:
----------------------------------------

Thanks [~xiaochen] for sharing your thoughts in this matter.

From a code readability and issue-debugging perspective I completely agree 
with you, and I think the problem is twofold:
1) How readable the code is, and how much help we provide for the developer 
while debugging it.
2) How easy it is to triage a unit test failure, which is quite important for 
code maintainability in the long run: if I see a unit test producing an error 
I can think of many things, but if it explicitly fails I can more safely 
assume that a recent change introduced a regression.

From the second perspective, my responses to your points:
??This is wrapping a code that's supposed to pass. What's the boundary of 
wrapping this line v.s. wrapping some other lines???
The scope should be as narrow around the tested code as possible; wrapping 
more lines would not be meaningful from the actual test's perspective.
??Wrapping here in the test doesn't give more information. My assumption is a 
developer seeing the IOE from myUrl2.openStream(); would know this means that 
the openStream failed.??
I was trying to define the test case so that it fails on openStream only if 
the fix is missing, and if I managed to set it up correctly it fails only in 
the case of a true regression. Nothing is perfect, though, I agree.
??By Assert.fail with ex.getMessage(), we actually lose some information, which 
is the stacktrace of the IOE. Without the assert, the IOE will fail the test 
case, presenting us both its message and its stacktrace??
This is a really good point. I updated the patch to include the cause of the 
exception.
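As a minimal sketch of the idea (illustrative names only, not the actual patch): failing with just ex.getMessage() drops the stack trace, while attaching the IOException as the cause of the failure preserves both. JUnit's Assert.fail is stood in for here by a plain AssertionError:

```java
import java.io.IOException;

public class FailWithCauseSketch {
    // Hypothetical stand-in for the call under test.
    static void openStream() throws IOException {
        throw new IOException("open failed on double-encoded path");
    }

    public static void main(String[] args) {
        try {
            openStream();
        } catch (IOException e) {
            // Assert.fail(e.getMessage()) would discard e's stack trace.
            // Attaching e as the cause keeps the message and the trace.
            AssertionError failure =
                new AssertionError("openStream failed: " + e.getMessage(), e);
            System.out.println(failure.getCause().getMessage());
        }
    }
}
```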
??Let's assume there is an IOE from this in the future. It feels to me it won't 
be too hard to figure out what's been tested here (since there are code 
comments, and one can also git blame).??
Currently quite a lot of unit tests in the Hadoop code base fail on each test 
suite run. If a test produces an error, one might think it is flaky and treat 
it as such; if it produces a failure, that feels more intentional, and one 
might be more careful and look for a regression.

Please let me know whether this makes sense; I can happily go with your 
suggestions as well.
Cheers,
Zsolt

> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-15217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0
>            Reporter: Joseph Fourny
>            Assignee: Zsolt Venczel
>            Priority: Major
>         Attachments: HADOOP-15217.01.patch, HADOOP-15217.02.patch, 
> HADOOP-15217.03.patch, HADOOP-15217.04.patch, TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to the 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to the 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (i.e. %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the 
> URL class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "patch" the bytecode at load time to work around the issue. 
> The real fix is quite simple. All you need to do is replace this line in 
> _org.apache.hadoop.fs.FsUrlConnection.connect()_:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>         is = fs.open(new Path(url.*toUri()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by 
> _org.apache.hadoop.fs.Path(String)_ constructor.
>  
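The double-encoding pitfall described in the quoted report can be sketched with plain java.net types (hypothetical file URL; Hadoop's Path class is not needed to see the difference): URL.getPath() keeps the percent-encoding, while URL.toURI().getPath() decodes it, which is what the Path(String) constructor expects.

```java
import java.net.URL;

public class SpacePathDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("file:///tmp/with%20space/part-0000");
        // Raw, still percent-encoded; feeding this to Path(String)
        // would re-encode the '%' and yield %2520.
        System.out.println(url.getPath());
        // Decoded path, safe to pass to Path(String).
        System.out.println(url.toURI().getPath());
    }
}
```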



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
