[ 
https://issues.apache.org/jira/browse/CALCITE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595491#comment-16595491
 ] 

Vladimir Sitnikov commented on CALCITE-2495:
--------------------------------------------

I made a first pass of removal {{URL.getPath()}} from 
{{Source#append(Source)}}, and it required to rework 
{{org.apache.calcite.util.Source#path}}.

The method is strange. There's no javadoc, and it is not clear if the returned 
path should be encoded or decoded.

Old behavior was:
 1) return decoded filename if File part exists (that is file path without 
{{file:}} prefix)
 2) Otherwise return textual representation of URL (that is an URL with 
{{http:}} prefix)

How is that useful? I'm puzzled.

I think it makes sense to return the same kind of path, so I made {{#path()}} 
to return DECODED path component of the URL.
 For instance, for 
{{[http://en.wikipedia.org/wiki/List_of_United_States_cities_by_population]}} 
it would be 
{{//en.wikipedia.org/wiki/List_of_United_States_cities_by_population}} (URL 
without {{http:}} prefix)
 That is symmetric to the file case which returns file path without {{file:}} 
prefix.

It caused the following test to fail:
{code:java}
  @Test public void testFileReaderMalUrl() throws FileReaderException {
  @Test public void testFileReaderMalUrl() throws FileReaderException {
    try {
      final Source badSource = Sources.url("bad" + CITIES_SOURCE.path()); // 
<-- !!! here
      fail("expected exception, got " + badSource);
    } catch (RuntimeException e) {
      assertThat(e.getCause(), instanceOf(MalformedURLException.class));
      assertThat(e.getCause().getMessage(), is("unknown protocol: badhttp"));
    }
  }
{code}
I'm inclined to believe that the test is a misuse of the API, so I could just 
alter the test.
 Note: I do not see a reason why the test creates invalid URL via string 
concatenation. Why doesn't it use pre-defined string?
 The tests seems to exercise multiple actions at once, and best practices 
describe that the test should verify single behaviour only.

> Rework URL->File conversion in tests
> ------------------------------------
>
>                 Key: CALCITE-2495
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2495
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Vladimir Sitnikov
>            Assignee: Julian Hyde
>            Priority: Major
>
> {{URL.getPath()}} produces %20 when path contains spaces.
> I suggest to rework all the uses of {{getResource()...}} to use 
> {{Sources.of(URL)}} so there's single -point of failure- way to convert URL 
> to File.
> This resolves Apache CI which happens to have a space in folder name.
> For the record:
> 1) {{URL.getPath()}} produces %20, so it is added to forbidden signatures
> 2) {{Paths.get(url.toURI()).toFile()}} almost works, however it fails with 
> URL is not hierarchical for {{new URL("file:test.java")}}
> 3) {{new File(URL.toURI()}} is worse than #2
> 4) {{URLDecoder}} must not be used to decode %20, since it will convert {{+}} 
> to spaces as well, thus it will corrupt {{test.c++}}
> 5) It looks like {{url.toURI().getSchemeSpecificPart())}} properly handles 
> "opaque" URIs (which are relative {{file:test.java}} kind of URLs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to