[ 
https://issues.apache.org/jira/browse/CALCITE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741341#comment-17741341
 ] 

Runkang He commented on CALCITE-2495:
-------------------------------------

Sorry to reopen the discussion on this closed issue. I ran into a problem 
parsing *http URLs* while adding support for Hive's {{PARSE_URL}} function.

I want to port Hive's implementation of this 
function ([link|https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java#L81C8-L81C8]) 
so that the results stay consistent, but Calcite forbids {{URL#getPath}} in 
signatures.txt, and I cannot find an equivalent method that behaves the 
same as Hive. I tried the suggested {{Sources.of(URL).path()}}, but its 
result differs from Hive's, as shown below:

The http url to process: http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1
The result of {{Sources.of(URL).path()}}: 
//calcite.apache.org/path1/p.php?k1=v1&k2=v2
The result of {{URL#getPath}}: /path1/p.php
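
For reference, the difference can be reproduced with the JDK alone. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that {{Sources.of(URL).path()}} ends up returning the URI's scheme-specific part for non-file URLs, which would match the output above. {{URI#getRawPath}} is shown as one possible candidate for a replacement; it has not been checked here against signatures.txt or against Hive's behaviour on malformed or unusual input, and {{URI}} parsing is stricter than {{URL}} parsing (for example, a URL containing an unescaped space is rejected).

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical demo class; JDK only, no Calcite or Hive dependency.
public class UrlPathDemo {
    /** What URL#getPath returns: the path component only. */
    static String urlGetPath(String spec) {
        try {
            return URI.create(spec).toURL().getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Everything between "scheme:" and "#fragment", percent-decoded. */
    static String schemeSpecificPart(String spec) {
        return URI.create(spec).getSchemeSpecificPart();
    }

    /** Candidate replacement (an assumption): the raw, undecoded path. */
    static String uriRawPath(String spec) {
        return URI.create(spec).getRawPath();
    }

    public static void main(String[] args) {
        String spec = "http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1";
        System.out.println(urlGetPath(spec));
        System.out.println(schemeSpecificPart(spec));
        System.out.println(uriRawPath(spec));
    }
}
```

For the URL above, the first and third calls both give /path1/p.php, while the second gives //calcite.apache.org/path1/p.php?k1=v1&k2=v2.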


> Support encoded URLs in calcite.util.Source, and use it for URL->File 
> conversion in tests
> -----------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2495
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2495
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Vladimir Sitnikov
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: 1.18.0
>
>
> {{URL.getPath()}} produces %20 when the path contains spaces.
> I suggest reworking all uses of {{getResource()...}} to use 
> {{Sources.of(URL)}} so there is a single -point of failure- way to convert a 
> URL to a File.
> This fixes the Apache CI build, which happens to have a space in a folder name.
> For the record:
> 1) {{URL.getPath()}} produces %20, so it is added to forbidden signatures
> 2) {{Paths.get(url.toURI()).toFile()}} almost works, however it fails with 
> "URI is not hierarchical" for {{new URL("file:test.java")}}
> 3) {{new File(url.toURI())}} is worse than #2
> 4) {{URLDecoder}} must not be used to decode %20, since it will convert 
> {{+}} to spaces as well, thus it will corrupt {{test.c++}}
> 5) It looks like {{url.toURI().getSchemeSpecificPart()}} properly handles 
> "opaque" URIs (which are relative {{file:test.java}} kind of URLs)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
