[ https://issues.apache.org/jira/browse/CALCITE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741341#comment-17741341 ]
Runkang He commented on CALCITE-2495:
-------------------------------------
Sorry for commenting on this closed issue. I found a problem parsing
*HTTP URLs* while adding support for Hive's {{PARSE_URL}} function.
I want to port Hive's implementation of this
function ([link|https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java#L81C8-L81C8])
to keep the results consistent, but Calcite forbids {{URL#getPath}} in
signatures.txt, and I cannot find an equivalent method that reproduces
Hive's behaviour. I have tried the suggested
{{Sources.of(URL).path()}}, but its result differs from Hive's, as
shown below:
The HTTP URL to process: http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1
The result of {{Sources.of(URL).path()}}:
//calcite.apache.org/path1/p.php?k1=v1&k2=v2
The result of {{URL#getPath}}: /path1/p.php
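A minimal JDK-only sketch of the comparison above follows. The {{Sources.of(URL).path()}} output matches what {{java.net.URI#getSchemeSpecificPart}} returns for this URL (scheme and fragment stripped, query kept), which suggests, as an unverified assumption about Calcite's internals, that {{Sources}} falls back to that method for non-file URLs. {{URI#getRawPath}} is shown as one hypothetical substitute for the forbidden {{URL#getPath}}: like {{URL#getPath}}, it returns the path without percent-decoding it. The class and method names are made up for illustration, and whether {{URI#getRawPath}} is allowed by Calcite's signatures.txt has not been checked.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical demo: compares path accessors on the URL from the comment above. */
public class ParseUrlPathDemo {
  public static final String SAMPLE =
      "http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1";

  /** The accessor Hive relies on for PATH: java.net.URL#getPath (raw, not decoded). */
  public static String urlGetPath(String s) {
    try {
      return new URL(s).getPath();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** Decoded scheme-specific part: everything after "http:" up to, but excluding, "#". */
  public static String schemeSpecificPart(String s) {
    return URI.create(s).getSchemeSpecificPart();
  }

  /** Hypothetical substitute for URL#getPath: the raw (still encoded) path of a URI. */
  public static String uriRawPath(String s) {
    return URI.create(s).getRawPath();
  }

  public static void main(String[] args) {
    System.out.println(urlGetPath(SAMPLE));         // /path1/p.php
    System.out.println(schemeSpecificPart(SAMPLE)); // //calcite.apache.org/path1/p.php?k1=v1&k2=v2
    System.out.println(uriRawPath(SAMPLE));         // /path1/p.php

    // With an encoded space, URI#getPath decodes it, while getRawPath matches URL#getPath.
    String encoded = "http://calcite.apache.org/a%20b/p.php";
    System.out.println(urlGetPath(encoded));           // /a%20b/p.php
    System.out.println(uriRawPath(encoded));           // /a%20b/p.php
    System.out.println(URI.create(encoded).getPath()); // /a b/p.php
  }
}
```

One caveat with this substitute: {{URI.create}} rejects some strings that {{new URL(...)}} accepts (for example, a path containing an unencoded space), so it is not a drop-in replacement for every input Hive handles.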
> Support encoded URLs in calcite.util.Source, and use it for URL->File
> conversion in tests
> -----------------------------------------------------------------------------------------
>
> Key: CALCITE-2495
> URL: https://issues.apache.org/jira/browse/CALCITE-2495
> Project: Calcite
> Issue Type: Bug
> Reporter: Vladimir Sitnikov
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.18.0
>
>
> {{URL.getPath()}} produces %20 when the path contains spaces.
> I suggest reworking all uses of {{getResource()...}} to use
> {{Sources.of(URL)}} so there's a single -point of failure- way to convert a URL
> to a File.
> This fixes the Apache CI build, which happens to have a space in its folder name.
> For the record:
> 1) {{URL.getPath()}} produces %20, so it is added to the forbidden signatures
> 2) {{Paths.get(url.toURI()).toFile()}} almost works; however, it fails with
> "URI is not hierarchical" for {{new URL("file:test.java")}}
> 3) {{new File(URL.toURI())}} is worse than #2
> 4) {{URLDecoder}} must not be used to decode %20, since it also converts
> {{\+}} to spaces, which would corrupt {{test.c++}}
> 5) It looks like {{url.toURI().getSchemeSpecificPart()}} properly handles
> "opaque" URIs (relative URLs such as {{file:test.java}})
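Points 1, 2, 4 and 5 of the quoted description can be reproduced with plain JDK calls. The sketch below is illustrative only: the class name, the method names and the {{file:/tmp/apache%20ci/data.csv}} path are hypothetical, it does not use Calcite's {{Sources}} class, and the expected outputs in the comments assume a Unix-like file system.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.file.Paths;

/** Hypothetical demo of the URL-to-File pitfalls listed in CALCITE-2495 (JDK only). */
public class UrlToFilePitfalls {
  /** Point 1: URL#getPath keeps percent-encoding, so a space stays "%20". */
  public static String viaGetPath(String url) {
    try {
      return new URL(url).getPath();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** Point 2: Paths.get(URI) decodes "%20" for hierarchical file: URLs... */
  public static String viaPathsGet(String url) {
    try {
      return Paths.get(new URL(url).toURI()).toFile().getPath();
    } catch (MalformedURLException | URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** ...but rejects opaque ones such as file:test.java; returns the message, or null. */
  public static String pathsGetFailure(String url) {
    try {
      viaPathsGet(url);
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  /** Point 4: URLDecoder does form decoding, so '+' also turns into a space. */
  public static String viaUrlDecoder(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e); // UTF-8 is always supported by the JDK
    }
  }

  /** Point 5: getSchemeSpecificPart decodes "%20", keeps '+', and accepts opaque URIs. */
  public static String viaSchemeSpecificPart(String url) {
    try {
      return new File(new URL(url).toURI().getSchemeSpecificPart()).getPath();
    } catch (MalformedURLException | URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String spaced = "file:/tmp/apache%20ci/data.csv"; // hypothetical path with a space
    System.out.println(viaGetPath(spaced));  // /tmp/apache%20ci/data.csv
    System.out.println(viaPathsGet(spaced)); // /tmp/apache ci/data.csv
    System.out.println(pathsGetFailure("file:test.java")); // URI is not hierarchical
    System.out.println("[" + viaUrlDecoder("test.c++") + "]"); // [test.c  ]
    System.out.println(viaSchemeSpecificPart("file:test%20file.c++")); // test file.c++
  }
}
```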
--
This message was sent by Atlassian Jira
(v8.20.10#820010)