[
https://issues.apache.org/jira/browse/CALCITE-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741602#comment-17741602
]
Runkang He edited comment on CALCITE-5820 at 7/10/23 12:57 PM:
---------------------------------------------------------------
When implementing this function, I ran into a problem parsing the *HTTP URL*.
I wanted to port Hive's implementation of this
function ([link|https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java#L81C8-L81C8])
so that our results stay consistent with Hive's. However, Calcite forbids
{{URL#getPath}} in signatures.txt (see CALCITE-2495), and I cannot find
another method that behaves the same way as Hive's. I tried the suggested
{{Sources.of(URL).path()}}, but its result differs from Hive's, as shown
below:
The HTTP URL to process:
[http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1]
The result of {{Sources.of(URL).path()}}:
//calcite.apache.org/path1/p.php?k1=v1&k2=v2
The result of {{URL#getPath}}: /path1/p.php
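To make the discrepancy reproducible without Calcite on the classpath, here is a minimal sketch that uses only {{java.net}}. It prints Hive's PATH answer ({{URL#getPath}}) next to two {{java.net.URI}} accessors. The {{Sources.of(URL).path()}} output above happens to equal the URI's scheme-specific part for this input; that is an observation, not a claim about how {{Sources}} is implemented. Whether {{URI#getRawPath}} is permitted by Calcite's signatures.txt is an assumption that has not been checked here, and the class and method names are illustrative only.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class PathComparison {
  static final String EXAMPLE =
      "http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1";

  // Hive's PATH semantics: java.net.URL#getPath (forbidden in Calcite's signatures.txt).
  static String urlPath(String s) {
    try {
      return new URL(s).getPath();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  // Candidate alternative (assumption: not checked against signatures.txt).
  // getRawPath leaves percent-escapes undecoded, as URL#getPath does.
  static String uriRawPath(String s) {
    return URI.create(s).getRawPath();
  }

  // Everything between "scheme:" and "#fragment"; for this input it equals
  // the Sources.of(URL).path() output reported in the comment.
  static String uriSchemeSpecificPart(String s) {
    return URI.create(s).getSchemeSpecificPart();
  }

  public static void main(String[] args) {
    System.out.println(urlPath(EXAMPLE));               // /path1/p.php
    System.out.println(uriRawPath(EXAMPLE));            // /path1/p.php
    System.out.println(uriSchemeSpecificPart(EXAMPLE)); // //calcite.apache.org/path1/p.php?k1=v1&k2=v2
  }
}
```

Note that {{URI}} is stricter than {{URL}}: {{URI.create}} rejects an unescaped space in the path, whereas {{new URL(...)}} typically accepts it, so a URI-based port could still diverge from Hive on malformed input.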
was (Author: JIRAUSER280488):
When implementing this function, I ran into a problem parsing the *HTTP URL*
in the current Calcite version.
I wanted to port Hive's implementation of this
function ([link|https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java#L81C8-L81C8])
so that our results stay consistent with Hive's. However, Calcite forbids
{{URL#getPath}} in signatures.txt (see CALCITE-2495), and I cannot find
another method that behaves the same way as Hive's. I tried the suggested
{{Sources.of(URL).path()}}, but its result differs from Hive's, as shown
below:
The HTTP URL to process: http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1
The result of {{Sources.of(URL).path()}}:
//calcite.apache.org/path1/p.php?k1=v1&k2=v2
The result of {{URL#getPath}}: /path1/p.php
> Add PARSE_URL function (enabled in Hive and Spark library)
> ----------------------------------------------------------
>
> Key: CALCITE-5820
> URL: https://issues.apache.org/jira/browse/CALCITE-5820
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.34.0
> Reporter: Runkang He
> Assignee: Runkang He
> Priority: Major
> Labels: pull-request-available
>
> Add PARSE_URL (enabled in the Hive and Spark libraries):
> PARSE_URL: Returns the specified part of the URL. Valid values for
> partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and
> USERINFO.
> For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',
> 'HOST') returns 'facebook.com'.
> See the
> [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions]
> and
> [Spark|https://spark.apache.org/docs/latest/api/sql/index.html#parse_url]
> documentation for more details.
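As a reference for expected results, the part names listed in the description map onto accessors of {{java.net.URL}}, the class whose {{getPath}} the comment above compares against Hive. The sketch below is a hypothetical helper ({{ParseUrlSketch}} and {{parseUrl}} are illustrative names, not Calcite or Hive APIs). It is not a proposed Calcite implementation, because it relies on {{URL#getPath}}, which Calcite's signatures.txt forbids. Returning null for a null argument, a malformed URL, or an unrecognised part is an assumption of this sketch.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ParseUrlSketch {
  /** Hypothetical helper: returns the requested part of urlStr, or null. */
  static String parseUrl(String urlStr, String partToExtract) {
    if (urlStr == null || partToExtract == null) {
      return null; // assumption: NULL in, NULL out
    }
    final URL url;
    try {
      url = new URL(urlStr);
    } catch (MalformedURLException e) {
      return null; // assumption: a malformed URL yields NULL
    }
    // Each supported part delegates to the matching java.net.URL accessor.
    switch (partToExtract) {
      case "HOST": return url.getHost();
      case "PATH": return url.getPath();
      case "QUERY": return url.getQuery();
      case "REF": return url.getRef();
      case "PROTOCOL": return url.getProtocol();
      case "FILE": return url.getFile();
      case "AUTHORITY": return url.getAuthority();
      case "USERINFO": return url.getUserInfo();
      default: return null; // unrecognised part
    }
  }

  public static void main(String[] args) {
    String u = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1";
    String[] parts = {
        "HOST", "PATH", "QUERY", "REF", "PROTOCOL", "FILE", "AUTHORITY", "USERINFO"};
    for (String part : parts) {
      System.out.println(part + " -> " + parseUrl(u, part));
    }
  }
}
```

For the description's example URL, HOST gives facebook.com (matching the expected result above), PATH gives /path1/p.php, FILE gives /path1/p.php?k1=v1&k2=v2, and USERINFO is null because the URL has no user-info component.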
--
This message was sent by Atlassian Jira
(v8.20.10#820010)