[ 
https://issues.apache.org/jira/browse/CALCITE-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741602#comment-17741602
 ] 

Runkang He edited comment on CALCITE-5820 at 7/10/23 12:57 PM:
---------------------------------------------------------------

While implementing this function, I ran into a problem parsing 
{*}HTTP URLs{*}.

I want to port Hive's implementation of this function 
([link|https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java#L81C8-L81C8])
 to keep the results consistent, but Calcite forbids {{URL#getPath}} in 
signatures.txt (the related issue is CALCITE-2495). Moreover, I cannot find an 
equivalent method that behaves the same as Hive's. I have tried the suggested 
{{Sources.of(URL).path()}}, but its result differs from Hive's, as shown 
below:

The HTTP URL to process: 
[http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1]
The result of {{Sources.of(URL).path()}}: 
//calcite.apache.org/path1/p.php?k1=v1&k2=v2
The result of {{URL#getPath}}: /path1/p.php
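
One possible workaround, sketched below, is to extract the parts through {{java.net.URI}} rather than {{java.net.URL}}, so that {{URL#getPath}} is never called. This is only a minimal sketch: the class name {{ParseUrlSketch}} and the method {{extract}} are hypothetical, returning null for malformed input is an assumption, and {{URI}} parses more strictly than {{URL}}, so results may not match Hive's for every input. The last line of {{main}} prints {{URI#getSchemeSpecificPart}}, which for this URL gives the same string as the {{Sources.of(URL).path()}} result reported above; whether {{Sources}} actually uses that call internally is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: extracts URL parts via java.net.URI only. */
final class ParseUrlSketch {
  private ParseUrlSketch() {}

  /** Returns the requested part, or null if it is absent or the URL is malformed. */
  static String extract(String url, String part) {
    final URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return null; // assumption: malformed input yields null
    }
    // The "raw" getters skip percent-decoding, like the java.net.URL getters.
    switch (part) {
    case "HOST":
      return uri.getHost();
    case "PATH":
      return uri.getRawPath();
    case "QUERY":
      return uri.getRawQuery();
    case "REF":
      return uri.getRawFragment();
    case "PROTOCOL":
      return uri.getScheme();
    case "AUTHORITY":
      return uri.getRawAuthority();
    case "USERINFO":
      return uri.getRawUserInfo();
    case "FILE":
      // URL#getFile is the path, plus "?query" when a query is present.
      String path = uri.getRawPath();
      String query = uri.getRawQuery();
      return query == null ? path : path + "?" + query;
    default:
      return null;
    }
  }

  public static void main(String[] args) {
    String url = "http://calcite.apache.org/path1/p.php?k1=v1&k2=v2#Ref1";
    System.out.println(extract(url, "PATH")); // /path1/p.php
    System.out.println(extract(url, "FILE")); // /path1/p.php?k1=v1&k2=v2
    // Prints //calcite.apache.org/path1/p.php?k1=v1&k2=v2
    System.out.println(URI.create(url).getSchemeSpecificPart());
  }
}
```

For the URL above, {{extract(url, "PATH")}} gives /path1/p.php, the same value as {{URL#getPath}}. Inputs that {{URL}} accepts but {{URI}} rejects (for example, an unescaped space) would return null here and would need separate handling.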



> Add PARSE_URL function (enabled in Hive and Spark library)
> ----------------------------------------------------------
>
>                 Key: CALCITE-5820
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5820
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Runkang He
>            Assignee: Runkang He
>            Priority: Major
>              Labels: pull-request-available
>
> Add PARSE_URL (enabled in Hive and Spark library):
> PARSE_URL: Returns the specified part from the URL. Valid values for 
> partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and 
> USERINFO.
> For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 
> 'HOST') returns 'facebook.com'.
> See more details in 
> [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions]
>  and 
> [Spark|https://spark.apache.org/docs/latest/api/sql/index.html#parse_url] doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
