[ https://issues.apache.org/jira/browse/SPARK-44500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746705#comment-17746705 ]

Pablo Langa Blanco commented on SPARK-44500:
--------------------------------------------

[[email protected]] What do you think?

> parse_url treats key as regular expression
> ------------------------------------------
>
>                 Key: SPARK-44500
>                 URL: https://issues.apache.org/jira/browse/SPARK-44500
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.4.1
>            Reporter: Robert Joseph Evans
>            Priority: Major
>
> To be clear, I am not 100% sure that this is a bug. It might be a feature, but
> I don't see it used as a feature anywhere. If it is a feature, it
> really should be documented, because there are pitfalls. If it is a bug, it
> should be fixed, because it is really confusing and it makes it easy to shoot
> yourself in the foot.
> ```scala
> > val urls = Seq("http://foo/bar?abc=BAD&a.c=GOOD",
> > "http://foo/bar?a.c=GOOD&abc=BAD").toDF
> > urls.selectExpr("parse_url(value, 'QUERY', 'a.c')").show(false)
> +----------------------------+
> |parse_url(value, QUERY, a.c)|
> +----------------------------+
> |BAD                         |
> |GOOD                        |
> +----------------------------+
> > urls.selectExpr("parse_url(value, 'QUERY', 'a[c')").show(false)
> java.util.regex.PatternSyntaxException: Unclosed character class near index 15
> (&|^)a[c=([^&]*)
>                ^
>   at java.util.regex.Pattern.error(Pattern.java:1969)
>   at java.util.regex.Pattern.clazz(Pattern.java:2562)
>   at java.util.regex.Pattern.sequence(Pattern.java:2077)
>   at java.util.regex.Pattern.expr(Pattern.java:2010)
>   at java.util.regex.Pattern.compile(Pattern.java:1702)
>   at java.util.regex.Pattern.<init>(Pattern.java:1352)
>   at java.util.regex.Pattern.compile(Pattern.java:1028)
> ```
> The simple fix is to quote the key when building the pattern.
> ```scala
>   // Pattern.quote makes regex metacharacters in the key ('.', '[', ...) match literally
>   private def getPattern(key: UTF8String): Pattern = {
>     Pattern.compile(REGEXPREFIX + Pattern.quote(key.toString) + REGEXSUBFIX)
>   }
> ```
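
A minimal standalone sketch of the behavior described above, outside Spark. It assumes the pattern is built as `(&|^)` + key + `=([^&]*)`, which is inferred from the regex shown in the PatternSyntaxException message; the object `ParseUrlKeyDemo` and its methods are hypothetical names for illustration, not Spark's actual implementation. It contrasts splicing the key in verbatim with wrapping it in `Pattern.quote`.

```scala
import java.util.regex.{Matcher, Pattern}

// Hypothetical demo object; prefix/suffix inferred from the exception message above.
object ParseUrlKeyDemo {
  private val RegexPrefix = "(&|^)"
  private val RegexSuffix = "=([^&]*)"

  // Current behavior: the key is spliced into the regex verbatim, so
  // metacharacters such as '.' or '[' are interpreted by the regex engine.
  def unquotedPattern(key: String): Pattern =
    Pattern.compile(RegexPrefix + key + RegexSuffix)

  // Proposed fix: Pattern.quote wraps the key in \Q...\E so it matches literally.
  def quotedPattern(key: String): Pattern =
    Pattern.compile(RegexPrefix + Pattern.quote(key) + RegexSuffix)

  // Returns the value (capture group 2) of the first key=value pair found, if any.
  def extract(query: String, pattern: Pattern): Option[String] = {
    val m: Matcher = pattern.matcher(query)
    if (m.find()) Some(m.group(2)) else None
  }

  def main(args: Array[String]): Unit = {
    val query = "abc=BAD&a.c=GOOD"
    println(extract(query, unquotedPattern("a.c"))) // Some(BAD): '.' matched 'b'
    println(extract(query, quotedPattern("a.c")))   // Some(GOOD)
    println(extract(query, quotedPattern("a[c")))   // None, and no exception
  }
}
```

Because the unquoted regex matches the first pair that fits, the result depends on parameter order, which is why the two URLs in the report return different values; `ParseUrlKeyDemo.unquotedPattern("a[c")` throws the same PatternSyntaxException, while the quoted version compiles.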



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
