[
https://issues.apache.org/jira/browse/CALCITE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824900#comment-17824900
]
EveyWu edited comment on CALCITE-6278 at 3/9/24 3:54 AM:
----------------------------------------------------------
[~julianhyde] Thanks for the review.
1. "Since Spark 2.0, string literals (including regex patterns) are unescaped
in SQL parser": this description comes from the Spark [official
documentation|https://spark.apache.org/docs/latest/api/sql/index.html#regexp].
!image-2024-03-09-11-13-49-064.png|width=491,height=176!
2. In Spark, unescaping is indeed performed in the parser phase. See
`AstBuilder` for details:
[https://github.com/apache/spark/blob/76b1c122cb7d77e8f175b25b935b9296a669d5d8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L2876C1-L2882C4]
The default value of `spark.sql.parser.escapedStringLiterals` is false, so
string literals are unescaped by default.
!image-2024-03-09-11-38-08-797.png|width=455,height=85!
3. In Hive, unescaping is not done in the SQL AST parser phase but in the node
normalization phase (`Dispatcher#dispatch`). `StrExprProcessor` is the
processor that unescapes strings.
[https://github.com/apache/hive/blob/03a76ac70370fb94a78b00496ec511e671c652f2/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L403C1-L405C17]
!image-2024-03-09-11-37-27-816.png|width=520,height=132!
4. "If unescaping is happening in Spark’s parser, Calcite should also do it in
the parser": I think this is unnecessary.
First, as Spark and Hive show, different engines unescape in different phases,
so Calcite does not have to do it in the same phase as Spark. Second,
parser-level unescaping is global: it affects every string literal, not only
the `rlike` function. Finally, Calcite handles it inside the `rlike` function,
which is by far the simplest change with the smallest impact.
If Calcite also needs global string unescaping, that can be discussed
separately in a follow-up Jira.
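To make the two placements compared in points 2 to 4 concrete, here is a minimal Java sketch. All names in it (`UnescapePlacementSketch`, `unescape`, `literalValue`, `rlikeUnescaping`) are hypothetical, and the unescape rules are deliberately simplified assumptions. It is neither Spark's `AstBuilder` logic nor Calcite's actual `rlike` implementation; it only illustrates the difference between unescaping every literal in the parser and unescaping only the pattern argument inside the function.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from Spark, Hive, or Calcite.
class UnescapePlacementSketch {

  // Simplified unescape (an assumption, not any engine's real rules):
  // "\n" and "\t" become newline and tab; any other backslash-escaped
  // character becomes the character itself, so "\\" collapses to "\".
  static String unescape(String raw) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < raw.length(); i++) {
      char c = raw.charAt(i);
      if (c == '\\' && i + 1 < raw.length()) {
        char next = raw.charAt(++i);
        switch (next) {
          case 'n': sb.append('\n'); break;
          case 't': sb.append('\t'); break;
          default: sb.append(next); break;
        }
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  // Parser-phase placement: every literal is unescaped unless the flag
  // (modeled on spark.sql.parser.escapedStringLiterals) is true.
  static String literalValue(String raw, boolean escapedStringLiterals) {
    return escapedStringLiterals ? raw : unescape(raw);
  }

  // Regex match with "find" semantics; the pattern is used as given.
  static boolean rlike(String s, String pattern) {
    return Pattern.compile(pattern).matcher(s).find();
  }

  // Function-level placement: only this function unescapes its pattern.
  static boolean rlikeUnescaping(String s, String rawPattern) {
    return rlike(s, unescape(rawPattern));
  }

  public static void main(String[] args) {
    String raw = "\\\\d+"; // the four characters \\d+ as written in the SQL text
    System.out.println(rlike("abc123", literalValue(raw, false)));
    System.out.println(rlike("abc123", literalValue(raw, true)));
    System.out.println(rlikeUnescaping("abc123", raw));
  }
}
```

In this sketch the first and third calls report a match, because the pattern reaches the regex engine as `\d+`. The second does not, because with the flag set the pattern stays `\\d+`, which looks for a literal backslash followed by one or more `d` characters.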
> Add REGEXP, REGEXP_LIKE function (enabled in Spark library)
> ------------------------------------------------------------
>
> Key: CALCITE-6278
> URL: https://issues.apache.org/jira/browse/CALCITE-6278
> Project: Calcite
> Issue Type: Improvement
> Reporter: EveyWu
> Priority: Minor
> Labels: pull-request-available
> Attachments: image-2024-03-07-09-32-27-002.png,
> image-2024-03-09-11-13-49-064.png, image-2024-03-09-11-37-27-816.png,
> image-2024-03-09-11-38-08-797.png
>
>
> Add Spark functions that have been implemented but have different
> OperandTypes/Returns.
> Add Function
> [REGEXP|https://spark.apache.org/docs/latest/api/sql/index.html#regexp],
> [REGEXP_LIKE|https://spark.apache.org/docs/latest/api/sql/index.html#regexp_like]
> # Since these functions have the same implementation as the Spark
> [RLIKE|https://spark.apache.org/docs/latest/api/sql/index.html#rlike]
> function, the implementation can be directly reused.
> # Since Spark 2.0, string literals (including regex patterns) are unescaped
> in the SQL parser; this issue also fixes the corresponding bug in Calcite.
>
>