[ https://issues.apache.org/jira/browse/KYLIN-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928363#comment-17928363 ]
Guoliang Sun commented on KYLIN-6047:
-------------------------------------
h3. Dev Design
After clarifying the issues above, the only definitive fix is to **implement
the conversion of the `Row` operator into a logical plan that Spark can
recognize.**
To understand how Spark handles such SQL, we pushed the query down to Spark
directly and inspected the resulting logical plan. From the issues highlighted
above, we derive the following two key tasks:
1. Add Support for Matching the `ROW` Operator:
- Extend the existing column-value matching to handle the `ROW` operator.
- This ensures that a row expression such as `(a, b)` is correctly
interpreted and converted.
2. Enhance Handling of the `IN` Operator:
- Similarly, extend the `IN` conversion to support multiple columns matched
against multiple value tuples (i.e., `ROW` operands on both sides).
- This ensures compatibility with cases like `(a, b) IN ((1, 2), (3, 4))`.
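To make the target semantics concrete, here is a minimal, self-contained Java sketch (not Kylin's, Calcite's, or Spark's code) of what a multi-column `IN` has to evaluate: `(a, b) IN ((1, 2), (3, 4))` is true when the row `(a, b)` equals one of the value tuples position by position. The class and method names (`RowInSketch`, `rowIn`, `expandToOr`) are hypothetical, and the sketch deliberately ignores SQL three-valued `NULL` semantics. It also renders the equivalent `OR`-of-`AND` predicate, which grows linearly with the number of value tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class RowInSketch {

    // Hypothetical: evaluates (cols...) IN ((v...), ...) for one row given as column -> value.
    // Assumes every tuple has the same arity as cols; NULL handling is simplified
    // (Objects.equals treats null == null as true, unlike SQL).
    static boolean rowIn(Map<String, Integer> row, List<String> cols, List<List<Integer>> tuples) {
        for (List<Integer> tuple : tuples) {
            boolean allEqual = true;
            for (int i = 0; i < cols.size(); i++) {
                if (!Objects.equals(row.get(cols.get(i)), tuple.get(i))) {
                    allEqual = false;
                    break;
                }
            }
            if (allEqual) {
                return true; // the row matches this tuple in every position
            }
        }
        return false;
    }

    // Hypothetical: renders the equivalent OR-of-ANDs predicate as SQL text.
    static String expandToOr(List<String> cols, List<List<Integer>> tuples) {
        List<String> disjuncts = new ArrayList<>();
        for (List<Integer> tuple : tuples) {
            List<String> conjuncts = new ArrayList<>();
            for (int i = 0; i < cols.size(); i++) {
                conjuncts.add(cols.get(i) + " = " + tuple.get(i));
            }
            disjuncts.add("(" + String.join(" AND ", conjuncts) + ")");
        }
        return String.join(" OR ", disjuncts);
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a", "b");
        List<List<Integer>> tuples = List.of(List.of(1, 2), List.of(3, 4));
        System.out.println(rowIn(Map.of("a", 3, "b", 4), cols, tuples)); // true
        System.out.println(rowIn(Map.of("a", 1, "b", 4), cols, tuples)); // false
        // prints: (a = 1 AND b = 2) OR (a = 3 AND b = 4)
        System.out.println(expandToOr(cols, tuples));
    }
}
```

Keeping the predicate as a single `ROW`-based `IN` avoids building one `AND` group per tuple, which is why the design converts the operator itself rather than relying on expansion.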
By implementing these two enhancements, we ensure that Kylin can properly
convert Calcite's logical plan into Spark's logical plan, avoiding the errors
and performance issues caused by unsupported operators or an excessive number
of condition values.
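The sketch below illustrates, under stated assumptions, why raising the threshold only moves the problem. It assumes, as the name `kylin.query.convert-in-to-or-threshold` suggests, that an `IN` list shorter than the threshold is expanded into `OR` conditions, while a list at or above it is kept as one `IN` predicate that must then be converted as a `ROW`-based `IN`. The threshold of 20 used in the usage lines is only inferred from the issue title; the class, enum, and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class InConversionSketch {

    // Hypothetical target forms for a (possibly multi-column) IN predicate.
    enum Form { OR_OF_EQUALITIES, ROW_IN_LIST }

    // Assumption: below the threshold the list is expanded into ORs (already
    // supported); at or above it, the predicate stays a single IN, which is the
    // path the ROW/IN conversion described above has to handle.
    static Form choose(int valueCount, int threshold) {
        return valueCount < threshold ? Form.OR_OF_EQUALITIES : Form.ROW_IN_LIST;
    }

    // Renders the ROW-based IN form as SQL text, e.g. (a, b) IN ((1, 2), (3, 4)).
    static String renderRowIn(List<String> cols, List<List<Integer>> tuples) {
        String values = tuples.stream()
                .map(t -> t.stream().map(String::valueOf)
                        .collect(Collectors.joining(", ", "(", ")")))
                .collect(Collectors.joining(", "));
        return "(" + String.join(", ", cols) + ") IN (" + values + ")";
    }

    public static void main(String[] args) {
        int threshold = 20; // assumption taken from the issue title
        System.out.println(choose(19, threshold)); // OR_OF_EQUALITIES
        System.out.println(choose(20, threshold)); // ROW_IN_LIST
        System.out.println(renderRowIn(List.of("a", "b"),
                List.of(List.of(1, 2), List.of(3, 4))));
        // prints: (a, b) IN ((1, 2), (3, 4))
    }
}
```

Whatever the threshold is set to, some queries will cross it, so the `ROW_IN_LIST` path has to be supported rather than avoided.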
> Error Occurs When the Number of Values in an IN Clause Reaches 20
> -----------------------------------------------------------------
>
> Key: KYLIN-6047
> URL: https://issues.apache.org/jira/browse/KYLIN-6047
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Priority: Major
> Attachments: image-2025-02-19-17-21-17-187.png
>
>
> h3. Temporary Solution
> Increase the value of `kylin.query.convert-in-to-or-threshold`. However, this
> is not a complete fix: the number of values in an `IN` clause can exceed 100,
> and setting the parameter high enough to cover such cases may lead to
> performance issues. A proper fix is required.