[
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655352#comment-15655352
]
Fabian Hueske commented on FLINK-4565:
--------------------------------------
Hi [~nvasilishin] and [~jark], I had at look how Calcite handles SQL queries
with an {{x IN (<list-of-literal>)}} predicate and how these are executed by
the Table API.
This is what I found:
1. if the list has fewer entries than 20, the predicate is translated into
disjunctive equality predicates: {{(x = <lit-1> OR x = <lit-2> OR x = <lit3>
...)}}
2. if the list has 20 or more entries, the predicate is translated into a
values-table -> distinct -> equi-join (not sure why distinct is not done at
planning time...). If {{IN}} is the only predicate of the {{WHERE}} clause or
if it is in a conjunctive condition {{y = 10 AND x IN (...)}} the join is an
inner join. If {{IN}} is in an disjunctive condition {{y = 10 OR x IN (...)}},
the join is a left outer join and some additional predicates to check for
{{NULL}} are added.
Unfortunately, the translation (and case distinction) is not done by optimizer
rules but when the validator generates the RelNodes from the SqlNodes.
So for the Table API we would need to implement this logic ourselves, i.e, when
LogicalNodes are translated into RelNodes.
I would suggest to split this issue up into three subissues:
1) Support for IN with less than 20 literals. If we follow Calcite's approach
of many disjunctive equal predicates, we do not need to handle hashsets or any
other special runtime code. Although this would be faster than (up to) 19
checks, I have the feeling that the added code complexity does not pay of the
performance gain (esp. since I am not sure how common IN with 19 literals is).
2) Support for IN with 20 or more literals. This translation needs a bit of
thought to make sure that all cases are covered (no incorrect result, but
possibly rejected queries).
3) Support for IN with subqueries / tables. This should be executed similar to
the 20+ literal case, but instead of a values table the subquery should be used
as input.
What do you think?
> Support for SQL IN operator
> ---------------------------
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But
> it should also be available in the Table API and tested.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)