[
https://issues.apache.org/jira/browse/CALCITE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruben Q L resolved CALCITE-5003.
--------------------------------
Resolution: Fixed
Fixed via
https://github.com/apache/calcite/commit/2789f5e4c361b052967f42b87447f04cc1ce7896
> MergeUnion on types with different collators produces wrong result
> ------------------------------------------------------------------
>
> Key: CALCITE-5003
> URL: https://issues.apache.org/jira/browse/CALCITE-5003
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.27.0
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.31.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> MergeUnion on types with different collators produces a wrong result.
> The problem can be reproduced with the following test (in
> {{EnumerableStringComparisonTest}}):
> {code}
> @Test void testMergeUnionOnStringDifferentCollation() {
>   tester()
>       .query("?")
>       .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner ->
>           planner.removeRule(EnumerableRules.ENUMERABLE_UNION_RULE))
>       .withRel(b -> {
>         final RelBuilder builder = b.transform(c ->
>             c.withSimplifyValues(false));
>         return builder
>             .values(builder.getTypeFactory().builder()
>                     .add("name",
>                         builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR)).build(),
>                 "facilities", "HR", "administration", "Marketing")
>             .values(createRecordVarcharSpecialCollation(builder),
>                 "Marketing", "administration", "presales", "HR")
>             .union(false)
>             .sort(0)
>             .build();
>       })
>       .explainHookMatches("" // It is important that we have MergeUnion in the plan
>           + "EnumerableMergeUnion(all=[false])\n"
>           + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>           + "    EnumerableValues(tuples=[[{ 'facilities' }, { 'HR' }, { 'administration' }, { 'Marketing' }]])\n"
>           + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>           + "    EnumerableValues(tuples=[[{ 'Marketing' }, { 'administration' }, { 'presales' }, { 'HR' }]])\n")
>       .returnsOrdered("name=administration\n"
>           + "name=facilities\n"
>           + "name=HR\n"
>           + "name=Marketing\n"
>           + "name=presales");
> }
> {code}
> which fails with:
> {noformat}
> java.lang.AssertionError:
> Expected:
> "name=administration\nname=facilities\nname=HR\nname=Marketing\nname=presales"
> but: was
> "name=administration\nname=HR\nname=Marketing\nname=administration\nname=facilities\nname=Marketing\nname=presales"
> {noformat}
> The problem is that, when the inputs use different collators, the prerequisite
> of MergeUnion (sorted inputs) is not fulfilled: each input is technically
> sorted, but not with the same collator, so the MergeUnion algorithm cannot
> compare them consistently.
> A possible solution would be not to apply EnumerableMergeUnionRule in this
> case.
> A smarter solution would be for the rule to push Sort + Cast + input (and not
> just Sort + input) when the collation of the input's key type differs from
> that of the union's result type.
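
The failure mode described above can be reproduced outside Calcite with a minimal, self-contained sketch. It is not Calcite's implementation: `mergeUnion` is a hypothetical, simplified distinct merge, `Comparator.naturalOrder()` stands in for the default VARCHAR ordering, and `String.CASE_INSENSITIVE_ORDER` is assumed as a stand-in for the special collation. With those assumptions, merging inputs sorted by different comparators yields the same wrong sequence as the failing test, and re-sorting the first input with the union's comparator (roughly the effect of the Sort + Cast idea) yields the expected one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MergeUnionCollationDemo {

  /**
   * Simplified distinct merge of two inputs that are ASSUMED to be sorted by cmp.
   * Duplicates are only detected when adjacent, so unsorted input breaks it.
   */
  static List<String> mergeUnion(List<String> left, List<String> right,
      Comparator<String> cmp) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() || j < right.size()) {
      String next;
      if (j >= right.size()
          || (i < left.size() && cmp.compare(left.get(i), right.get(j)) <= 0)) {
        next = left.get(i++);
      } else {
        next = right.get(j++);
      }
      // Emit only if it differs from the last emitted value under cmp.
      if (out.isEmpty() || cmp.compare(out.get(out.size() - 1), next) != 0) {
        out.add(next);
      }
    }
    return out;
  }

  /** Returns a sorted copy of the input, leaving the original untouched. */
  static List<String> sorted(List<String> input, Comparator<String> cmp) {
    List<String> copy = new ArrayList<>(input);
    copy.sort(cmp);
    return copy;
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("facilities", "HR", "administration", "Marketing");
    List<String> second = Arrays.asList("Marketing", "administration", "presales", "HR");

    // Assumed stand-ins for the two collations involved.
    Comparator<String> binary = Comparator.naturalOrder();
    Comparator<String> special = String.CASE_INSENSITIVE_ORDER;

    // Each input is sorted with its own comparator, then merged with "special".
    System.out.println(mergeUnion(sorted(first, binary), sorted(second, special), special));
    // prints [administration, HR, Marketing, administration, facilities, Marketing, presales]

    // Both inputs are sorted with the comparator used by the merge.
    System.out.println(mergeUnion(sorted(first, special), sorted(second, special), special));
    // prints [administration, facilities, HR, Marketing, presales]
  }
}
```

The first call corresponds to the plan in the test, where each EnumerableSort orders its input by that input's own collation; the second corresponds to sorting every input by the collation of the union's result type before merging.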