It looks like Calcite stopped considering field names in RelNode equality
as of Calcite 2.22 (which we use in Beam v2.34.0+). This can result in a
planner state where two nodes that only differ by field name are considered
equivalent.

I have a fix for Beam in https://github.com/apache/beam/pull/25290 and I'll
send an email to the Calcite dev list with more details.

Andrew

On Fri, Jan 27, 2023 at 11:33 AM Andrew Pilloud <apill...@google.com> wrote:

> Also this is at very least a Beam bug. You can file a Beam issue if you
> want, otherwise I will when I get back.
>
> Andrew
>
> On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud <apill...@google.com>
> wrote:
>
>> Hi Talat,
>>
>> I did get your test case running and added some logging to
>> RexProgramBuilder.mergePrograms. There is only one merge that occurs during
>> the test and it has an output type of RecordType(JavaType(int) ID,
>> JavaType(class java.lang.String) V). This does seem like the correct output
>> name but it doesn't match the final output name, so something is still
>> different than the Beam test case. I also modified mergePrograms to
>> purposely corrupt the output names, that did not cause the test to fail or
>> trip the 'assert mergedProg.getOutputRowType() ==
>> topProgram.getOutputRowType();' in mergePrograms. I could not find any
>> Calcite unit tests for RexProgramBuilder.mergePrograms or
>> CoreRules.CALC_MERGE rule so I think it is still probable that the problem
>> is in this area.
>>
>> One minor issue I encountered. It took me a while to get your test case
>> running, it doesn't appear there are any calcite gradle rules to run
>> CoreQuidemTest and constructing the classpath manually was tedious. Did I
>> miss something?
>>
>> I'm still working on this but I'm out today and Monday, it will probably
>> be Wednesday before I make any more progress.
>>
>> Andrew
>>
>> On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> Yes This aligned also with my debugging. In My Kenn's reply you can see
>>> a sql test which I wrote in Calcite. Somehow Calcite does not have this
>>> issue with the 1.28 version.
>>>
>>> !use post
>>> !set outputformat mysql
>>>
>>> #Test aliases with with clause
>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>>> "hr"."emps"."name" as v from "hr"."emps")
>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>>> tempTable.v <> '11' ;
>>> +-----+-----------+
>>> | ID  | value     |
>>> +-----+-----------+
>>> | 100 | Bill      |
>>> | 110 | Theodore  |
>>> | 150 | Sebastian |
>>> | 200 | Eric      |
>>> +-----+-----------+
>>> (4 rows)
>>>
>>> !ok
>>>
>>>
>>> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud <apill...@google.com>
>>> wrote:
>>>
>>>> Yes, that worked.
>>>>
>>>> The issue does not occur if I disable all of the following planner
>>>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
>>>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
>>>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).
>>>>
>>>> All the rules share a common call to RexProgramBuilder.mergePrograms,
>>>> so I suspect the problem lies there. I spent some time looking but wasn't
>>>> able to find it by code inspection, it looks like this code path is doing
>>>> the right thing with names. I'll spend some time tomorrow trying to
>>>> reproduce this on pure Calcite.
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer <
>>>> tuya...@paloaltonetworks.com> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Thanks for writing a test for this use case. Without Where clause it
>>>>> works as expected on our test cases also too. Please add where clause on
>>>>> second select. With the below query it does not return column names. I
>>>>> tested on my local also.
>>>>>
>>>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
>>>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
>>>>> id > 1
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud <apill...@google.com>
>>>>> wrote:
>>>>>
>>>>>> +dev@beam.apache.org <dev@beam.apache.org>
>>>>>>
>>>>>> I tried reproducing this but was not successful, the output schema
>>>>>> was as expected. I added the following to BeamSqlMultipleSchemasTest.java
>>>>>> at head. (I did discover
>>>>>> that  PAssert.that(result).containsInAnyOrder(output) doesn't validate
>>>>>> column names however.)
>>>>>>
>>>>>>   @Test
>>>>>>   public void testSelectAs() {
>>>>>>     PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
>>>>>>
>>>>>>     PCollection<Row> result =
>>>>>>         input.apply(SqlTransform.query("WITH tempTable (id, v) AS
>>>>>> (SELECT f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS 
>>>>>> fout_int,
>>>>>> v AS fout_string FROM tempTable"));
>>>>>>
>>>>>>     Schema output_schema =
>>>>>>
>>>>>> Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
>>>>>>     assertThat(result.getSchema(), equalTo(output_schema));
>>>>>>
>>>>>>     Row output = Row.withSchema(output_schema).addValues(1,
>>>>>> "strstr").build();
>>>>>>     PAssert.that(result).containsInAnyOrder(output);
>>>>>>     pipeline.run();
>>>>>>   }
>>>>>>
>>>>>> On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer <
>>>>>> tuya...@paloaltonetworks.com> wrote:
>>>>>>
>>>>>>> Hi Kenn,
>>>>>>>
>>>>>>> Thank you for replying back to my email.
>>>>>>>
>>>>>>> I was under the same impression about Calcite. But I wrote a test on
>>>>>>> Calcite 1.28 too. It is working without issue that I see on BEAM
>>>>>>>
>>>>>>> Here is my test case. If you want you can also run on Calcite.
>>>>>>> Please put under core/src/test/resources/sql as text file. and Run 
>>>>>>> CoreQuidemTest
>>>>>>> class.
>>>>>>>
>>>>>>> !use post
>>>>>>> !set outputformat mysql
>>>>>>>
>>>>>>> #Test aliases with with clause
>>>>>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>>>>>>> "hr"."emps"."name" as v from "hr"."emps")
>>>>>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>>>>>>> tempTable.v <> '11' ;
>>>>>>> +-----+-----------+
>>>>>>> | ID  | value     |
>>>>>>> +-----+-----------+
>>>>>>> | 100 | Bill      |
>>>>>>> | 110 | Theodore  |
>>>>>>> | 150 | Sebastian |
>>>>>>> | 200 | Eric      |
>>>>>>> +-----+-----------+
>>>>>>> (4 rows)
>>>>>>>
>>>>>>> !ok
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles <k...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Looking at the code that turns a logical CalcRel into a BeamCalcRel
>>>>>>>> I do not see any obvious cause for this:
>>>>>>>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69
>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_b3aa2e89489898f8c760294ba4dba2310ac53e70_sdks_java_extensions_sql_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rule_BeamCalcRule.java-23L69&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KXc2qSceL6qFbFnQ_2qUOHr9mKuc6zYY8rJTNZC8p_wTcNs4M6mHQoCuoc4JfeaA&s=KjzplEf29oFB6uivvdjixpQiArWtfV-1SXpALL-ugEM&e=>
>>>>>>>>
>>>>>>>> I don't like to guess that upstream libraries have the bug, but in
>>>>>>>> this case I wonder if the alias is lost in the Calcite optimizer rule 
>>>>>>>> for
>>>>>>>> merging the projects and filters into a Calc.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles <k...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am not sure I understand the question, but I do see an issue.
>>>>>>>>>
>>>>>>>>> Context: "CalcRel" is an optimized relational operation that is
>>>>>>>>> somewhat like ParDo, with a small snippet of a single-assignment DSL
>>>>>>>>> embedded in it. Calcite will choose to merge all the projects and 
>>>>>>>>> filters
>>>>>>>>> into the node, and then generates Java bytecode to directly execute 
>>>>>>>>> the DSL.
>>>>>>>>>
>>>>>>>>> Problem: it looks like the CalcRel has output columns with aliases
>>>>>>>>> "id" and "v" where it should have output columns with aliases "id" and
>>>>>>>>> "value".
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Adding: @Andrew Pilloud <apill...@google.com> @Kenneth Knowles
>>>>>>>>>> <k...@google.com>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user <
>>>>>>>>>> u...@beam.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I am using Beam 2.43 with Calcite SQL with Java.
>>>>>>>>>>>
>>>>>>>>>>> I have a query with a WITH clause and some aliasing. Looks like
>>>>>>>>>>> Beam Query optimizer after optimizing my query, it drops Select 
>>>>>>>>>>> statement's
>>>>>>>>>>> aliases. Can you help me to identify where the problem is ?
>>>>>>>>>>>
>>>>>>>>>>> This is my query
>>>>>>>>>>> INFO: SQL:
>>>>>>>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT
>>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`,
>>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v`
>>>>>>>>>>> FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`) (SELECT
>>>>>>>>>>> `tempTable`.`id` AS `id`, `tempTable`.`v` AS `value`
>>>>>>>>>>> FROM `tempTable` AS `tempTable`
>>>>>>>>>>> WHERE `tempTable`.`v` <> '11')
>>>>>>>>>>>
>>>>>>>>>>> This is Calcite Plan look at LogicalProject(id=[$0], value=[$1])
>>>>>>>>>>> in SQL plan.
>>>>>>>>>>>
>>>>>>>>>>> Jan 12, 2023 12:19:08 PM
>>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner 
>>>>>>>>>>> convertToBeamRel
>>>>>>>>>>> INFO: SQLPlan>
>>>>>>>>>>> LogicalProject(id=[$0], value=[$1])
>>>>>>>>>>>   LogicalFilter(condition=[<>($1, '11')])
>>>>>>>>>>>     LogicalProject(id=[$1.f_nestedInt], v=[$1.f_nestedString])
>>>>>>>>>>>       BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>>>>>>>
>>>>>>>>>>> But Beam Plan does not have a LogicalProject(id=[$0],
>>>>>>>>>>> value=[$1]) or similar.
>>>>>>>>>>>
>>>>>>>>>>> Jan 12, 2023 12:19:08 PM
>>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner 
>>>>>>>>>>> convertToBeamRel
>>>>>>>>>>> INFO: BEAMPlan>
>>>>>>>>>>> BeamCalcRel(expr#0..1=[{inputs}], expr#2=[$t1.f_nestedInt],
>>>>>>>>>>> expr#3=[$t1.f_nestedString], expr#4=['11':VARCHAR], expr#5=[<>($t3, 
>>>>>>>>>>> $t4)],
>>>>>>>>>>> id=[$t2], v=[$t3], $condition=[$t5])
>>>>>>>>>>>   BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>

Reply via email to