Andrew, Kenn, thanks for your comments! This query is potential using COMMA (Cross) join. Since most of TPC-DS queries use comma join, I took a look at their logic plan. Calcite parses them to inner join and join condition is true. Outside inner join is around with filter condition which is parsing from WHERE clause.
I think a way we could do is how to push filter (WHERE clause's condition) into join condition. I also took a look at Flink sql code. After parsing query, Flink optimizes its logic plan which is get rid of filter and move its condition to join condition. It seems Flink uses calcite optimize rule. Should we move forward to see how it apply to Beam SQL? Best, Kai ᐧ On Wed, May 16, 2018 at 2:46 PM Kenneth Knowles <[email protected]> wrote: > It sounds like a good way to proceed would be to replace the template > predicates with something that induces a real join. > > Kenn > > On Tue, May 15, 2018 at 9:33 AM Andrew Pilloud <[email protected]> > wrote: > >> Calcite does not have the concept of a "CROSS JOIN". It shows up in the >> plan as a LogicalJoin with condition=[true]. We could try rejecting the >> cross join at the planning stage by returning null for them >> in BeamJoinRule.convert(), which might result in a different plan. But >> looking at your query, you have a cross join unless the where clause on the >> inner select contains a row from the outer select. >> >> Andrew >> >> On Tue, May 15, 2018 at 9:15 AM Kenneth Knowles <[email protected]> wrote: >> >>> The logical plan should show you where the cross join is needed. Here is >>> where it is logged: >>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L150 >>> >>> (It should probably be put to DEBUG level) >>> >>> If I look at the original template, like >>> https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query9.tpl >>> I see conditions "[RC.1]". Are those templates expected to be filled with >>> references to the `reason` table, perhaps? How does that change things? >>> >>> I still think it would be good to support CROSS JOIN if we can - the >>> problem of course is huge data size, but when one side is small it would be >>> good for it to work simply. >>> >>> Kenn >>> >>> On Tue, May 15, 2018 at 7:41 AM Kai Jiang <[email protected]> wrote: >>> >>>> Hi everyone, >>>> >>>> To prove the idea of GSoC project, I was working on some simple TPC-DS >>>> queries running with given generated data on direct runner. query >>>> example >>>> <https://gist.github.com/vectorijk/7c54f90aeebfd6fd9e9d2ee224bfed50> >>>> >>>> The example is executed with TPC-DS query 9 >>>> <https://gist.github.com/vectorijk/7c54f90aeebfd6fd9e9d2ee224bfed50#file-tpcdssql-java-L176-L222>. >>>> Briefly, Query 9 uses case when clauses to select 5 counting numbers >>>> from store_sales (table 1). In order to show those result numbers, case >>>> when clause inside one select clause. In short, it looks like: >>>> SELECT >>>> >>>> CASE WHEN ( SELECT count(*) FROM table 1 WHERE..... ) >>>> THEN condition 1 >>>> ELSE condition 2, >>>> ..... >>>> CASE WHEN ..... >>>> >>>> FROM table 2 >>>> >>>> IIUC, this query doesn't need join operation on table 1 and table 2 since >>>> outside select clause doesn't need to interfere with table 1. >>>> But, the program shows it does and throws errors message said >>>> "java.lang.UnsupportedOperationException: CROSS JOIN is not supported". >>>> (error >>>> message detail >>>> <https://gist.github.com/vectorijk/5619a20485edc01113a56e348f87b0c3>) >>>> >>>> To make the query work, I am wondering where I can start with: >>>> 1. see logic plan? >>>> Will logic plan explain why the query need CROSS JOIN? >>>> >>>> 2. cross join support? >>>> I checked all queries in TPC-DS benchmark. Almost every query uses >>>> cross join. It is an important feature needs to implement. Unlike other >>>> join, it consumes a lot of computing resource. But, I think we need cross >>>> join in the future. and support both in join-library? I noticed >>>> James has open BEAM-2194 >>>> <https://issues.apache.org/jira/browse/BEAM-2194> for supporting cross >>>> join. >>>> >>>> Looking forward to comments! >>>> >>>> Best, >>>> Kai >>>> >>>> ᐧ >>>> >>>
