It sounds like a good way to proceed would be to replace the template
predicates with something that induces a real join.

Kenn

On Tue, May 15, 2018 at 9:33 AM Andrew Pilloud <[email protected]> wrote:

> Calcite does not have the concept of a "CROSS JOIN". It shows up in the
> plan as a LogicalJoin with condition=[true]. We could try rejecting the
> cross join at the planning stage by returning null for them
> in BeamJoinRule.convert(), which might result in a different plan. But
> looking at your query, you have a cross join unless the where clause on the
> inner select contains a row from the outer select.
>
> Andrew
>
> On Tue, May 15, 2018 at 9:15 AM Kenneth Knowles <[email protected]> wrote:
>
>> The logical plan should show you where the cross join is needed. Here is
>> where it is logged:
>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L150
>>
>> (It should probably be put to DEBUG level)
>>
>> If I look at the original template, like
>> https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query9.tpl
>> I see conditions "[RC.1]". Are those templates expected to be filled with
>> references to the `reason` table, perhaps? How does that change things?
>>
>> I still think it would be good to support CROSS JOIN if we can - the
>> problem of course is huge data size, but when one side is small it would be
>> good for it to work simply.
>>
>> Kenn
>>
>> On Tue, May 15, 2018 at 7:41 AM Kai Jiang <[email protected]> wrote:
>>
>>> Hi everyone,
>>>
>>> To prove the idea of GSoC project, I was working on some simple TPC-DS
>>> queries running with given generated data on direct runner. query
>>> example
>>> <https://gist.github.com/vectorijk/7c54f90aeebfd6fd9e9d2ee224bfed50>
>>>
>>> The example is executed with TPC-DS query 9
>>> <https://gist.github.com/vectorijk/7c54f90aeebfd6fd9e9d2ee224bfed50#file-tpcdssql-java-L176-L222>.
>>> Briefly, Query 9 uses case when clauses to select 5 counting numbers
>>> from store_sales (table 1). In order to show those result numbers, case
>>> when clause inside one select clause. In short, it looks like:
>>> SELECT
>>>
>>> CASE WHEN ( SELECT count(*)  FROM  table 1 WHERE..... )
>>> THEN condition 1
>>> ELSE condition 2,
>>> .....
>>> CASE WHEN .....
>>>
>>> FROM table 2
>>>
>>> IIUC, this query doesn't need join operation on table 1 and table 2 since
>>> outside select clause doesn't need to interfere with table 1.
>>> But, the program shows it does and throws errors message said
>>> "java.lang.UnsupportedOperationException: CROSS JOIN is not supported". 
>>> (error
>>> message detail
>>> <https://gist.github.com/vectorijk/5619a20485edc01113a56e348f87b0c3>)
>>>
>>> To make the query work, I am wondering where I can start with:
>>> 1. see logic plan?
>>> Will logic plan explain why the query need CROSS JOIN?
>>>
>>> 2. cross join support?
>>> I checked all queries in TPC-DS benchmark. Almost every query uses cross
>>> join. It is an important feature needs to implement. Unlike other join, it
>>> consumes a lot of computing resource. But, I think we need cross join in
>>> the future. and support both in join-library? I noticed James has open
>>> BEAM-2194 <https://issues.apache.org/jira/browse/BEAM-2194> for
>>> supporting cross join.
>>>
>>> Looking forward to comments!
>>>
>>> Best,
>>> Kai
>>>
>>> ᐧ
>>>
>>

Reply via email to