[ 
https://issues.apache.org/jira/browse/IGNITE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262564#comment-17262564
 ] 

Ignite TC Bot commented on IGNITE-8732:
---------------------------------------

{panel:title=Branch: [pull/8637/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/8637/head] Base: [master] : New Tests 
(34)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Queries 1{color} [[tests 
34|https://ci.ignite.apache.org/viewLog.html?buildId=5821006]]
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testGroupByNonIndexedField - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testBasicSelect - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testSelectInStatic - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testRightJoin - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testFullOuterDistributedJoinIsNotSupported - 
PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testWhereLess - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testBasicDistinct - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testWhereGreater - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testLeftJoinPartitionedReplicated - 
PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testInnerJoinReplicatedPartitioned - 
PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite: 
ReplicatedSqlCustomPartitionsTest.testLeftJoin - PASSED{color}
... and 23 new tests

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5821025&buildTypeId=IgniteTests24Java8_RunAll]

> SQL: REPLICATED cache cannot be left-joined to PARTITIONED
> ----------------------------------------------------------
>
>                 Key: IGNITE-8732
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8732
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>    Affects Versions: 2.5
>            Reporter: Vladimir Ozerov
>            Assignee: Stanislav Lukyanov
>            Priority: Major
>              Labels: sql-engine
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Steps to reproduce*
>  # Run 
> {{org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned}}
>  # Observe that we have 2x results on 2-node cluster
> *Root Cause*
>  {{left LEFT JOIN right ON cond}} operation assumes full scan of of a left 
> expression. Currently we perform this scan on every node and then simply 
> merge results on reducer. Two nodes, two scans of {{REPLICATED}} cache, 2x 
> results.
> *Potential Solutions*
>  We may consider several solutions. Deeper analysis is required to understand 
> which is the right one.
>  # Perform deduplication on reducer - this most prospective and general 
> technique, described in more details below
>  # Treat {{REPLICATED}} cache as {{PARTITIONED}}. Essentially, we just need 
> to pass proper backup filter. But what if {{REPLICATED}} cache spans more 
> nodes than {{PARTITIONED}}? We cannot rely on primary/backup in this case
>  # Implement additional execution phase as follows:
> {code:java}
> SELECT left.cols, right.cols FROM left INNER JOIN right ON cond;              
>             // Get "inner join" part
> UNION
> UNICAST SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids 
> from the first phase]) // Get "outer join" part
> {code}
> *Reducer Deduplication*
>  The idea is to get all data locally and then perform final deduplication. 
> This may incur high network overhead, because of lot of duplicated left parts 
> would be transferred. However, this could be optimized greatly with the 
> following techniques applied one after another
>  # Semi-jions: {{left}} is {{joined}} on mapper node, but instead of sending 
> {{(left, right)}} relation, we send {{(left) + (right)}}
>  # In case {{left}} part is known to be idempotent (i.e. it produces the same 
> result set on all nodes), only one node will send {{(left) + (right)}}, other 
> nodes will send {{(right)}} only
>  # Merge {{left}} results with if needed (i.e. if idempotence-related opto 
> was not applicable)
>  # Join {{left}} and {{right}} parts on reducer
> *UPDATE*
> After a few attempts at the implementation, the solution of treating 
> REPLICATED cache as PARTITIONED looks the most practical. The solution works 
> in a limited case:
>  * REPLICATED and PARTITIONED both have the same affinity function, number of 
> partitions, node filter
>  ** Note that REPLICATED has a different number of partitions by default
>  * The JOIN is done on an affinity column of both caches
>  ** Note that users often don’t create affinity keys for REPLICATED caches 
> today
>  * distributedJoins=false (distributed joins aren’t supported for now)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to