[jira] [Commented] (ASTERIXDB-3621) Hash-exchange is still added even though collection is already hash-partitioned

ASF subversion and git services (Jira) Thu, 26 Jun 2025 12:14:28 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986444#comment-17986444
 ]


ASF subversion and git services commented on ASTERIXDB-3621:
------------------------------------------------------------

Commit 9d8f9e53b31da8565f021c52921e1ae739e251e0 in asterixdb's branch 
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=9d8f9e53b3 ]

[ASTERIXDB-3621][ASTERIXDB-3580][COMP] Use sameAs() for comparing nodes domain 
when getting partitions map

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
When getting the partitions map to be included in the required
partitioning property, use sameAs() nodes domain comparison
similar to PropertiesUtil.matchPartitioningProps() instead of
enforcing nodes order as well.

Ext-ref: MB-67128

Change-Id: Ie53228de01e7980cd96b39523d8c20da3d188860
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19903
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Hussain Towaileb <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Tested-by: Hussain Towaileb <[email protected]>


> Hash-exchange is still added even though collection is already 
> hash-partitioned
> -------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-3621
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3621
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler
>            Reporter: Ali Alsuliman
>            Assignee: Ali Alsuliman
>            Priority: Major
>              Labels: triaged
>
> The delivered partitioning property of a collection has been restored to 
> "hash-partitioned" (with partitions map) as part of the work done for 
> https://issues.apache.org/jira/browse/ASTERIXDB-3580 to eliminate unnecessary 
> hash exchanges. For example:
> {noformat}
> CREATE COLLECTION ds1 IF NOT EXISTS PRIMARY KEY(id: int);
> CREATE COLLECTION ds2 IF NOT EXISTS PRIMARY KEY(id: int);
> SELECT * FROM ds1 JOIN ds2 ON ds1.id = ds2.id;{noformat}
> The expected plan is the following with no hash-exchanges:
> {noformat}
> distribute result [$$29] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     assign [$$29] <- [{"ds1": $$ds1, "ds2": $$ds2}] project: [$$29] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>     -- ASSIGN  |PARTITIONED|
>       project ([$$ds1, $$ds2]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>       -- STREAM_PROJECT  |PARTITIONED|
>         exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>           join (eq($$30, $$31)) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>           -- HYBRID_HASH_JOIN [$$30][$$31]  |PARTITIONED|
>             exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>               data-scan []<-[$$30, $$ds1] <- Default.ds1 [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>               -- DATASOURCE_SCAN  |PARTITIONED|
>                 exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                   empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>                   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>             exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>               data-scan []<-[$$31, $$ds2] <- Default.ds2 [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>               -- DATASOURCE_SCAN  |PARTITIONED|
>                 exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                   empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>                   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|{noformat}
> For the case of compute-storage separation (static partitioning), when 
> determining if the required partitioning (let's say by a JOIN) is the same as 
> the collection's partitioning, we include the partitions map in the 
> comparison to make sure both are using the same partitions map.
> Currently, the required property (by JOIN and others) will include a 
> partitions map if and only if the "query's domain" is exactly the same domain 
> as the cluster's domain and the nodes in both domains are in exactly the same 
> order. That turns out to be problematic since the node order in the 
> cluster's domain can be arbitrary while the "query's domain" is always 
> sorted. In this case, the JOIN and others won't use a partitions map, so any 
> property delivered by a collection that has a partitions map will not satisfy 
> the requirement. We should relax the check to only verify that the nodes are 
> the same, without considering their order, similar to how it is done in 
> PropertiesUtil.matchPartitioningProps().
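
The difference between the two checks described above can be sketched in a few
lines of Java. This is only an illustration, not AsterixDB code: the class
name, the method names, and the node names (nc1, nc2, nc3) are hypothetical,
and a plain list of node names stands in for a nodes domain. The actual
sameAs() implementation is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for comparing two nodes domains; not the AsterixDB API.
public class NodeDomainComparison {

    // Strict check: same nodes AND same order (the behavior the issue calls problematic).
    static boolean sameNodesSameOrder(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Relaxed check: same nodes regardless of order (the behavior the issue asks for).
    static boolean sameNodesIgnoringOrder(List<String> a, List<String> b) {
        List<String> sortedA = new ArrayList<>(a);
        List<String> sortedB = new ArrayList<>(b);
        Collections.sort(sortedA);
        Collections.sort(sortedB);
        return sortedA.equals(sortedB);
    }

    public static void main(String[] args) {
        // Assumed example: the cluster's domain lists nodes in arbitrary order,
        // while the query's domain is sorted.
        List<String> clusterDomain = Arrays.asList("nc2", "nc1", "nc3");
        List<String> queryDomain = Arrays.asList("nc1", "nc2", "nc3");

        System.out.println("strict:  " + sameNodesSameOrder(clusterDomain, queryDomain));
        System.out.println("relaxed: " + sameNodesIgnoringOrder(clusterDomain, queryDomain));
    }
}
```

With the strict check the two domains are treated as different, so under the
behavior described in the issue the required property would carry no
partitions map. With the relaxed check they match.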



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
