[
https://issues.apache.org/jira/browse/ASTERIXDB-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986444#comment-17986444
]
ASF subversion and git services commented on ASTERIXDB-3621:
------------------------------------------------------------
Commit 9d8f9e53b31da8565f021c52921e1ae739e251e0 in asterixdb's branch
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=9d8f9e53b3 ]
[ASTERIXDB-3621][ASTERIXDB-3580][COMP] Use sameAs() for comparing nodes domain
when getting partitions map
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
When getting the partitions map to be included in the required
partitioning property, use sameAs() nodes domain comparison
similar to PropertiesUtil.matchPartitioningProps() instead of
enforcing nodes order as well.
Ext-ref: MB-67128
Change-Id: Ie53228de01e7980cd96b39523d8c20da3d188860
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19903
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Hussain Towaileb <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Tested-by: Hussain Towaileb <[email protected]>
> Hash-exchange is still added even though collection is already
> hash-partitioned
> -------------------------------------------------------------------------------
>
> Key: ASTERIXDB-3621
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3621
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: COMP - Compiler
> Reporter: Ali Alsuliman
> Assignee: Ali Alsuliman
> Priority: Major
> Labels: triaged
>
> The delivered partitioning property of a collection has been restored to
> "hash-partitioned" (with partitions map) as part of the work done for
> https://issues.apache.org/jira/browse/ASTERIXDB-3580 to eliminate unnecessary
> hash exchanges. For example:
> {noformat}
> CREATE COLLECTION ds1 IF NOT EXISTS PRIMARY KEY(id: int);
> CREATE COLLECTION ds2 IF NOT EXISTS PRIMARY KEY(id: int);
> SELECT * FROM ds1 JOIN ds2 ON ds1.id = ds2.id;{noformat}
> The expected plan is the following with no hash-exchanges:
> {noformat}
> distribute result [$$29] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$29] <- [{"ds1": $$ds1, "ds2": $$ds2}] project: [$$29]
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ASSIGN |PARTITIONED|
> project ([$$ds1, $$ds2]) [cardinality: 0.0, op-cost: 0.0, total-cost:
> 0.0]
> -- STREAM_PROJECT |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> join (eq($$30, $$31)) [cardinality: 0.0, op-cost: 0.0, total-cost:
> 0.0]
> -- HYBRID_HASH_JOIN [$$30][$$31] |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$30, $$ds1] <- Default.ds1 [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0]
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0]
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$31, $$ds2] <- Default.ds2 [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0]
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0]
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|{noformat}
> For the case of compute-storage separation (static partitioning), when
> determining if the required partitioning (let's say by a JOIN) is the same as
> the collection's partitioning, we include the partitions map in the
> comparison to make sure both are using the same partitions map.
> Currently, the the required property (by JOIN and others) will include a
> partition map if and only if the "query's domain" is exactly the same domain
> as the cluster's domain and the nodes in the domains are exactly in the same
> order. That turns out to be problematic since the nodes order in the
> cluster's domain can be arbitrary while the "query's domain" will always be
> sorted. In this case, the JOIN and others won't use a partitions map which
> will make any delivered property by collections having a partition map not
> satisfy that. We should relax that to only check if the nodes are the same
> without considering the order similar to how it is being done in
> PropertiesUtil.matchPartitioningProps().
--
This message was sent by Atlassian Jira
(v8.20.10#820010)