[ https://issues.apache.org/jira/browse/ASTERIXDB-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944621#comment-17944621 ]
ASF subversion and git services commented on ASTERIXDB-3580:
------------------------------------------------------------
Commit ed4a9a96596d7ed3c48edd91eac3011c8958b527 in asterixdb's branch
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=ed4a9a9659 ]
[ASTERIXDB-3580][COMP] Ensure the computation locations are sorted
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
The cluster locations used by datasets are built from sorted nodes.
The computation locations should be sorted as well.
Ext-ref: MB-63354, MB-65314
Change-Id: Id7463f54455ce1e5f874b75399c1ec9b96250b5f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19543
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Reviewed-by: Ian Maxon <[email protected]>
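
The rationale in the commit message can be illustrated with a small sketch. It assumes, as the message implies but does not spell out, that partition i of a dataset and partition i of the computation are paired by position in their location lists, so the two lists only line up when both use the same ordering. The node names and the misaligned_partitions helper are hypothetical and are not AsterixDB APIs.

```python
# Illustrative only: node names and helpers are hypothetical, not AsterixDB APIs.

def misaligned_partitions(storage_locations, compute_locations):
    """Return the partition indexes whose storage and compute nodes differ."""
    return [
        i
        for i, (s, c) in enumerate(zip(storage_locations, compute_locations))
        if s != c
    ]

# Dataset partitions are laid out on nodes in sorted order.
storage = sorted(["nc2", "nc1", "nc3"])

# Compute locations as they might be reported, in arbitrary order.
compute_unsorted = ["nc3", "nc1", "nc2"]
compute_sorted = sorted(compute_unsorted)

print(misaligned_partitions(storage, compute_unsorted))  # [0, 1, 2]
print(misaligned_partitions(storage, compute_sorted))    # []
```

With the unsorted list every partition would be paired with a different node than the one holding its data; sorting both lists the same way removes the mismatch in this toy setup.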
> Dataset partitioning property should be hash-partitioned with partitions map
> ----------------------------------------------------------------------------
>
> Key: ASTERIXDB-3580
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3580
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Ali Alsuliman
> Assignee: Ali Alsuliman
> Priority: Major
> Labels: triaged
>
> The dataset partitioning property was changed to “randomly partitioned”
> as part of the compute-storage separation work. As a result, unnecessary
> “hash exchanges” are sometimes introduced in the plan because the dataset
> delivers random partitioning. The dataset's delivered partitioning property
> can instead be “hash partitioned” with the partitions map. That way, if an
> operator such as a join requires hash partitioning with the partitions map,
> the dataset's delivered partitioning property satisfies the requirement and
> no hash exchange is introduced.
> For example, consider the following query:
>
> {code:java}
> SELECT VALUE c1 FROM c1
> WHERE c1.x NOT IN (
> SELECT VALUE c2.x
> FROM c2);
> {code}
> Its plan has a HASH_PARTITION_EXCHANGE [$$33] that is not needed:
>
> {code:java}
> distribute result [$$c1]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$c1])
> -- STREAM_PROJECT |PARTITIONED|
> select ($$32)
> -- STREAM_SELECT |PARTITIONED|
> project ([$$32, $$c1])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$37 := $$33]) decor ([$$c1]) {
> aggregate [$$32] <- [empty-stream()]
> -- AGGREGATE |LOCAL|
> select (not(is-missing($$36)))
> -- STREAM_SELECT |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$33] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, $$33)
> -- STABLE_SORT [$$33(ASC)] |PARTITIONED|
> exchange
> -- HASH_PARTITION_EXCHANGE [$$33] |PARTITIONED|
> project ([$$c1, $$36, $$33])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> left outer join (not(if-missing-or-null(neq($$35, $$28), false)))
> -- NESTED_LOOP |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$35] <- [$$c1.getField("x")]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$33, $$c1] <- Default.c1
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> exchange
> -- BROADCAST_EXCHANGE |PARTITIONED|
> project ([$$36, $$28])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$36, $$28] <- [true, $$c2.getField("x")]
> -- ASSIGN |PARTITIONED|
> project ([$$c2])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$34, $$c2] <- Default.c2 project ({x:any})
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> {code}
> After the fix, the HASH_PARTITION_EXCHANGE [$$33] should become a
> ONE_TO_ONE_EXCHANGE.
>
>
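
The reasoning in the description can be sketched as a simplified property check. This is a toy model, not the Algebricks optimizer: the Partitioning class, the satisfies and exchange_for helpers, and the shape of the partitions map (here a tuple of storage-partition groups, one per compute partition) are all assumptions made for illustration, and key matching is reduced to exact equality.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model of a partitioning property; not the Algebricks classes.
@dataclass(frozen=True)
class Partitioning:
    kind: str                       # "random" or "hash"
    keys: Tuple[str, ...] = ()      # partitioning variables, e.g. ("$$33",)
    # Assumed shape: storage partitions grouped per compute partition.
    partitions_map: Optional[Tuple[Tuple[int, ...], ...]] = None


def satisfies(delivered: Partitioning, required: Partitioning) -> bool:
    """Simplified check: does the delivered property meet the requirement?"""
    if required.kind == "random":
        return True  # any distribution is acceptable
    return (
        delivered.kind == "hash"
        and delivered.keys == required.keys
        and delivered.partitions_map == required.partitions_map
    )


def exchange_for(delivered: Partitioning, required: Partitioning) -> str:
    """Pick the exchange an optimizer would need below the requiring operator."""
    if satisfies(delivered, required):
        return "ONE_TO_ONE_EXCHANGE"
    return "HASH_PARTITION_EXCHANGE [" + ", ".join(required.keys) + "]"


pmap = ((0,), (1,))
required = Partitioning("hash", ("$$33",), pmap)

random_scan = Partitioning("random")              # dataset delivers random partitioning
hash_scan = Partitioning("hash", ("$$33",), pmap) # dataset delivers hash partitioning with the map

print(exchange_for(random_scan, required))  # HASH_PARTITION_EXCHANGE [$$33]
print(exchange_for(hash_scan, required))    # ONE_TO_ONE_EXCHANGE
```

In this model the randomly partitioned scan forces a repartitioning step, while the scan that delivers hash partitioning on the same key and partitions map already satisfies the requirement, which mirrors the plan change the issue asks for.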