[jira] [Commented] (ASTERIXDB-3580) Dataset partitioning property should be hash-partitioned with partitions map

ASF subversion and git services (Jira) Tue, 15 Apr 2025 00:26:08 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944621#comment-17944621
 ]


ASF subversion and git services commented on ASTERIXDB-3580:
------------------------------------------------------------

Commit ed4a9a96596d7ed3c48edd91eac3011c8958b527 in asterixdb's branch 
refs/heads/master from Ali Alsuliman
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=ed4a9a9659 ]

[ASTERIXDB-3580][COMP] Ensure the computation locations are sorted

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
The cluster locations used by datasets are on sorted nodes.
The computation locations should also be made sorted.

Ext-ref: MB-63354, MB-65314
Change-Id: Id7463f54455ce1e5f874b75399c1ec9b96250b5f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19543
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Reviewed-by: Ian Maxon <[email protected]>
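
A minimal sketch of why the ordering in the commit details may matter, under the assumption (not stated in the commit) that partition positions are matched index by index between the storage locations and the computation locations. The class name, method names, and the node names nc1..nc3 are hypothetical and are not AsterixDB APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedLocationsSketch {
    /** Returns a sorted copy, so every caller sees the node names in one canonical order. */
    static List<String> sorted(List<String> nodes) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.sort(copy);
        return copy;
    }

    /** Counts the positions at which the two location lists name different nodes. */
    static int mismatches(List<String> storage, List<String> compute) {
        int common = Math.min(storage.size(), compute.size());
        int count = Math.abs(storage.size() - compute.size());
        for (int i = 0; i < common; i++) {
            if (!storage.get(i).equals(compute.get(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: dataset (storage) locations are already kept in sorted order.
        List<String> storage = sorted(Arrays.asList("nc2", "nc1", "nc3"));
        // Hypothetical computation locations that arrive in arbitrary order.
        List<String> compute = Arrays.asList("nc3", "nc1", "nc2");

        System.out.println(mismatches(storage, compute));         // prints 3
        System.out.println(mismatches(storage, sorted(compute))); // prints 0
    }
}
```

With both lists sorted, position i refers to the same node on both sides, which is the property the sketch assumes a one-to-one exchange relies on.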


> Dataset partitioning property should be hash-partitioned with partitions map
> ----------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-3580
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3580
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Ali Alsuliman
>            Assignee: Ali Alsuliman
>            Priority: Major
>              Labels: triaged
>
> As part of the compute-storage separation work, the dataset partitioning 
> property was changed to “randomly partitioned”. One issue is that the random 
> partitioning delivered by the dataset sometimes causes unnecessary “hash 
> exchanges” to be introduced in the plan. The partitioning property delivered 
> by the dataset can instead be “hash partitioned” with the partitions map. 
> This way, if an operator such as a join requires hash partitioning with the 
> partitions map, the property delivered by the dataset satisfies the 
> requirement and no hash exchange is introduced.
> For example, the following query:
>  
> {code:java}
> SELECT VALUE c1 FROM c1
> WHERE c1.x NOT IN (
>   SELECT VALUE c2.x
>   FROM c2); {code}
> The resulting plan has a HASH_PARTITION_EXCHANGE [$$33] that is not needed:
>  
> {code:java}
>  distribute result [$$c1]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$c1])
>     -- STREAM_PROJECT  |PARTITIONED|
>       select ($$32)
>       -- STREAM_SELECT  |PARTITIONED|
>         project ([$$32, $$c1])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             group by ([$$37 := $$33]) decor ([$$c1]) {
>                       aggregate [$$32] <- [empty-stream()]
>                       -- AGGREGATE  |LOCAL|
>                         select (not(is-missing($$36)))
>                         -- STREAM_SELECT  |LOCAL|
>                           nested tuple source
>                           -- NESTED_TUPLE_SOURCE  |LOCAL|
>                    }
>             -- PRE_CLUSTERED_GROUP_BY[$$33]  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 order (ASC, $$33)
>                 -- STABLE_SORT [$$33(ASC)]  |PARTITIONED|
>                   exchange
>                   -- HASH_PARTITION_EXCHANGE [$$33]  |PARTITIONED|
>                     project ([$$c1, $$36, $$33])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         left outer join (not(if-missing-or-null(neq($$35, $$28), false)))
>                         -- NESTED_LOOP  |PARTITIONED|
>                           exchange
>                           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                             assign [$$35] <- [$$c1.getField("x")]
>                             -- ASSIGN  |PARTITIONED|
>                               exchange
>                               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                 data-scan []<-[$$33, $$c1] <- Default.c1
>                                 -- DATASOURCE_SCAN  |PARTITIONED|
>                                   exchange
>                                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                     empty-tuple-source
>                                     -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>                           exchange
>                           -- BROADCAST_EXCHANGE  |PARTITIONED|
>                             project ([$$36, $$28])
>                             -- STREAM_PROJECT  |PARTITIONED|
>                               assign [$$36, $$28] <- [true, $$c2.getField("x")]
>                               -- ASSIGN  |PARTITIONED|
>                                 project ([$$c2])
>                                 -- STREAM_PROJECT  |PARTITIONED|
>                                   exchange
>                                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                     data-scan []<-[$$34, $$c2] <- Default.c2 project ({x:any})
>                                     -- DATASOURCE_SCAN  |PARTITIONED|
>                                       exchange
>                                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                         empty-tuple-source
>                                         -- EMPTY_TUPLE_SOURCE  |PARTITIONED|{code}
> After the fix, the HASH_PARTITION_EXCHANGE [$$33] should become a 
> ONE_TO_ONE_EXCHANGE.
>  
>  
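
The requirement check described in the issue can be sketched roughly as follows. This is a simplified model, not the actual Algebricks/AsterixDB implementation: the `Partitioning` class, the `Kind` enum, and the `satisfies` and `exchangeFor` methods are hypothetical names introduced for illustration, and the matching rule (same columns and same partitions map) is an assumption that ignores subtleties the real optimizer may handle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitioningSketch {
    enum Kind { RANDOM, HASH }

    /** Hypothetical stand-in for a delivered or required partitioning property. */
    static final class Partitioning {
        final Kind kind;
        final List<String> columns;        // hash columns, empty for RANDOM
        final List<Integer> partitionsMap; // assumed partition layout, null for RANDOM

        Partitioning(Kind kind, List<String> columns, List<Integer> partitionsMap) {
            this.kind = kind;
            this.columns = columns;
            this.partitionsMap = partitionsMap;
        }
    }

    /** True when the delivered property already meets the required one. */
    static boolean satisfies(Partitioning delivered, Partitioning required) {
        if (required.kind != Kind.HASH) {
            return true; // assumption: nothing beyond "partitioned" is required
        }
        return delivered.kind == Kind.HASH
                && delivered.columns.equals(required.columns)
                && delivered.partitionsMap.equals(required.partitionsMap);
    }

    /** Picks the exchange the sketch would place between producer and consumer. */
    static String exchangeFor(Partitioning delivered, Partitioning required) {
        return satisfies(delivered, required) ? "ONE_TO_ONE_EXCHANGE" : "HASH_PARTITION_EXCHANGE";
    }

    public static void main(String[] args) {
        List<Integer> map = Arrays.asList(0, 1, 2, 3);
        List<String> key = Arrays.asList("$$33");
        Partitioning required = new Partitioning(Kind.HASH, key, map);

        // Dataset delivering random partitioning, as described in the issue.
        Partitioning random = new Partitioning(Kind.RANDOM, Collections.<String>emptyList(), null);
        // Dataset delivering hash partitioning with the same partitions map.
        Partitioning hashed = new Partitioning(Kind.HASH, key, map);

        System.out.println(exchangeFor(random, required)); // prints HASH_PARTITION_EXCHANGE
        System.out.println(exchangeFor(hashed, required)); // prints ONE_TO_ONE_EXCHANGE
    }
}
```

In the plan quoted above, $$33 is the variable produced by the scan of Default.c1 and used as the group-by key, which is why the sketch uses it as the hash column.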



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
