comphead commented on issue #2667:
URL:
https://github.com/apache/datafusion-comet/issues/2667#issuecomment-3481588937
Spark Plan
```
== Physical Plan ==
AdaptiveSparkPlan (55)
+- Sort (54)
+- Exchange (53)
+- HashAggregate (52)
+- Exchange (51)
+- HashAggregate (50)
+- Project (49)
+- BroadcastHashJoin Inner BuildRight (48)
:- Project (44)
: +- BroadcastHashJoin Inner BuildRight (43)
: :- Project (38)
: : +- SortMergeJoin LeftAnti (37)
: : :- SortMergeJoin LeftAnti (26)
: : : :- SortMergeJoin LeftSemi (15)
: : : : :- Sort (4)
: : : : : +- Exchange (3)
: : : : : +- Filter (2)
: : : : : +- Scan parquet (1)
: : : : +- Sort (14)
: : : : +- Exchange (13)
: : : : +- Project (12)
: : : : +- BroadcastHashJoin Inner BuildRight (11)
: : : : :- Filter (6)
: : : : : +- Scan parquet (5)
: : : : +- BroadcastExchange (10)
: : : : +- Project (9)
: : : : +- Filter (8)
: : : : +- Scan parquet (7)
: : : +- Sort (25)
: : : +- Exchange (24)
: : : +- Project (23)
: : : +- BroadcastHashJoin Inner BuildRight (22)
: : : :- Filter (17)
: : : : +- Scan parquet (16)
: : : +- BroadcastExchange (21)
: : : +- Project (20)
: : : +- Filter (19)
: : : +- Scan parquet (18)
: : +- Sort (36)
: : +- Exchange (35)
: : +- Project (34)
: : +- BroadcastHashJoin Inner BuildRight (33)
: : :- Filter (28)
: : : +- Scan parquet (27)
: : +- BroadcastExchange (32)
: : +- Project (31)
: : +- Filter (30)
: : +- Scan parquet (29)
: +- BroadcastExchange (42)
: +- Project (41)
: +- Filter (40)
: +- Scan parquet (39)
+- BroadcastExchange (47)
+- Filter (46)
+- Scan parquet (45)
(1) Scan parquet
Output [3]: [c_customer_sk#202L, c_current_cdemo_sk#204,
c_current_addr_sk#206L]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/customer.parquet]
PushedFilters: [IsNotNull(c_current_addr_sk), IsNotNull(c_current_cdemo_sk)]
ReadSchema:
struct<c_customer_sk:bigint,c_current_cdemo_sk:double,c_current_addr_sk:bigint>
(2) Filter
Input [3]: [c_customer_sk#202L, c_current_cdemo_sk#204,
c_current_addr_sk#206L]
Condition : ((isnotnull(c_current_addr_sk#206L) AND
isnotnull(c_current_cdemo_sk#204)) AND might_contain(Subquery subquery#925,
[id=#222], xxhash64(c_current_addr_sk#206L, 42)))
(3) Exchange
Input [3]: [c_customer_sk#202L, c_current_cdemo_sk#204,
c_current_addr_sk#206L]
Arguments:
hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(c_customer_sk#202L
as double))), 200), ENSURE_REQUIREMENTS, [plan_id=259]
(4) Sort
Input [3]: [c_customer_sk#202L, c_current_cdemo_sk#204,
c_current_addr_sk#206L]
Arguments:
[knownfloatingpointnormalized(normalizenanandzero(cast(c_customer_sk#202L as
double))) ASC NULLS FIRST], false, 0
(5) Scan parquet
Output [2]: [ss_sold_date_sk#580, ss_customer_sk#583]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/store_sales.parquet]
PushedFilters: [IsNotNull(ss_sold_date_sk)]
ReadSchema: struct<ss_sold_date_sk:double,ss_customer_sk:double>
(6) Filter
Input [2]: [ss_sold_date_sk#580, ss_customer_sk#583]
Condition : isnotnull(ss_sold_date_sk#580)
(7) Scan parquet
Output [3]: [d_date_sk#282L, d_year#288L, d_moy#290L]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/date_dim.parquet]
PushedFilters: [IsNotNull(d_year), IsNotNull(d_moy), EqualTo(d_year,2002),
GreaterThanOrEqual(d_moy,2), LessThanOrEqual(d_moy,4), IsNotNull(d_date_sk)]
ReadSchema: struct<d_date_sk:bigint,d_year:bigint,d_moy:bigint>
(8) Filter
Input [3]: [d_date_sk#282L, d_year#288L, d_moy#290L]
Condition : (((((isnotnull(d_year#288L) AND isnotnull(d_moy#290L)) AND
(d_year#288L = 2002)) AND (d_moy#290L >= 2)) AND (d_moy#290L <= 4)) AND
isnotnull(d_date_sk#282L))
(9) Project
Output [1]: [d_date_sk#282L]
Input [3]: [d_date_sk#282L, d_year#288L, d_moy#290L]
(10) BroadcastExchange
Input [1]: [d_date_sk#282L]
Arguments:
HashedRelationBroadcastMode(List(knownfloatingpointnormalized(normalizenanandzero(cast(input[0,
bigint, true] as double)))),false), [plan_id=254]
(11) BroadcastHashJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(ss_sold_date_sk#580))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(d_date_sk#282L as
double)))]
Join type: Inner
Join condition: None
(12) Project
Output [1]: [ss_customer_sk#583]
Input [3]: [ss_sold_date_sk#580, ss_customer_sk#583, d_date_sk#282L]
(13) Exchange
Input [1]: [ss_customer_sk#583]
Arguments:
hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(ss_customer_sk#583)),
200), ENSURE_REQUIREMENTS, [plan_id=260]
(14) Sort
Input [1]: [ss_customer_sk#583]
Arguments:
[knownfloatingpointnormalized(normalizenanandzero(ss_customer_sk#583)) ASC
NULLS FIRST], false, 0
(15) SortMergeJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(c_customer_sk#202L as
double)))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(ss_customer_sk#583))]
Join type: LeftSemi
Join condition: None
(16) Scan parquet
Output [2]: [ws_sold_date_sk#730, ws_bill_customer_sk#734]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/web_sales.parquet]
PushedFilters: [IsNotNull(ws_sold_date_sk)]
ReadSchema: struct<ws_sold_date_sk:double,ws_bill_customer_sk:double>
(17) Filter
Input [2]: [ws_sold_date_sk#730, ws_bill_customer_sk#734]
Condition : isnotnull(ws_sold_date_sk#730)
(18) Scan parquet
Output [3]: [d_date_sk#859L, d_year#865L, d_moy#867L]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/date_dim.parquet]
PushedFilters: [IsNotNull(d_year), IsNotNull(d_moy), EqualTo(d_year,2002),
GreaterThanOrEqual(d_moy,2), LessThanOrEqual(d_moy,4), IsNotNull(d_date_sk)]
ReadSchema: struct<d_date_sk:bigint,d_year:bigint,d_moy:bigint>
(19) Filter
Input [3]: [d_date_sk#859L, d_year#865L, d_moy#867L]
Condition : (((((isnotnull(d_year#865L) AND isnotnull(d_moy#867L)) AND
(d_year#865L = 2002)) AND (d_moy#867L >= 2)) AND (d_moy#867L <= 4)) AND
isnotnull(d_date_sk#859L))
(20) Project
Output [1]: [d_date_sk#859L]
Input [3]: [d_date_sk#859L, d_year#865L, d_moy#867L]
(21) BroadcastExchange
Input [1]: [d_date_sk#859L]
Arguments:
HashedRelationBroadcastMode(List(knownfloatingpointnormalized(normalizenanandzero(cast(input[0,
bigint, true] as double)))),false), [plan_id=264]
(22) BroadcastHashJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(ws_sold_date_sk#730))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(d_date_sk#859L as
double)))]
Join type: Inner
Join condition: None
(23) Project
Output [1]: [ws_bill_customer_sk#734]
Input [3]: [ws_sold_date_sk#730, ws_bill_customer_sk#734, d_date_sk#859L]
(24) Exchange
Input [1]: [ws_bill_customer_sk#734]
Arguments:
hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(ws_bill_customer_sk#734)),
200), ENSURE_REQUIREMENTS, [plan_id=269]
(25) Sort
Input [1]: [ws_bill_customer_sk#734]
Arguments:
[knownfloatingpointnormalized(normalizenanandzero(ws_bill_customer_sk#734)) ASC
NULLS FIRST], false, 0
(26) SortMergeJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(c_customer_sk#202L as
double)))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(ws_bill_customer_sk#734))]
Join type: LeftAnti
Join condition: None
(27) Scan parquet
Output [2]: [cs_sold_date_sk#134, cs_ship_customer_sk#141]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/catalog_sales.parquet]
PushedFilters: [IsNotNull(cs_sold_date_sk)]
ReadSchema: struct<cs_sold_date_sk:double,cs_ship_customer_sk:double>
(28) Filter
Input [2]: [cs_sold_date_sk#134, cs_ship_customer_sk#141]
Condition : isnotnull(cs_sold_date_sk#134)
(29) Scan parquet
Output [3]: [d_date_sk#887L, d_year#893L, d_moy#895L]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/date_dim.parquet]
PushedFilters: [IsNotNull(d_year), IsNotNull(d_moy), EqualTo(d_year,2002),
GreaterThanOrEqual(d_moy,2), LessThanOrEqual(d_moy,4), IsNotNull(d_date_sk)]
ReadSchema: struct<d_date_sk:bigint,d_year:bigint,d_moy:bigint>
(30) Filter
Input [3]: [d_date_sk#887L, d_year#893L, d_moy#895L]
Condition : (((((isnotnull(d_year#893L) AND isnotnull(d_moy#895L)) AND
(d_year#893L = 2002)) AND (d_moy#895L >= 2)) AND (d_moy#895L <= 4)) AND
isnotnull(d_date_sk#887L))
(31) Project
Output [1]: [d_date_sk#887L]
Input [3]: [d_date_sk#887L, d_year#893L, d_moy#895L]
(32) BroadcastExchange
Input [1]: [d_date_sk#887L]
Arguments:
HashedRelationBroadcastMode(List(knownfloatingpointnormalized(normalizenanandzero(cast(input[0,
bigint, true] as double)))),false), [plan_id=272]
(33) BroadcastHashJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cs_sold_date_sk#134))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(d_date_sk#887L as
double)))]
Join type: Inner
Join condition: None
(34) Project
Output [1]: [cs_ship_customer_sk#141]
Input [3]: [cs_sold_date_sk#134, cs_ship_customer_sk#141, d_date_sk#887L]
(35) Exchange
Input [1]: [cs_ship_customer_sk#141]
Arguments:
hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cs_ship_customer_sk#141)),
200), ENSURE_REQUIREMENTS, [plan_id=277]
(36) Sort
Input [1]: [cs_ship_customer_sk#141]
Arguments:
[knownfloatingpointnormalized(normalizenanandzero(cs_ship_customer_sk#141)) ASC
NULLS FIRST], false, 0
(37) SortMergeJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(c_customer_sk#202L as
double)))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cs_ship_customer_sk#141))]
Join type: LeftAnti
Join condition: None
(38) Project
Output [2]: [c_current_cdemo_sk#204, c_current_addr_sk#206L]
Input [3]: [c_customer_sk#202L, c_current_cdemo_sk#204,
c_current_addr_sk#206L]
(39) Scan parquet
Output [2]: [ca_address_sk#238L, ca_state#246]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/customer_address.parquet]
PushedFilters: [In(ca_state, [IN,MS,VA]), IsNotNull(ca_address_sk)]
ReadSchema: struct<ca_address_sk:bigint,ca_state:string>
(40) Filter
Input [2]: [ca_address_sk#238L, ca_state#246]
Condition : (ca_state#246 IN (IN,VA,MS) AND isnotnull(ca_address_sk#238L))
(41) Project
Output [1]: [ca_address_sk#238L]
Input [2]: [ca_address_sk#238L, ca_state#246]
(42) BroadcastExchange
Input [1]: [ca_address_sk#238L]
Arguments: HashedRelationBroadcastMode(List(input[0, bigint, true]),false),
[plan_id=282]
(43) BroadcastHashJoin
Left keys [1]: [c_current_addr_sk#206L]
Right keys [1]: [ca_address_sk#238L]
Join type: Inner
Join condition: None
(44) Project
Output [1]: [c_current_cdemo_sk#204]
Input [3]: [c_current_cdemo_sk#204, c_current_addr_sk#206L,
ca_address_sk#238L]
(45) Scan parquet
Output [6]: [cd_demo_sk#264L, cd_gender#265, cd_marital_status#266,
cd_education_status#267, cd_purchase_estimate#268L, cd_credit_rating#269]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/customer_demographics.parquet]
PushedFilters: [IsNotNull(cd_demo_sk)]
ReadSchema:
struct<cd_demo_sk:bigint,cd_gender:string,cd_marital_status:string,cd_education_status:string,cd_purchase_estimate:bigint,cd_credit_rating:string>
(46) Filter
Input [6]: [cd_demo_sk#264L, cd_gender#265, cd_marital_status#266,
cd_education_status#267, cd_purchase_estimate#268L, cd_credit_rating#269]
Condition : isnotnull(cd_demo_sk#264L)
(47) BroadcastExchange
Input [6]: [cd_demo_sk#264L, cd_gender#265, cd_marital_status#266,
cd_education_status#267, cd_purchase_estimate#268L, cd_credit_rating#269]
Arguments:
HashedRelationBroadcastMode(List(knownfloatingpointnormalized(normalizenanandzero(cast(input[0,
bigint, false] as double)))),false), [plan_id=286]
(48) BroadcastHashJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(c_current_cdemo_sk#204))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(cd_demo_sk#264L as
double)))]
Join type: Inner
Join condition: None
(49) Project
Output [5]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269]
Input [7]: [c_current_cdemo_sk#204, cd_demo_sk#264L, cd_gender#265,
cd_marital_status#266, cd_education_status#267, cd_purchase_estimate#268L,
cd_credit_rating#269]
(50) HashAggregate
Input [5]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269]
Keys [5]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269]
Functions [1]: [partial_count(1)]
Aggregate Attributes [1]: [count#926L]
Results [6]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269, count#927L]
(51) Exchange
Input [6]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269, count#927L]
Arguments: hashpartitioning(cd_gender#265, cd_marital_status#266,
cd_education_status#267, cd_purchase_estimate#268L, cd_credit_rating#269, 200),
ENSURE_REQUIREMENTS, [plan_id=291]
(52) HashAggregate
Input [6]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269, count#927L]
Keys [5]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cd_purchase_estimate#268L, cd_credit_rating#269]
Functions [1]: [count(1)]
Aggregate Attributes [1]: [count(1)#856L]
Results [8]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
count(1)#856L AS cnt1#850L, cd_purchase_estimate#268L, count(1)#856L AS
cnt2#851L, cd_credit_rating#269, count(1)#856L AS cnt3#852L]
(53) Exchange
Input [8]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cnt1#850L, cd_purchase_estimate#268L, cnt2#851L, cd_credit_rating#269,
cnt3#852L]
Arguments: rangepartitioning(cd_gender#265 ASC NULLS FIRST,
cd_marital_status#266 ASC NULLS FIRST, cd_education_status#267 ASC NULLS FIRST,
cd_purchase_estimate#268L ASC NULLS FIRST, cd_credit_rating#269 ASC NULLS
FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=294]
(54) Sort
Input [8]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cnt1#850L, cd_purchase_estimate#268L, cnt2#851L, cd_credit_rating#269,
cnt3#852L]
Arguments: [cd_gender#265 ASC NULLS FIRST, cd_marital_status#266 ASC NULLS
FIRST, cd_education_status#267 ASC NULLS FIRST, cd_purchase_estimate#268L ASC
NULLS FIRST, cd_credit_rating#269 ASC NULLS FIRST], true, 0
(55) AdaptiveSparkPlan
Output [8]: [cd_gender#265, cd_marital_status#266, cd_education_status#267,
cnt1#850L, cd_purchase_estimate#268L, cnt2#851L, cd_credit_rating#269,
cnt3#852L]
Arguments: isFinalPlan=false
===== Subqueries =====
Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery
subquery#925, [id=#222]
AdaptiveSparkPlan (62)
+- ObjectHashAggregate (61)
+- Exchange (60)
+- ObjectHashAggregate (59)
+- Project (58)
+- Filter (57)
+- Scan parquet (56)
(56) Scan parquet
Output [2]: [ca_address_sk#238L, ca_state#246]
Batched: true
Location: InMemoryFileIndex
[file:/Users/ovoievodin/dev/prj/apple/ovoievodin/rust/datafusion-benchmarks/tpcds/data_parquet/customer_address.parquet]
PushedFilters: [In(ca_state, [IN,MS,VA]), IsNotNull(ca_address_sk)]
ReadSchema: struct<ca_address_sk:bigint,ca_state:string>
(57) Filter
Input [2]: [ca_address_sk#238L, ca_state#246]
Condition : (ca_state#246 IN (IN,VA,MS) AND isnotnull(ca_address_sk#238L))
(58) Project
Output [1]: [ca_address_sk#238L]
Input [2]: [ca_address_sk#238L, ca_state#246]
(59) ObjectHashAggregate
Input [1]: [ca_address_sk#238L]
Keys: []
Functions [1]: [partial_bloom_filter_agg(xxhash64(ca_address_sk#238L, 42),
1000000, 8388608, 0, 0)]
Aggregate Attributes [1]: [buf#928]
Results [1]: [buf#929]
(60) Exchange
Input [1]: [buf#929]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [plan_id=220]
(61) ObjectHashAggregate
Input [1]: [buf#929]
Keys: []
Functions [1]: [bloom_filter_agg(xxhash64(ca_address_sk#238L, 42), 1000000,
8388608, 0, 0)]
Aggregate Attributes [1]: [bloom_filter_agg(xxhash64(ca_address_sk#238L,
42), 1000000, 8388608, 0, 0)#923]
Results [1]: [bloom_filter_agg(xxhash64(ca_address_sk#238L, 42), 1000000,
8388608, 0, 0)#923 AS bloomFilter#924]
(62) AdaptiveSparkPlan
Output [1]: [bloomFilter#924]
Arguments: isFinalPlan=false
```
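Reading the plan: most join, exchange, and sort keys are wrapped in `knownfloatingpointnormalized(normalizenanandzero(...))`. Spark normally adds this wrapper when a key is floating point, and several `*_sk` columns are read as `double` here (see the ReadSchema entries). Below is a minimal Python sketch for listing the affected operators in a formatted plan. The function name `operators_with_float_keys` and the embedded excerpt are illustrative assumptions, not part of Spark or Comet.

```python
import re

# Short excerpt copied from the plan above (operators 43 and 48).
PLAN_EXCERPT = """\
(43) BroadcastHashJoin
Left keys [1]: [c_current_addr_sk#206L]
Right keys [1]: [ca_address_sk#238L]
Join type: Inner
Join condition: None
(48) BroadcastHashJoin
Left keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(c_current_cdemo_sk#204))]
Right keys [1]:
[knownfloatingpointnormalized(normalizenanandzero(cast(cd_demo_sk#264L as
double)))]
Join type: Inner
Join condition: None
"""

# Operator headers in the detail section look like "(48) BroadcastHashJoin".
HEADER = re.compile(r"^\((\d+)\) (.+)$")


def operators_with_float_keys(plan_text):
    """Return [(id, name)] for operators whose details mention normalizenanandzero."""
    found = []
    current = None
    for line in plan_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # Start of a new operator's detail block.
            current = (int(match.group(1)), match.group(2))
        elif current and "normalizenanandzero" in line and current not in found:
            found.append(current)
    return found


print(operators_with_float_keys(PLAN_EXCERPT))
```

On the excerpt this reports only operator 48, since operator 43 joins on `bigint` keys and has no normalization wrapper.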