Re: [I] Improve performance of db-benchmark query 8 [datafusion]

via GitHub Sun, 08 Dec 2024 21:55:30 -0800


akurmustafa commented on issue #13586:
URL: https://github.com/apache/datafusion/issues/13586#issuecomment-2526996994


   I generated the plan for the query above; it is as follows:
   ```
   logical_plan
   01)SubqueryAlias: sub_query
   02)--Projection: x.id6, x.v3 AS largest2_v3
   03)----Filter: row_number() PARTITION BY [x.id6] ORDER BY [x.v3 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW <= UInt64(2)
   04)------WindowAggr: windowExpr=[[row_number() PARTITION BY [x.id6] ORDER BY [x.v3 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
   05)--------Filter: x.v3 IS NOT NULL
   06)----------TableScan: x projection=[id6, v3]
   physical_plan
   01)ProjectionExec: expr=[id6@0 as id6, v3@1 as largest2_v3]
   02)--CoalesceBatchesExec: target_batch_size=8192
   03)----FilterExec: row_number() PARTITION BY [x.id6] ORDER BY [x.v3 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@2 <= 2, projection=[id6@0, v3@1]
   04)------BoundedWindowAggExec: wdw=[row_number() PARTITION BY [x.id6] ORDER BY [x.v3 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Ok(Field { name: "row_number() PARTITION BY [x.id6] ORDER BY [x.v3 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(Int32(NULL)), end_bound: CurrentRow, is_causal: false }], mode=[Sorted]
   05)--------SortExec: expr=[id6@0 ASC NULLS LAST, v3@1 DESC], preserve_partitioning=[true]
   06)----------CoalesceBatchesExec: target_batch_size=8192
   07)------------RepartitionExec: partitioning=Hash([id6@0], 4), input_partitions=4
   08)--------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
   09)----------------CoalesceBatchesExec: target_batch_size=8192
   10)------------------FilterExec: v3@1 IS NOT NULL
   11)--------------------MemoryExec: partitions=1, partition_sizes=[1]
   ```
   In this plan, there is no `SortPreservingMerge`. However, the settings may 
have been different in the original experiment. I generated the plan above 
with the default settings, after constructing a dummy table `x` as follows:
   ```
   statement ok
   CREATE TABLE x (
       id6 INT,
       v3 INT
   );
   
   statement ok
   INSERT INTO x (id6, v3) VALUES
   (0, 3);
   ```
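   For reference, a plan like the one above can be printed by prefixing the 
query with `EXPLAIN`. The query text below is an assumption, reconstructed 
from the logical plan shown above (the column alias `order_v3` is a 
hypothetical name); it is not copied from the benchmark scripts:
   ```sql
   -- Sketch only: query reconstructed from the logical plan, not the
   -- benchmark's exact text. `order_v3` is an illustrative alias.
   EXPLAIN
   SELECT id6, largest2_v3
   FROM (
       SELECT id6,
              v3 AS largest2_v3,
              ROW_NUMBER() OVER (PARTITION BY id6 ORDER BY v3 DESC) AS order_v3
       FROM x
       WHERE v3 IS NOT NULL
   ) sub_query
   WHERE order_v3 <= 2;
   ```
   `EXPLAIN ANALYZE` additionally executes the query and reports per-operator 
metrics, which may help when comparing against the benchmark run.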
   Is there a way to see the plan of the query during the benchmark run?

