zhuqi-lucas commented on PR #18817:
URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3605978882

   Here is the performance result for row group reverse with dynamic topk 
pushdown:
   ```text
        Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench 
clickbench --iterations 5 --path 
/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet 
--queries-path 
/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data 
--sorted-by EventTime --sort-order ASC -o 
/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_19059/data_sorted_clickbench.json`
   Running benchmarks with the following options: RunOpt { query: None, 
pushdown: false, common: CommonOpt { iterations: 5, partitions: None, 
batch_size: None, mem_pool_type: "fair", memory_limit: None, 
sort_spill_reservation_bytes: None, debug: false }, path: 
"/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet", 
queries_path: 
"/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data",
 output_path: 
Some("/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_19059/data_sorted_clickbench.json"),
 sorted_by: Some("EventTime"), sort_order: "ASC" }
   ⚠️  Forcing target_partitions=1 to preserve sort order
   ⚠️  (Because we want to get the pure performance benefit of sorted data to 
compare)
   📊 Session config target_partitions: 1
   Registering table with sort order: EventTime ASC
   Executing: CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 
'/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet' WITH 
ORDER ("EventTime" ASC)
   Q0: -- Must set for ClickBench hits_partitioned dataset. See 
https://github.com/apache/datafusion/issues/16591
   -- set datafusion.execution.parquet.binary_as_string = true
   SELECT * FROM hits ORDER BY "EventTime" DESC limit 10;
   
   Query 0 iteration 0 took 21.5 ms and returned 10 rows
   Query 0 iteration 1 took 12.7 ms and returned 10 rows
   Query 0 iteration 2 took 11.1 ms and returned 10 rows
   Query 0 iteration 3 took 10.6 ms and returned 10 rows
   Query 0 iteration 4 took 9.9 ms and returned 10 rows
   Query 0 avg time: 13.17 ms
   + set +x
   Done
   ```
   
   So it is very close to the reverse parquet implementation.
   
   The main branch result is:
   300 ms
   
   The reverse parquet implementation in this PR is:
   9.8 ms
   
   The reverse row group with dynamic topk 
(https://github.com/apache/datafusion/pull/19064) is:
   13 ms
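   
   For a rough sense of scale, the figures above can be turned into speedup 
ratios. The sketch below is only illustrative arithmetic on the numbers quoted 
in this comment; `mean_ms` and `speedup` are hypothetical helper names, not 
part of DataFusion or dfbench. The mean of the printed per-iteration times is 
13.16 ms while the benchmark reports 13.17 ms, presumably because it averages 
the unrounded timings.
   
   ```rust
   /// Arithmetic mean of timings in milliseconds (hypothetical helper).
   /// Returns NaN for an empty slice.
   fn mean_ms(samples: &[f64]) -> f64 {
       samples.iter().sum::<f64>() / samples.len() as f64
   }

   /// How many times faster `candidate_ms` is than `baseline_ms`
   /// (hypothetical helper).
   fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
       baseline_ms / candidate_ms
   }

   fn main() {
       // Per-iteration times for Q0 as printed in the benchmark log above.
       let iterations = [21.5, 12.7, 11.1, 10.6, 9.9];
       println!("mean of printed iterations: {:.2} ms", mean_ms(&iterations));

       // Figures quoted in the comment, in milliseconds.
       let main_branch = 300.0;
       let reverse_parquet = 9.8;
       let reverse_row_group_topk = 13.17;

       println!(
           "reverse parquet vs main: {:.1}x",
           speedup(main_branch, reverse_parquet)
       );
       println!(
           "reverse row group + dynamic topk vs main: {:.1}x",
           speedup(main_branch, reverse_row_group_topk)
       );
   }
   ```
   
   With these inputs, the two approaches come out at roughly 31x and 23x 
faster than main respectively.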
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

