yjshen edited a comment on pull request #1596:
URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1016238900


   ## TPC-H sf=1 sort_extendedprice_discount [combine_and_sort method]
   
   The external_sort performs similarly to the previous sort implementation (7088.46 ms vs. 7360.79 ms average over three iterations).
   ```sh
   ./target/release/tpch benchmark datafusion --path ./data --format tbl --query 1 --batch-size 10240 --partitions 1
   ```
   To run it directly from the tpch program, I changed q1 locally to:
   ```sql
   select
       l_returnflag,
       l_linestatus,
       l_quantity,
       l_extendedprice,
       l_discount,
       l_tax
   from
       lineitem
   order by
       l_extendedprice,
       l_discount;
   ```
   Query plan:
   ```
   SortExec: [l_extendedprice@3 ASC NULLS LAST,l_discount@4 ASC NULLS LAST]
     ProjectionExec: expr=[l_returnflag@4 as l_returnflag, l_linestatus@5 as l_linestatus, l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2 as l_discount, l_tax@3 as l_tax]
       CsvExec: files=[./data/lineitem.tbl], has_header=false, limit=None
   ```
   
   With this PR:
   ```
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 1, batch_size: 10240, path: "./data", file_format: "tbl", mem_table: false }
   Query 1 iteration 0 took 7239.6 ms
   Query 1 iteration 1 took 7357.7 ms
   Query 1 iteration 2 took 6668.1 ms
   Query 1 avg time: 7088.46 ms
   ```
   
   Without this PR:
   ```
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 1, batch_size: 10240, path: "./data", file_format: "tbl", mem_table: false }
   Query 1 iteration 0 took 7135.3 ms
   Query 1 iteration 1 took 7462.2 ms
   Query 1 iteration 2 took 7484.8 ms
   Query 1 avg time: 7360.79 ms
   ```
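   As a quick sanity check on the "similar performance" reading, the snippet below recomputes the mean, minimum and maximum of the per-iteration times copied from the two runs above, plus the relative gap between the two means. It is a hypothetical helper written only for this comment (the `stats` and `pct_diff` names are made up) and is not part of the tpch tool. Since the tool likely averages unrounded durations, the recomputed means can differ from its reported averages in the second decimal place.

   ```sh
   #!/bin/sh
   # Per-iteration times (ms) copied from the benchmark output above.
   with_pr="7239.6 7357.7 6668.1"
   without_pr="7135.3 7462.2 7484.8"

   # stats: print "mean min max" for a whitespace-separated list of timings.
   stats() {
     echo "$1" | awk '{
       sum = 0; min = $1 + 0; max = $1 + 0
       for (i = 1; i <= NF; i++) {
         sum += $i
         if ($i < min) min = $i
         if ($i > max) max = $i
       }
       printf "%.2f %.1f %.1f\n", sum / NF, min, max
     }'
   }

   # pct_diff: gap between the two means, as a percent of the second mean.
   # Assumes exactly three timings per run, as in the runs above.
   pct_diff() {
     echo "$1 $2" | awk '{
       a = ($1 + $2 + $3) / 3; b = ($4 + $5 + $6) / 3
       printf "%.1f\n", (b - a) / b * 100
     }'
   }

   echo "with PR (mean min max):    $(stats "$with_pr")"
   echo "without PR (mean min max): $(stats "$without_pr")"
   echo "difference of means: $(pct_diff "$with_pr" "$without_pr")%"
   ```

   Run with `sh`, this should print means of 7088.47 ms and 7360.77 ms and a 3.7% difference. That gap (roughly 270 ms) is well under the roughly 690 ms spread between the fastest and slowest "with PR" iterations, which fits reading the two runs as similar rather than as a clear speedup.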
   

