tustvold commented on issue #3464:
URL: 
https://github.com/apache/arrow-datafusion/issues/3464#issuecomment-1245255602

   I've made a start on collecting some data for this. In particular, I 
created a tool that generates parquet files for use in test benchmarks 
[here](https://github.com/tustvold/access-log-gen).
   
   The basic idea was to show the performance of a selection of relatively 
simple queries in datafusion-cli and compare it against other systems such as 
duckdb, trino, polars, and spark. Hopefully this would provide ample 
opportunity to describe the work that has been done over the last 9 or so 
months, and would ground the performance in easily understandable terms.
   
   We could also run benchmarks with various forms of pushdown disabled, to 
quantify the impact of those changes, or against older versions of the parquet 
reader, to quantify the performance impact of things like dictionary 
preservation.
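
   As a rough illustration of the comparison described above, here is a minimal 
sketch of a timing harness. It is not taken from access-log-gen or from any of 
the systems mentioned; the names `median_seconds`, `compare`, and `speedup` are 
hypothetical, and each callable is assumed to wrap whatever actually runs a 
query in a given system or configuration (for example, pushdown enabled vs. 
disabled).

```python
import statistics
import time
from typing import Callable, Dict


def median_seconds(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run `run_query` `repeats` times and return the median wall-clock time."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes one query in one configuration
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(configs: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time each named configuration and return {name: median seconds}."""
    return {name: median_seconds(fn, repeats) for name, fn in configs.items()}


def speedup(results: Dict[str, float], baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return results[baseline] / results[candidate]
```

   Using the median rather than the mean keeps a single slow run (a cold cache, 
for instance) from dominating the result. Whether first-run effects should be 
excluded entirely is a separate choice this sketch does not make.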

