[GitHub] [arrow-datafusion] alamb commented on issue #1329: DuckDB Comparison - Questions + Bug

GitBox Thu, 18 Nov 2021 13:27:41 -0800


alamb commented on issue #1329:
URL: https://github.com/apache/arrow-datafusion/issues/1329#issuecomment-973284996



   There is also a substantial list of "powered by" Arrow systems at 
https://arrow.apache.org/powered_by/
   
   Something else that may be obvious, but that I wanted to make explicit, is 
that DataFusion doesn't have its own "native" storage format in the way that 
DuckDB or other DBMSs do -- DataFusion is a query engine that can be 
used if you have your data in Arrow record batches (or want to load them into 
memory using `register_record_batches`).
   
   If you are comparing DuckDB and DataFusion, another comparison might be to 
start with data in parquet files and compare the timings of:
   
   1. The time to load the parquet into DuckDB + the time to run the query (or 
just the time to run the query in DuckDB if it supports external tables)
   2. The time needed to run the query in DataFusion directly against parquet
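
   As a rough illustration of the comparison above, here is a minimal Python 
sketch of a timing harness. It rests on several assumptions that are not part 
of the original comment: the `duckdb` and `datafusion` Python packages are 
installed, the `datafusion` package is recent enough to expose 
`SessionContext`, the table name `t` is arbitrary, and the helper names 
(`time_steps`, `duckdb_steps`, `datafusion_steps`) are hypothetical. A single 
wall-clock run per step is only indicative, not a rigorous benchmark.

```python
import time
from typing import Callable, Dict


def time_steps(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each named step once, in order, and return wall-clock seconds per step."""
    timings = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


def duckdb_steps(parquet_path: str, query: str) -> Dict[str, Callable[[], object]]:
    """Option 1: load the parquet file into a DuckDB table, then run the query."""
    import duckdb  # assumption: the duckdb Python package is installed

    con = duckdb.connect()  # in-memory database

    def load():
        # Copy the parquet data into DuckDB's own storage as table `t`.
        con.execute(f"CREATE TABLE t AS SELECT * FROM read_parquet('{parquet_path}')")

    def run_query():
        return con.execute(query).fetchall()

    return {"load": load, "query": run_query}


def datafusion_steps(parquet_path: str, query: str) -> Dict[str, Callable[[], object]]:
    """Option 2: run the query in DataFusion directly against the parquet file."""
    from datafusion import SessionContext  # assumption: recent datafusion package

    ctx = SessionContext()

    def register_and_query():
        # Registering only points DataFusion at the file; the data is read
        # when the query executes.
        ctx.register_parquet("t", parquet_path)
        return ctx.sql(query).collect()

    return {"register+query": register_and_query}
```

   With a hypothetical file and query, the two totals to compare would be 
`sum(time_steps(duckdb_steps("data.parquet", "SELECT count(*) FROM t")).values())` 
and the same expression with `datafusion_steps`.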
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
