[GitHub] [arrow-datafusion] alamb commented on issue #6782: Write DataFusion paper for (SIGMOD / VLDB)

via GitHub Tue, 11 Jul 2023 02:50:26 -0700


alamb commented on issue #6782:
URL: 
https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1630516792


   Some ideas about the paper:
   
   # Thesis:
   
   We demonstrate that it is possible to get DuckDB-like performance using open 
standards such as Parquet and Arrow as the interchange format, both inside of 
and outside of the engine. Until now, the conventional wisdom has been that 
such performance requires a tightly integrated engine in which the disk format, 
in-memory layout, and processing engine are engineered in tandem to work well 
together. 
   
   While the engineering effort required for such an engine is large, it is 
achievable by leveraging the open source model and Apache governance model to 
pool resources among users. Given the availability of fast, standards-based, 
interoperable vectorized engines like DataFusion, we predict a Cambrian 
explosion of new analytic systems that would not have been possible if each 
had to build its own engine.
    
   
   # Compare / Contrast Similar systems:
   Velox (focuses on the execution engine side)
   Apache Calcite (focuses on SQL and the frontend)
   DataFusion has all the pieces of the toolkit (SQL frontend, logical plans, 
and execution plans)
   
   Also, Rust
   
   
   
   Internally, DataFusion uses Arrow as the interchange format between 
operators, though within individual operators different, non-standard formats 
are used (such as the Arrow Row Format)
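
   To illustrate why an operator might switch away from a columnar layout 
internally, here is a minimal, standard-library-only sketch of the general idea 
behind a byte-comparable row encoding. It is not DataFusion's or arrow-rs's 
actual code: the `Batch` type, its `region` and `delta` columns, and the 
`encode_row` and `sort_indices` helpers are all hypothetical names invented for 
this example, and the encoding shown is a simplified assumption rather than the 
real Arrow Row Format layout.

```rust
/// Toy columnar batch: each field lives in its own contiguous vector.
/// (Hypothetical type, standing in for an Arrow record batch.)
struct Batch {
    region: Vec<u32>,
    delta: Vec<i64>,
}

/// Encode row `i` as bytes whose lexicographic order matches the
/// (region, delta) tuple order.
fn encode_row(batch: &Batch, i: usize) -> Vec<u8> {
    let mut key = Vec::with_capacity(12);
    // Unsigned integers: big-endian bytes already compare correctly.
    key.extend_from_slice(&batch.region[i].to_be_bytes());
    // Signed integers: flip the sign bit so negatives sort before positives.
    let flipped = (batch.delta[i] as u64) ^ (1u64 << 63);
    key.extend_from_slice(&flipped.to_be_bytes());
    key
}

/// Return row indices sorted by (region, delta) using only byte comparisons.
fn sort_indices(batch: &Batch) -> Vec<usize> {
    let keys: Vec<Vec<u8>> = (0..batch.region.len())
        .map(|i| encode_row(batch, i))
        .collect();
    let mut idx: Vec<usize> = (0..keys.len()).collect();
    idx.sort_by(|&a, &b| keys[a].cmp(&keys[b]));
    idx
}
```

   The point of the sketch is that a multi-column comparison collapses into a 
single byte-slice comparison once rows are encoded, which is the kind of 
trade-off that can justify a non-standard format inside a sort or grouping 
operator while Arrow arrays remain the format exchanged between operators.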


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
