alamb opened a new issue, #10481:
URL: https://github.com/apache/datafusion/issues/10481

   I am giving an invited keynote talk at a workshop colocated with SIGMOD 2024 
on Friday Jun 14, 2024 (after the main conference). 
   
   
   
   I need to prepare slides for this and figured people in the DataFusion 
community might be interested.
   
   Title: DataFusion: The Case for Building Data Systems using Open Standards
   
   Abstract: Andrew will discuss engineering tradeoffs made when building 
Apache DataFusion, an open source and extensible query engine used as the basis 
of many commercial and open source projects. These decisions (mostly) favored 
simplicity and worked better than initially expected. He will cover the 
rationale for which parts of DataFusion use pre-existing standards such as 
Arrow and Parquet, and which parts are built “from scratch” such as vectorized 
hashing and normalized sort keys. He will also discuss DataFusion’s design 
philosophy of extensible APIs paired with simple default implementations. 
Finally, he will offer lessons learned, enumerating what worked well and what 
could have been improved. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

