alamb commented on issue #8373:
URL: 
https://github.com/apache/arrow-datafusion/issues/8373#issuecomment-2016612420

   I took another pass through the paper. In addition to some wordsmithing and 
whitespace engineering, I lengthened the abstract, both so the front page 
doesn't look as empty and so it summarizes the content of the paper (in 
addition to its conclusion / main point), to help readers decide whether the 
paper is interesting to them.
   
   Here is the current text:
   
   > Apache Arrow DataFusion\cite{DataFusion} is a fast, embeddable, and 
extensible query engine written in Rust\cite{Rust} that uses Apache 
Arrow\cite{Arrow} as its memory model. In this paper we describe the 
technologies on which it is built and how it fits into long-term database 
implementation trends. We then enumerate the features of a modern OLAP engine, 
and outline optimizations required for high performance. Next we describe 
DataFusion's architecture and extension APIs to illustrate the interfaces used 
in modular query engines to integrate with the systems built on them. Finally, 
we demonstrate that open standards and extensible design do not preclude 
state-of-the-art performance using a series of experimental comparisons to 
DuckDB\cite{DuckDB}. 
   >
   > While the individual techniques used in DataFusion have been described 
many times before, DataFusion differs from other industrial-strength engines by 
providing competitive performance \textit{and} an open architecture that can be 
customized using more than 10 major extension APIs. This flexibility has led to 
its use in many commercial and open source databases, machine learning pipelines, 
and other data-intensive systems. We anticipate that the accessibility and 
versatility of DataFusion, along with its competitive performance, will further 
the proliferation of high-performance custom data infrastructures that are 
tailored to specific needs and assembled from modular 
components\cite{ComposableManifesto, ComposableCodex}.
   
   Here is what it looks like:
   
   ![Screenshot 2024-03-23 at 5 51 20 
PM](https://github.com/apache/arrow-datafusion/assets/490673/d243410f-b79a-4a12-9a3f-dac2eb799391)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
