alamb commented on issue #8373:
URL:
https://github.com/apache/arrow-datafusion/issues/8373#issuecomment-1997452950
This morning, I started working on the first page:
> Add more examples / better explanation of systems built on DataFusion (we
have some good new examples I know of since -- Arroyo, Comet, and LanceDB comes
to mind)
(that looks pretty much done now to me)
> The main criticism / weakness cited is that DataFusion doesn't demonstrate
sufficient technical novelty other than integration of various existing ideas.
I think this is a very valid point, and maybe we should re-emphasize the point
more that it isn't technical novelty of any part, but the overall system.
I reworded the abstract to try to make the "not novel" point more
explicit. Here is what I came up with:
"Apache Arrow DataFusion\cite{DataFusion} is a fast, embeddable, and
extensible query engine written in Rust\cite{Rust} that uses Apache
Arrow\cite{Arrow} as its memory model. While the individual techniques used by
DataFusion have been previously described, it differs from other industrial
strength engines by providing competitive performance \textit{and} an open
architecture that can be customized using over 10 major extension APIs. This
flexibility has led to its use in many commercial and open source databases,
machine learning pipelines, and other data-intensive systems. We anticipate
that the accessibility and versatility of DataFusion, along with its
competitive performance, will further enable the proliferation of
high-performance custom data infrastructures tailored to specific needs."
> Please move the figure out of the first page, or to the bottom of the
first page. It is distracting to read the caption of Figure 1 before the
abstract.
I personally like the visual impact of the figure at the beginning so I
would prefer keeping its location where it is. However, as the reviewer points
out, the extended caption on the figure was duplicative / repetitive with the
abstract. I thus reduced the caption to the following, which I think captures
the essence with less distraction:
"When building with DataFusion, system designers implement domain-specific
features via extension APIs (blue), rather than re-implementing standard OLAP
query engine technology (green)."
I also updated the figure with the new DataFusion logo:
https://github.com/apache/arrow-datafusion/issues/8788 (thanks @pinarbayata)
I think the first page is now looking quite good.
