Thank you for bringing this topic up. Expanding on what you suggested, how about this for a vision?
DataFusion's vision is to become *the de facto query engine* of choice for new analytic applications, by leveraging the unique features of Rust and Apache Arrow to provide:

1. Best-in-class query performance for a single node
2. A feature-complete declarative query interface via (most of) PostgreSQL
3. A feature-rich procedural interface for creating and running execution plans
4. High-performance extensibility at every layer

The current readme [2] describes *what* DataFusion is, but does not really give a vision going forward. A few months ago we tried a "what is everyone thinking of working on" type approach [1] to create a roadmap. While that was insightful, I agree that having a single unified (even if vague) goal would be very helpful.

I would welcome other thoughts as well: if there appears to be some consensus, then we can make a PR to add the proposal to the DataFusion readme.

@Andy Grove <andygrov...@gmail.com> do you have any thoughts?

Andrew

[1] https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit
[2] https://github.com/apache/arrow-datafusion#readme

On Tue, Jun 22, 2021 at 3:18 AM Jiayu Liu <ji...@hey.com.invalid> wrote:

> Hi,
>
> This is regarding my question about DataFusion's vision and roadmap.
>
> As a new contributor, I wonder what vision and roadmap most of the
> contributors can align (or have already aligned) upon.
>
> Maybe due to my lack of prior context I have missed such a
> discussion, or maybe this is intentionally left open so that
> different contributors and companies can keep their own features
> compatible. But I still believe in the value of having one, and it could
> be shown in the README.md or the contributing guidelines, so that
> users and the community can see what to expect and what to contribute to.
>
> By "vision" I mean something that's necessarily vague and serves as an
> overarching goal, e.g. "leveraging Rust and Arrow to become the most
> performant SQL-compatible query engine on a single node", or "fully
> compatible with (most of) PostgreSQL syntax and pluggable into most
> web-scale analytical engines".
>
> I believe having this in place can help push the project forward,
> especially in cases of trade-offs, e.g. sticking to the newest Rust release vs.
> providing LTS, or incorporating as many features as possible (e.g.
> recursive CTE? BSON support? query materializations?) vs. keeping
> the binary size small and moving everything else into a plugin mode.