[I] Write a wikipedia article for Apache DataFusion [datafusion]

via GitHub Fri, 20 Mar 2026 04:33:03 -0700


alamb opened a new issue, #21076:
URL: https://github.com/apache/datafusion/issues/21076


   ### Is your feature request related to a problem or challenge?
   
   A Wikipedia article would be useful for Apache DataFusion to make the 
project easier to discover, easier to explain, and easier to cite from a 
neutral source. 
   
   The main benefit is not “marketing copy”; it is legitimacy and 
referenceability.
   
   This is even more important now that Wikipedia is a core training 
corpus for LLMs and a common source for search engine results:
   
     - It gives newcomers a neutral landing page distinct from 
https://datafusion.apache.org.
     - It makes the project easier for journalists, analysts, conference 
organizers, students, and procurement people to cite quickly.
     - It strengthens search visibility and entity recognition. In practice 
Wikipedia pages often feed search summaries, knowledge panels, mirrors, and LLM 
retrieval.
     - It signals that the project is notable beyond its own community because 
the article must be supported by independent reliable sources.
     - It gives a durable place to document ecosystem facts like history, 
governance, and adoption that do not fit cleanly into product docs.
   
   
   ### Describe the solution you'd like
   
   I would like a neutral Wikipedia page for Apache DataFusion.
   
   Here are some similar pages:
   - https://en.wikipedia.org/wiki/DuckDB
   - https://en.wikipedia.org/wiki/Apache_Spark
   - https://en.wikipedia.org/wiki/Polars_(software)
   
   DuckDB’s page shows the pattern clearly: a short neutral definition, 
history, architecture, language bindings, commercial use, and 
foundation/governance in one place, with references to papers and third-party 
coverage.
   
   
   
   
   
   ### Describe alternatives you've considered
   
   I think a strong article will include many citations. Here are several I 
found with the help of Codex.
   
   Some third-party citations that are probably useful for this article:
   
   - DataFusion has been a standalone Apache top-level project since April 16, 
2024, as announced publicly by the Apache Arrow PMC and the ASF (Apache Arrow blog 
(https://arrow.apache.org/blog/2024/05/07/datafusion-tlp/), ASF announcement 
(https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion)).
 
   
   SIGMOD 2024 technical paper
   
     - It appears in the SIGMOD 2024 program as an accepted industry-track 
paper: SIGMOD accepted papers
       (https://2024.sigmod.org/industrial-list.shtml), SIGMOD session listing 
(https://2024.sigmod.org/program_sigmod.shtml).
     - The DOI is 10.1145/3626246.3653368 
(https://dl.acm.org/doi/10.1145/3626246.3653368).
   
   Citations for technical importance
      - crates.io: 17,668,287 all time downloads 
(https://crates.io/crates/datafusion)
   
     - CRN: “The 10 Coolest Open-Source Software Tools Of 2024”
       
(https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3)
       It explicitly includes Apache DataFusion, describes it as a fast 
extensible query engine, notes
       its Rust/Arrow basis, and mentions its 2024 top-level-project milestone. 
This is one of the strongest sources for general
       notability.
   
     - Datanami: “How the FDAP Stack Gives InfluxDB 3.0 Real-Time Speed, 
Efficiency”
       
(https://www.datanami.com/2024/03/15/how-the-fdap-stack-gives-influxdb-3-0-real-time-speed-efficiency/)
       This quotes Paul Dix saying DataFusion had matured substantially and had 
best-in-class performance on a number of queries versus other
       columnar query engines. It is not a ranking article, but it is 
meaningful third-party validation of technical importance.
   
   
   Third-party citations for usage in products
   
     - SiliconANGLE: “Enterprise DB begins rolling AI features into PostgreSQL”
       
(https://siliconangle.com/2024/05/23/enterprise-db-begins-rolling-ai-features-postgresql/)
       Independent coverage stating EDB combined Apache DataFusion, Arrow, and 
Delta Lake in its analytics/lakehouse capability.
   
     - Spice AI: “How we use Apache DataFusion at Spice AI” 
(https://spice.ai/blog/how-we-use-apache-datafusion-at-spice-ai)
       This says Spice uses DataFusion as its SQL query engine and extends it 
with custom TableProviders, optimizer rules, and UDFs for
       federated SQL workloads.
   
     - Cloudflare Log Explorer GA announcement 
(https://blog.cloudflare.com/logexplorer-ga/) from June 10, 2025.
       Queriers fetch matching files from R2 and “process SQL queries using 
Apache DataFusion.”
   
     - InfluxData: “Flight, DataFusion, Arrow, and Parquet: Using the FDAP 
Architecture to build InfluxDB 3.0”
       
(https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
       Clearly states InfluxDB 3.0 chose DataFusion as its query engine 
foundation and explains why.
   
     - Pydantic Logfire issue: “We’re changing database” 
(https://github.com/pydantic/logfire/issues/408)
       Usable as a primary source for adoption only. It says Logfire is moving 
from Timescale to a custom database built on DataFusion and
       gives reasons. 
     
     - Palantir Foundry announcements for July 2025 
(https://www.palantir.com/docs/foundry/announcements/2025-07)
       This says lightweight pipelines are “powered by DataFusion.”
   
     - Cube: “Query pushdown in Cube’s semantic layer” 
(https://cube.dev/blog/query-push-down-in-cubes-semantic-layer)
       Good third-party primary source for “used in production by Cube” and for 
describing how Cube uses DataFusion internally.
     
     - Kamu: “100X faster ingestion, and FlightSQL support for connecting BI 
tools” (https://www.kamu.dev/blog/2023-09-datafusion-flightsql/)
       Good third-party primary source for ecosystem adoption. It explicitly 
says Kamu added support for Apache DataFusion and reports
       performance claims in its own product.
     
     - LanceDB: “Columnar File Readers in Depth: APIs and Fusion” 
(https://lancedb.com/blog/columnar-file-readers-in-depth-apis-and-fusion/)
       Usable for ecosystem context. It says Lance uses DataFusion extensively 
and demonstrates integration with it.
   
     - Bauplan Labs: “Duck Hunt: moving Bauplan from DuckDB to DataFusion”
       
(https://www.bauplanlabs.com/post/duck-hunt-moving-bauplan-from-duckdb-to-datafusion)
       Bauplan explains the migration as driven by DataFusion’s Arrow-first 
architecture, extensibility, and community-driven development.
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

