alamb commented on code in PR #6:
URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1686870959


##########
_posts/2024-07-09-datafusion-40.0.0.md:
##########
@@ -0,0 +1,450 @@
+---
+layout: post
+title: "Apache DataFusion 40.0.0 Released"
+date: "2024-07-09 00:00:00"
+author: alamb
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We recently [released DataFusion 40.0.0]. This blog post highlights some of the many
+major improvements since we [released DataFusion 34.0.0]
+and previews where the community is thinking about improving in the next 6 months.
+
+[released DataFusion 34.0.0]: https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0/
+[released DataFusion 40.0.0]: https://crates.io/crates/datafusion/40.0.0
+
+<!-- todo update this intro --> 
+[Apache Arrow DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers to
+create new, fast data-centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate creating other data-centric systems, it also provides a
+reasonable experience directly out of the box as a [dataframe library] and
+[command line SQL tool].
+
+[DataFusion’s primary design goal]: https://arrow.apache.org/datafusion/user-guide/introduction.html#project-goals
+[dataframe library]: https://arrow.apache.org/datafusion-python/
+[command line SQL tool]: https://arrow.apache.org/datafusion/user-guide/cli.html
+
+
+[apache arrow datafusion]: https://datafusion.apache.org/
+[apache arrow]: https://arrow.apache.org
+[rust]: https://www.rust-lang.org/
+
+DataFusion's core thesis is that as a community together, we can build much more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+
+
+# Community Growth  📈 
+
+In the last 6 months, between `34.0.0` and `40.0.0`, our community has continued to
+grow in new and exciting ways.
+
+1. DataFusion became a top level Apache Software Foundation project (read the
+   [press release] and [blog post]).
+2. We added several PMC members and new committers: [@comphead], [@mustafasrepo],
+   and [@ozankabak] joined the PMC, and [@jonahgao] and [@lewiszlw] joined as
+   committers. See the [mailing list] for more details.
+3. [DataFusion Comet] was [donated] and is nearing its first release.
+4. In the [core DataFusion repo] alone we reviewed and accepted almost 1500 PRs from 182 different
+   committers, created over 1000 issues and closed 781 of them 🚀. This is up from
+   1000 PRs from 124 committers with 650 issues created in our last post 🤯. You
+   can find a list of all changes in the detailed [CHANGELOG].
+5. DataFusion meetups in multiple cities around the world: [Austin], [San Francisco],
+   [Hangzhou], [New York], and [Belgrade].
+6. Many new projects in the [datafusion-contrib] organization, including
+   [Table Providers], [SQL Lancer], [Open Variant], [JSON], and [ORC].  
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[CHANGELOG]: https://github.com/apache/datafusion/blob/main/datafusion/CHANGELOG.md
+[press release]: https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion
+[blog post]: https://datafusion.apache.org/blog/2024/05/07/datafusion-tlp/
+[@comphead]: https://github.com/comphead
+[@mustafasrepo]: https://github.com/mustafasrepo
+[@ozankabak]: https://github.com/ozankabak
+[@jonahgao]: https://github.com/jonahgao
+[@lewiszlw]: https://github.com/lewiszlw
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Austin]: https://github.com/apache/datafusion/discussions/8522
+[San Francisco]: https://github.com/apache/datafusion/discussions/10800
+[Hangzhou]: https://www.huodongxing.com/event/5761971909400?td=1965290734055
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+
+[datafusion-contrib]: https://github.com/datafusion-contrib
+[Table Providers]: https://github.com/datafusion-contrib/datafusion-table-providers
+[SQL Lancer]: https://github.com/datafusion-contrib/datafusion-sqllancer
+[Open Variant]: https://github.com/datafusion-contrib/datafusion-functions-variant
+[JSON]: https://github.com/datafusion-contrib/datafusion-functions-json
+[ORC]: https://github.com/datafusion-contrib/datafusion-orc
+
+<!--
+$ git log --pretty=oneline 34.0.0..40.0.0 . | wc -l
+     1453 (up from 1009)
+
+$ git shortlog -sn 34.0.0..40.0.0 . | wc -l
+      182 (up from 124)
+
+
+https://crates.io/crates/datafusion/34.0.0
+DataFusion 34 released Dec 17, 2023
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 321 open, 781 closed (up from 214 open, 437 closed)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+created%3A2023-12-17..2024-07-12
+
+Issues closed: 911 (up from 517)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2023-12-17..2024-07-12
+
+PRs merged in this time 1490 (up from 908)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2023-12-17..2024-07-12
+
+-->
+
+
+In addition, DataFusion has been appearing in more and more writing, both online and offline. Here are some highlights:
+
+1. [Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine] was presented at [SIGMOD '24], one of the major database conferences
+2. DataFusion was described as part of the trend to define "the POSIX of databases" in ["What Goes Around Comes Around... And Around...] from Andy Pavlo and Mike Stonebraker
+3. ["Why you should keep an eye on Apache DataFusion and its community"]
+4. [Apache DataFusion offline meetup in the Bay Area]
+
+
+[DataFusion Comet]: https://datafusion.apache.org/comet/
+[donated]: https://arrow.apache.org/blog/2024/03/06/comet-donation/
+[SIGMOD '24]: https://2024.sigmod.org/
+
+
+[Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine]: https://dl.acm.org/doi/10.1145/3626246.3653368
+["What Goes Around Comes Around... And Around...]: https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf
+["Why you should keep an eye on Apache DataFusion and its community"]: https://www.cpard.xyz/posts/datafusion/
+[Apache DataFusion offline meetup in the Bay Area]: https://www.tisonkun.org/2024/07/15/datafusion-meetup-san-francisco/
+
+
+# Improved Performance 🚀 
+
+Performance is a key feature of DataFusion, and the community continues to work
+to keep DataFusion state of the art in this area. One major area DataFusion
+improved is the time it takes to convert a SQL query into a plan that can be
+executed. Planning is now almost 2x faster for TPC-DS and TPC-H queries, and
+over 10x faster for some queries with many columns.
+
+Here is a chart showing the improvement due to the concerted effort of
+many contributors (TODO list contributors by name) over several months (see
+[ticket] for more details).
+
+<img src="{{ site.baseurl }}/assets/datafusion-40.0.0/improved-planning-time.png" width="700">
+
+[ticket]: https://github.com/apache/datafusion/issues/9637
+
+Also, we implemented [specialization for single Utf8/LargeUtf8/Binary/LargeBinary]
+group by columns, which resulted in a 40% performance improvement for some
+benchmarks.
+
+[specialization for single Utf8/LargeUtf8/Binary/LargeBinary]: https://github.com/apache/datafusion/pull/8827
+
+We are also in the final phases of our initial integration of the new [Arrow
+StringView] which will provide a significant performance improvement
+for many workloads. This feature should be available in future versions of DataFusion.
+Kudos to [@XiangpengHong], [@PsiACE], [@Weigju>XXXX],
+[@AriesDevil], and [@alamb] for driving this along.
+
+[Arrow StringView]: https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
+
+
+# Improved Quality 📋
+
+The overall quality of DataFusion continues to improve. One of the most exciting
+improvements is the addition of a new [SQLancer] based [DataFusion Fuzzing]
+suite thanks to [@2010YOUY01] that has already found several bugs (kudos to [@jonahgao],
+YYY, and ZZZ for fixing them so fast).
+
+[SQLancer]: https://github.com/apache/datafusion/issues/11030
+[DataFusion Fuzzing]: https://github.com/datafusion-contrib/datafusion-sqllancer
+[@2010YOUY01]: https://github.com/2010YOUY01
+
+
+## Improved Documentation 📚
+
+We continue to improve the documentation to make it easier to get started using DataFusion with
+the [Library Users Guide], [API documentation], and [Examples].
+
+Some notable new examples include:
+* [sql_analysis.rs] to analyse SQL queries with DataFusion structures (thanks [@LorrensP-2158466])
+* [plan_to_sql.rs] to generate SQL from DataFusion `Expr` and `LogicalPlan` (thanks [@edmondop])
+
+[Library Users Guide]: https://datafusion.apache.org/library-user-guide/index.html
+[API documentation]: https://docs.rs/datafusion/latest/datafusion/index.html
+[Examples]: https://github.com/apache/datafusion/tree/main/datafusion-examples
+[sql_analysis.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/sql_analysis.rs
+[plan_to_sql.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/plan_to_sql.rs
+[@LorrensP-2158466]: https://github.com/LorrensP-2158466
+[@edmondop]: https://github.com/edmondop
+
+# New Features ✨
+
+There are too many new features in the last 6 months to list them all, but here
+are some highlights:
+
+## SQL 
+* Support for unnest (TODO LINK)
+* Support for recursive CTEs: https://github.com/apache/datafusion/pull/9619 / https://github.com/apache/datafusion/issues/462

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

