devinjdangelo commented on code in PR #457:
URL: https://github.com/apache/arrow-site/pull/457#discussion_r1444023020
##########
_posts/2024-01-25-datafusion-34.0.0.md:
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow DataFusion 34.0.0 Released, Looking Forward to 2024"
+date: "2024-01-25 00:00:00"
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+## Introduction
+
+[Apache Arrow DataFusion] is an extensible query engine and database
+toolkit, written in [Rust], that uses [Apache Arrow] as its in-memory
+format. DataFusion is targeted primarily at developers building other
+data-intensive analytics systems.
+
+
+[apache arrow datafusion]: https://arrow.apache.org/datafusion/
+[apache arrow]: https://arrow.apache.org
+[rust]: https://www.rust-lang.org/
+
+We recently [released DataFusion 34.0.0]. This post highlights some of the major
+improvements since we released [DataFusion 26.0.0] (spoiler alert: there are a lot)
+and previews where the community plans to head in the next 6 months.
+
+[DataFusion 26.0.0]: https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/
+[released DataFusion 34.0.0]: https://crates.io/crates/datafusion/34.0.0
+
+This is also likely to be our last update blog post on the Apache Arrow site –
+future updates will likely be on our own website as we are working to [graduate
+to a top level project] (Apache Arrow DataFusion → Apache DataFusion) to help
+focus governance and help the project grow. We plan to have our [first DataFusion in person meetup] in March 2024.
+
+
+[graduate to a top level project]: https://github.com/apache/arrow-datafusion/discussions/6475
+[first DataFusion in person meetup]: https://github.com/apache/arrow-datafusion/discussions/8522
+
+DataFusion is very much a community endeavor. Our core thesis is that as a
+community we can build much more advanced technology than any individual or
+company could do alone. In the last 6 months between `26.0.0` and `34.0.0`,
+community growth has been strong. We have accepted and reviewed over a
+thousand PRs from 124 different committers. XXX issues have been created and YYY issues
+have been closed.
+
+<!--
+$ git log --pretty=oneline 26.0.0..34.0.0 . | wc -l
+ 1009
+
+$ git shortlog -sn 26.0.0..34.0.0 . | wc -l
+ 124
+-->
+
+The rest of this post highlights some of the improvements we have made
+to DataFusion over the last 6 months and a preview of where we are
+heading. You can see a list of all changes in the detailed
+[CHANGELOG].
+
+[CHANGELOG]: https://github.com/apache/arrow-datafusion/blob/main/datafusion/CHANGELOG.md
+
+# Improved Performance
+Performance is a key feature of DataFusion and we have made major improvements since 26.0.0.
+
+(TODO get benchmark runs of TPCH and ClickBench between 25 and 34)
+
+Some key improvements we made:
+* [2-3x Better aggregation performance with many distinct groups]
+* Partially ordered grouping / streaming grouping (reduced memory)
+* Joins (what to highlight??)
+* TopK (ORDER BY LIMIT XXX)
+* Specialized min(col) GROUP BY <xxx> ORDER by min(col) LIMIT 1 type query (TODO link / better description)
+* Improved join performance would maybe be another thing to highlight.
+* Improved sort order awareness: TODO link, example showing that we avoid resorting
+
+[2-3x Better aggregation performance with many distinct groups]: https://arrow.apache.org/blog/2023/08/05/datafusion_fast_grouping/
+
+
+# New Features
+
+## DML / Insert / Creating Files
+
+DataFusion now supports writing in parallel to Parquet, CSV, JSON (ARROW?), and
+soon AVRO. This includes writing in parallel to individual and multiple files.
+
+You can do this via `CREATE EXTERNAL TABLE`, for example:
+
+```sql
+(TODO example)
+
+
+```
+
+As well as the COPY command (modeled after DuckDB’s copy (TODO duckdb copy blog link)
Review Comment:
I am not aware of a blog link specifically, but I have frequently referenced
DuckDB's documentation page of copy statements here:
https://duckdb.org/docs/sql/statements/copy.html
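
For reference, a minimal sketch of the kind of statement that page documents. The table name and output path are made up for illustration, and the syntax shown is DuckDB's; the option names DataFusion accepts may differ, so please check before using this in the post:

```sql
-- Hypothetical table and output path, for illustration only.
-- Syntax follows the DuckDB COPY documentation linked above;
-- DataFusion's accepted options may differ.
COPY my_table TO 'my_table.parquet' (FORMAT PARQUET);
```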