alamb commented on a change in pull request #193:
URL: https://github.com/apache/arrow-site/pull/193#discussion_r806271254
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
Review comment:
```suggestion
DataFusion's SQL, `DataFrame`, and manual `PlanBuilder` APIs give users
access to a sophisticated query optimizer and execution engine capable of fast,
resource-efficient parallel execution that takes optimal advantage of today's
multicore hardware. Being written in Rust means DataFusion can offer *both* the
safety of dynamic languages and the resource efficiency of a compiled
language.
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
+
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
Review comment:
```suggestion
- The DataFusion crate is being split into multiple crates to decrease
compilation times and improve the development experience. Initially,
`datafusion-common` (the core DataFusion components) and `datafusion-expr`
(DataFusion expressions, functions, and operators) have been split out. There
will be additional splits after the 7.0 release.
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
Review comment:
```suggestion
- Support for reading Parquet files with evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
+ - Support `array_agg(distinct ...)`
[#1579](https://github.com/apache/arrow-datafusion/pull/1579)
+ - Support `sort` on unprojected columns
[#1415](https://github.com/apache/arrow-datafusion/pull/1415)
+- Additional Integration Points
+ - A new public Expression simplification API
[#1717](https://github.com/apache/arrow-datafusion/pull/1717)
+- [DataFusion-Contrib](https://github.com/datafusion-contrib)
+ - A new GitHub organization created as a home for both `DataFusion`
extensions and as a testing ground for new features.
+ - Extensions
+ -
[DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)
+ -
[DataFusion-Java](https://github.com/datafusion-contrib/datafusion-java)
+ -
[DataFusion-hdsfs-native](https://github.com/datafusion-contrib/datafusion-hdfs-native)
+ -
[DataFusion-ObjectStore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
+ - New Features
+ -
[DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)
+- [Arrow2](https://github.com/jorgecarleitao/arrow2)
+ - An [Arrow2 Branch](https://github.com/apache/arrow-datafusion/tree/arrow2)
has been created. There are ongoing discussions in
[DataFusion](https://github.com/apache/arrow-datafusion/issues/1532) and
[arrow-rs](https://github.com/apache/arrow-rs/issues/1176) about migrating
`DataFusion` to `Arrow2`
+
+For the full list of new features with their relevant PRs, see the
+[enhancements
section](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
+in the changelog.
+
+# Documentation and Roadmap
+
+The project's documentation is being consolidated into the official site. You
can find more details there on topics such as the SQL status (TO DO LINK) and a
user guide.
+
+To provide transparency on DataFusion’s priorities to users and developers a
three month roadmap will be published at the beginning of each quarter. This
can be found here (TO DO LINK once site is updated).
+
+See full details on DataFusion’s ambitions (TO DO LINK).
+
+# Upcoming Attractions
+
+- Ballista is gaining momentum, and several groups are now evaluating and
contributing to the project.
+ - Some of the proposed improvements
+ - [Improvements
Overview](https://github.com/apache/arrow-datafusion/issues/1701)
+ - [Extensibility](https://github.com/apache/arrow-datafusion/issues/1675)
+ - [File system
access](https://github.com/apache/arrow-datafusion/issues/1702)
+ - [Cluster state](https://github.com/apache/arrow-datafusion/issues/1704)
+- Continued improvements for working with limited resources and large datasets
+ - Memory limited
joins[#1599](https://github.com/apache/arrow-datafusion/issues/1599)
+ - Sort-merge
join[#141](https://github.com/apache/arrow-datafusion/issues/141)[#1776](https://github.com/apache/arrow-datafusion/pull/1776)
+ - Introduce row based bytes representation
[#1708](https://github.com/apache/arrow-datafusion/pull/1708)
+
+# How to Get Involved
+
+If you are interested in contributing to DataFusion, we would love to have
you! You
+can help by trying out DataFusion on some of your own data and projects and
filing bug reports and helping to
+improve the documentation, or contribute to the documentation, tests or code.
A list of open issues suitable for
+beginners is
[here](https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
Review comment:
```suggestion
can help by trying out DataFusion on some of your own data and projects and
letting us know how it goes, or by contributing a PR with documentation, tests,
or code. A list of open issues suitable for
beginners is
[here](https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+# How to Get Involved
+
+If you are interested in contributing to DataFusion, we would love to have
you! You
Review comment:
```suggestion
If you are interested in contributing to DataFusion and learning about
state-of-the-art query processing, we would love to have you join us on the
journey! You
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
Review comment:
```suggestion
```
I am not sure this sentence adds much, as a very similar thought appears
immediately above it.
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
Review comment:
```suggestion
The following section highlights some of the improvements in this release.
Of course, many other bug fixes
and improvements have also been made and we refer you to the complete
[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
for the full detail.
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
+ - Support `array_agg(distinct ...)`
[#1579](https://github.com/apache/arrow-datafusion/pull/1579)
+ - Support `sort` on unprojected columns
[#1415](https://github.com/apache/arrow-datafusion/pull/1415)
+- Additional Integration Points
+ - A new public Expression simplification API
[#1717](https://github.com/apache/arrow-datafusion/pull/1717)
+- [DataFusion-Contrib](https://github.com/datafusion-contrib)
+ - A new GitHub organization created as a home for both `DataFusion`
extensions and as a testing ground for new features.
+ - Extensions
+ -
[DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)
+ -
[DataFusion-Java](https://github.com/datafusion-contrib/datafusion-java)
+ -
[DataFusion-hdsfs-native](https://github.com/datafusion-contrib/datafusion-hdfs-native)
+ -
[DataFusion-ObjectStore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
+ - New Features
+ -
[DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)
+- [Arrow2](https://github.com/jorgecarleitao/arrow2)
+ - An [Arrow2 Branch](https://github.com/apache/arrow-datafusion/tree/arrow2)
has been created. There are ongoing discussions in
[DataFusion](https://github.com/apache/arrow-datafusion/issues/1532) and
[arrow-rs](https://github.com/apache/arrow-rs/issues/1176) about migrating
`DataFusion` to `Arrow2`
+
+For the full list of new features with their relevant PRs, see the
+[enhancements
section](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
+in the changelog.
+
+# Documentation and Roadmap
+
+The project's documentation is being consolidated into the official site. You
can find more details there on topics such as the SQL status (TO DO LINK) and a
user guide.
+
+To provide transparency on DataFusion’s priorities to users and developers a
three month roadmap will be published at the beginning of each quarter. This
can be found here (TO DO LINK once site is updated).
Review comment:
```suggestion
To provide transparency on DataFusion’s priorities to users and developers a
three month roadmap will be published at the beginning of each quarter. This
can be found
[here](https://arrow.apache.org/datafusion/specification/roadmap.html).
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
Review comment:
```suggestion
- Arrow’s dyn scalar kernels are now used to enable efficient operations
on `DictionaryArray`s
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+For the full list of new features with their relevant PRs, see the
+[enhancements
section](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
+in the changelog.
Review comment:
```suggestion
```
I think this is redundant with the lead above.
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
Review comment:
```suggestion
- Support for the `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
Review comment:
```suggestion
- Switch from `std::sync::Mutex` to `parking_lot::Mutex`
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
```
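For readers unfamiliar with the difference, here is a minimal sketch of what this switch means at call sites. It uses only `std::sync::Mutex` so that it runs without extra crates; the helper `parallel_count` is a hypothetical name for illustration and is not taken from the DataFusion code base.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: increment a shared counter from `workers` threads,
/// `per_worker` times each, and return the final value.
fn parallel_count(workers: usize, per_worker: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // std's `lock()` returns a `LockResult`, because the mutex
                    // is poisoned if a holder panics, so callers must unwrap
                    // or handle the error. `parking_lot::Mutex::lock()` has no
                    // poisoning and returns the guard directly.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind to a local so the guard is released before `counter` is dropped.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

With `parking_lot`, the `.unwrap()` after each `lock()` would simply go away. Any performance difference depends on the workload, and this sketch does not measure it.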
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
+
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
+ - Support `array_agg(distinct ...)`
[#1579](https://github.com/apache/arrow-datafusion/pull/1579)
+ - Support `sort` on unprojected columns
[#1415](https://github.com/apache/arrow-datafusion/pull/1415)
+- Additional Integration Points
+ - A new public Expression simplification API
[#1717](https://github.com/apache/arrow-datafusion/pull/1717)
+- [DataFusion-Contrib](https://github.com/datafusion-contrib)
+ - A new GitHub organization created as a home for both `DataFusion`
extensions and as a testing ground for new features.
+ - Extensions
+ -
[DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)
+ -
[DataFusion-Java](https://github.com/datafusion-contrib/datafusion-java)
+ -
[DataFusion-hdsfs-native](https://github.com/datafusion-contrib/datafusion-hdfs-native)
+ -
[DataFusion-ObjectStore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
+ - New Features
+ -
[DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)
+- [Arrow2](https://github.com/jorgecarleitao/arrow2)
+ - An [Arrow2 Branch](https://github.com/apache/arrow-datafusion/tree/arrow2)
has been created. There are ongoing discussions in
[DataFusion](https://github.com/apache/arrow-datafusion/issues/1532) and
[arrow-rs](https://github.com/apache/arrow-rs/issues/1176) about migrating
`DataFusion` to `Arrow2`
+
+For the full list of new features with their relevant PRs, see the
+[enhancements
section](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
+in the changelog.
+
+# Documentation and Roadmap
+
+The project's documentation is being consolidated into the official site. You
can find more details there on topics such as the SQL status (TO DO LINK) and a
user guide.
+
+To provide transparency on DataFusion’s priorities to users and developers a
three month roadmap will be published at the beginning of each quarter. This
can be found here (TO DO LINK once site is updated).
+
+See full details on DataFusion’s ambitions (TO DO LINK).
+
Review comment:
```suggestion
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
+
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
+ - Support `array_agg(distinct ...)`
[#1579](https://github.com/apache/arrow-datafusion/pull/1579)
+ - Support `sort` on unprojected columns
[#1415](https://github.com/apache/arrow-datafusion/pull/1415)
+- Additional Integration Points
+ - A new public Expression simplification API
[#1717](https://github.com/apache/arrow-datafusion/pull/1717)
+- [DataFusion-Contrib](https://github.com/datafusion-contrib)
+ - A new GitHub organization created as a home for both `DataFusion`
extensions and as a testing ground for new features.
+ - Extensions
+ -
[DataFusion-Python](https://github.com/datafusion-contrib/datafusion-python)
+ -
[DataFusion-Java](https://github.com/datafusion-contrib/datafusion-java)
+ -
[DataFusion-hdsfs-native](https://github.com/datafusion-contrib/datafusion-hdfs-native)
+ -
[DataFusion-ObjectStore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
+ - New Features
+ -
[DataFusion-Streams](https://github.com/datafusion-contrib/datafusion-streams)
+- [Arrow2](https://github.com/jorgecarleitao/arrow2)
+ - An [Arrow2 Branch](https://github.com/apache/arrow-datafusion/tree/arrow2)
has been created. There are ongoing discussions in
[DataFusion](https://github.com/apache/arrow-datafusion/issues/1532) and
[arrow-rs](https://github.com/apache/arrow-rs/issues/1176) about migrating
`DataFusion` to `Arrow2`
+
+For the full list of new features with their relevant PRs, see the
+[enhancements
section](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
+in the changelog.
+
+# Documentation and Roadmap
+
+The project's documentation is being consolidated into the official site. You
can find more details there on topics such as the SQL status (TO DO LINK) and a
user guide.
Review comment:
```suggestion
We are working to consolidate the documentation into the [official
site](https://arrow.apache.org/datafusion). You can find more details there on
topics such as the [SQL
status](https://arrow.apache.org/datafusion/user-guide/sql/index.html) and a
[user
guide](https://arrow.apache.org/datafusion/user-guide/introduction.html#introduction).
This is also an area where we would love help from the broader community.
```
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
+
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
+ - Support for registering `DataFrame` as table
[#1699](https://github.com/apache/arrow-datafusion/pull/1699)
+ - Suppot `substring` function
[#1621](https://github.com/apache/arrow-datafusion/pull/1621)
+ - Support `array_agg(distinct ...)`
[#1579](https://github.com/apache/arrow-datafusion/pull/1579)
+ - Support `sort` on unprojected columns
[#1415](https://github.com/apache/arrow-datafusion/pull/1415)
+- Additional Integration Points
+ - A new public Expression simplification API
[#1717](https://github.com/apache/arrow-datafusion/pull/1717)
+- [DataFusion-Contrib](https://github.com/datafusion-contrib)
+ - A new GitHub organization created as a home for both `DataFusion`
extensions and as a testing ground for new features.
Review comment:
❤️
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
Review comment:
🤔 I was trying to accentuate the positives here / keep readers
interested. Maybe I have gone too far 🤔
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,154 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion's SQL, `DataFrame`, and manual `PlanBuilder` API let users access
a sophisticated query optimizer and execution engine capable of fast, resource
efficient, and parallel execution that takes optimal advantage of todays
multicore hardware. Being written in Rust means DataFusion can offer *both* the
safety of dynamic languages as well as the resource efficiency of a compiled
language.
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The following section highlights some of the improvements in this release. Of
course, many other bug fixes and improvements have also been made and we refer
you to the complete
[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md)
for the full detail.
Review comment:
That is correct. I will create a `7.0.0` tag in the datafusion repo
once the release has been published.
Right now, you can preview the changelog here:
https://github.com/apache/arrow-datafusion/blob/7.0.0-rc2/datafusion/CHANGELOG.md
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,154 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
Review comment:
That is a *great* catch. 🦅 👁️ 👍
##########
File path: _posts/2022-02-14-datafusion-7.0.0.md
##########
@@ -0,0 +1,166 @@
+---
+layout: post
+title: Apache Arrow DataFusion 6.0.0 Release
+date: "2022-02-14 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Introduction
+
+[DataFusion](https://arrow.apache.org/datafusion/) is an extensible query
execution framework, written in Rust, that uses Apache Arrow as its in-memory
format.
+
+When you want to extend your Rust project with [SQL
support](https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html), a
DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV
data, DataFusion is definitely worth checking out.
+
+DataFusion supports both a SQL and DataFrame API for building logical query
plans as well as a sophisticated query optimizer and execution engine capable
of parallel execution against memory, CSV, Parquet, Avro and JSON.
+
+The Apache Arrow team is pleased to announce the DataFusion 7.0.0 release.
This covers 4 months of development work
+and includes 195 commits from the following 37 distinct contributors.
+
+<!--
+git log --pretty=oneline 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc -l
+ 134
+
+git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli datafusion-examples |
wc -l
+ 29
+
+ Carlos and xudong963 are same individual
+-->
+
+```
+ 44 Andrew Lamb
+ 24 Kun Liu
+ 23 Jiayu Liu
+ 12 xudong.w
+ 11 Yijie Shen
+ 9 Matthew Turner
+ 7 Liang-Chi Hsieh
+ 5 Lin Ma
+ 5 Carlos
+ 4 Stephen Carman
+ 4 James Katz
+ 4 Dmitry Patsura
+ 4 QP Hou
+ 3 dependabot[bot]
+ 3 Remzi Yang
+ 3 Yang
+ 3 ic4y
+ 3 Daniël Heres
+ 2 Andy Grove
+ 2 Raphael Taylor-Davies
+ 2 Jason Tianyi Wang
+ 2 Dan Harris
+ 2 Sergey Melnychuk
+ 1 Nitish Tiwari
+ 1 Dom
+ 1 Eduard Karacharov
+ 1 Javier Goday
+ 1 Boaz
+ 1 Marko Mikulicic
+ 1 Max Burke
+ 1 Carol (Nichols || Goulding)
+ 1 Phillip Cloud
+ 1 Rich
+ 1 Toby Hede
+ 1 Will Jones
+ 1 r.4ntix
+ 1 rdettai
+```
+
+The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bug fixes
+and improvements have been made: we refer you to the complete
+[changelog](https://github.com/apache/arrow-datafusion/blob/7.0.0/datafusion/CHANGELOG.md).
+
+# Summary
+
+There have been significant improvements across the board since the 6.0
release which are summarized below.
+
+- DataFusion Crate
+ - The DataFusion crate is in the process of being split into multiple crates
in order to decrease compilation times and improve the development experience.
To start, datafusion-common (the core DataFusion components) and
datafusion-expr (DataFusion expressions, functions, and operators) will be
split out. There will be additional splits after the 7.0 release.
+- Performance Improvements and Optimizations
+ - Arrow’s dyn scalar kernels are now used which enable more efficient
operations on DictionaryArrays
[#1685](https://github.com/apache/arrow-datafusion/pull/1685)
+ - Switch from std::sync::Mutex to parking_lot::Mutex
[#1720](https://github.com/apache/arrow-datafusion/pull/1720)
+- New Features
+ - Better support for limiting resource usage
+ - MemoryMananger and DiskManager
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - Out of core sort
[#1526](https://github.com/apache/arrow-datafusion/pull/1526)
+ - New metrics
+ - `Gauge` and `CurrentMemoryUsage`
[#1682](https://github.com/apache/arrow-datafusion/pull/1682)
+ - `Spill_count` and `spilled_bytes`
[#1641](https://github.com/apache/arrow-datafusion/pull/1641)
+ - New math functions
+ - `Approx_quantile`
[#1529](https://github.com/apache/arrow-datafusion/pull/1539)
+ - `stddev` and `variance` (sample and population)
[#1525](https://github.com/apache/arrow-datafusion/pull/1525)
+ - `corr` [#1561](https://github.com/apache/arrow-datafusion/pull/1561)
+ - Support decimal type
[#1394](https://github.com/apache/arrow-datafusion/pull/1394)[#1407](https://github.com/apache/arrow-datafusion/pull/1407)[#1408](https://github.com/apache/arrow-datafusion/pull/1408)[#1431](https://github.com/apache/arrow-datafusion/pull/1431)[#1483](https://github.com/apache/arrow-datafusion/pull/1483)[#1554](https://github.com/apache/arrow-datafusion/pull/1554)[#1640](https://github.com/apache/arrow-datafusion/pull/1640)
+ - Support for evolved schemas
[#1622](https://github.com/apache/arrow-datafusion/pull/1622)[#1709](https://github.com/apache/arrow-datafusion/pull/1709)
Review comment:
I think @thinkharderdev concluded that schema merging already worked for
CSV and JSON files, but I may have misunderstood.