[GitHub] [arrow-site] vertexclique commented on a change in pull request #81: [Website] Rust release notes / blog post

GitBox Tue, 20 Oct 2020 16:15:30 -0700


vertexclique commented on a change in pull request #81:
URL: https://github.com/apache/arrow-site/pull/81#discussion_r508887309




##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 

Review comment:
       ```suggestion
   particularly with almost 200 issues resolved by 15 contributors. In this 
blog post, we will go through the main changes 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 

Review comment:
       ```suggestion
   While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature-rich, with the 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 

Review comment:
       ```suggestion
   - Primitive arrays (e.g., array of integers) can now be converted to and 
initialized from an iterator. This exposes a 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 

Review comment:
       ```suggestion
   - Variable sized arrays (e.g., array of strings) have been internally 
refactored to more easily aupport their larger (64-bit 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.

Review comment:
       ```suggestion
   The hash aggregate physical operator has been primarily re-written, 
resulting in significant performance improvements.
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 

Review comment:
       ```suggestion
   affecting core Arrow, Parquet support, and DataFusion query engine. The full 
list of resolved issues can be found 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 

Review comment:
       ```suggestion
   Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general and the Rust subproject,
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 
+and batch updates. You can check out [this example][5] to learn how to declare 
and use a UDAF.
+
+### User-defined Constants
+DataFusion now supports registering constants (e.g. “@version”), that live for 
the duration of the execution context 
+and can be accessed from SQL.
+
+## Query Planning & Optimization
+
+### User-defined logical plans
+
+The Logical Plan enum is now extensible through an Extension variant which 
accepts a UserDefinedLogicalPlan trait using 
+dynamic dispatch. Consequently, DataFusion now supports user-defined logical 
nodes, thereby allowing complex nodes to 
+be planned and executed. You can check this example to learn how to declare a 
new node.
+
+### Predicate push-down
+
+DataFusion now has a Predicate push-down optimizer rule that pushes filter 
operations as close as possible to scans, 
+thereby speeding up the physical execution of suboptimal queries created via 
the DataFrame API.
+SQL
+
+DataFusion now uses a more recent release of the sqlparser crate, which has 
much more comprehensive support for SQL 
+syntax and also supports multiple dialects (Postgres, MS SQL, and MySQL).
+
+It is now possible to see the query plan for a SQL statement using EXPLAIN 
syntax.
+
+# Benchmarks
+    
+The benchmark crate now contains a new benchmark based on TPC-H that can 
execute TPC-H query 1 against CSV, Parquet, 
+and memory data sources. This is useful for running benchmarks against larger 
data sets.
+
+# Integration Testing / IPC
+
+Arrow IPC is the format for serialization and interprocess communication. It 
is described in arrow.apache.org, and is 
+the format used for file and stream I/O between applications wishing to 
interchange Arrow data.
+
+The Arrow project released IPC version 5 of the Arrow IPC format in version 
1.0.0. Before that, a message padding 
+change was made in version 0.15.0 to change the default padding to 8 bytes, 
while remaining in IPC version 4. Arrow 
+release 0.14.1 and earlier were the last releases to use the legacy 4 byte 
alignment.
+As part of 2.0.0, the Rust implementation was updated to comply with the 
changes up to release 0.15.0 of Arrow. 
+Work on supporting IPC version 5 is underway, and is expected to be completed 
in time for 3.0.0.
+
+As part of the conformance work, Rust is being added to the Arrow integration 
suite, which tests that supported 
+language implementations (ARROW-3690):
+
+- Comply with the Arrow IPC format
+- Can read and write each other’s generated data
+- IPC version 4 is being verified through the above work.
+
+# Roadmap for 3.0.0 and Beyond
+
+Here are some of the initiatives that contributors are currently working on 
for future releases:
+
+- Support stable Rust
+- Improved DictionaryArray support and performance
+- Implement inner equijoins
+- Support for various platforms like ARMv8
+- Supporting the C Data Interface from Rust, to better support 
interoperability with other Arrow implementations

Review comment:
       ```suggestion
   - Supporting the C Data Interface from Rust to better support 
interoperability with other Arrow implementations
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 

Review comment:
       ```suggestion
   rows, batches, and partitions. UDAFs have the same generality as 
DataFusion’s functions and support both row updates 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 
+and batch updates. You can check out [this example][5] to learn how to declare 
and use a UDAF.
+
+### User-defined Constants
+DataFusion now supports registering constants (e.g. “@version”), that live for 
the duration of the execution context 
+and can be accessed from SQL.
+
+## Query Planning & Optimization
+
+### User-defined logical plans
+
+The Logical Plan enum is now extensible through an Extension variant which 
accepts a UserDefinedLogicalPlan trait using 
+dynamic dispatch. Consequently, DataFusion now supports user-defined logical 
nodes, thereby allowing complex nodes to 
+be planned and executed. You can check this example to learn how to declare a 
new node.
+
+### Predicate push-down
+
+DataFusion now has a Predicate push-down optimizer rule that pushes filter 
operations as close as possible to scans, 
+thereby speeding up the physical execution of suboptimal queries created via 
the DataFrame API.
+SQL
+
+DataFusion now uses a more recent release of the sqlparser crate, which has 
much more comprehensive support for SQL 
+syntax and also supports multiple dialects (Postgres, MS SQL, and MySQL).
+
+It is now possible to see the query plan for a SQL statement using EXPLAIN 
syntax.
+
+# Benchmarks
+    
+The benchmark crate now contains a new benchmark based on TPC-H that can 
execute TPC-H query 1 against CSV, Parquet, 
+and memory data sources. This is useful for running benchmarks against larger 
data sets.
+
+# Integration Testing / IPC
+
+Arrow IPC is the format for serialization and interprocess communication. It 
is described in arrow.apache.org, and is 

Review comment:
       ```suggestion
   Arrow IPC is the format for serialization and interprocess communication. It 
is described in arrow.apache.org and is 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 

Review comment:
       ```suggestion
   A significant effort is underway to create a Parquet writer for Arrow data. 
This work has not been released as part of 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 
+and batch updates. You can check out [this example][5] to learn how to declare 
and use a UDAF.
+
+### User-defined Constants
+DataFusion now supports registering constants (e.g. “@version”), that live for 
the duration of the execution context 

Review comment:
       ```suggestion
   DataFusion now supports registering constants (e.g. “@version”) that live 
for the duration of the execution context 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 

Review comment:
       ```suggestion
   size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions and 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 
+and batch updates. You can check out [this example][5] to learn how to declare 
and use a UDAF.
+
+### User-defined Constants
+DataFusion now supports registering constants (e.g. “@version”), that live for 
the duration of the execution context 
+and can be accessed from SQL.
+
+## Query Planning & Optimization
+
+### User-defined logical plans
+
+The Logical Plan enum is now extensible through an Extension variant which 
accepts a UserDefinedLogicalPlan trait using 
+dynamic dispatch. Consequently, DataFusion now supports user-defined logical 
nodes, thereby allowing complex nodes to 
+be planned and executed. You can check this example to learn how to declare a 
new node.
+
+### Predicate push-down
+
+DataFusion now has a Predicate push-down optimizer rule that pushes filter 
operations as close as possible to scans, 
+thereby speeding up the physical execution of suboptimal queries created via 
the DataFrame API.
+SQL
+
+DataFusion now uses a more recent release of the sqlparser crate, which has 
much more comprehensive support for SQL 
+syntax and also supports multiple dialects (Postgres, MS SQL, and MySQL).
+
+It is now possible to see the query plan for a SQL statement using EXPLAIN 
syntax.
+
+# Benchmarks
+    
+The benchmark crate now contains a new benchmark based on TPC-H that can 
execute TPC-H query 1 against CSV, Parquet, 
+and memory data sources. This is useful for running benchmarks against larger 
data sets.
+
+# Integration Testing / IPC
+
+Arrow IPC is the format for serialization and interprocess communication. It 
is described in arrow.apache.org, and is 
+the format used for file and stream I/O between applications wishing to 
interchange Arrow data.
+
+The Arrow project released IPC version 5 of the Arrow IPC format in version 
1.0.0. Before that, a message padding 
+change was made in version 0.15.0 to change the default padding to 8 bytes, 
while remaining in IPC version 4. Arrow 
+release 0.14.1 and earlier were the last releases to use the legacy 4 byte 
alignment.
+As part of 2.0.0, the Rust implementation was updated to comply with the 
changes up to release 0.15.0 of Arrow. 
+Work on supporting IPC version 5 is underway, and is expected to be completed 
in time for 3.0.0.
+
+As part of the conformance work, Rust is being added to the Arrow integration 
suite, which tests that supported 
+language implementations (ARROW-3690):
+
+- Comply with the Arrow IPC format
+- Can read and write each other’s generated data
+- IPC version 4 is being verified through the above work.
+
+# Roadmap for 3.0.0 and Beyond
+
+Here are some of the initiatives that contributors are currently working on 
for future releases:
+
+- Support stable Rust
+- Improved DictionaryArray support and performance
+- Implement inner equijoins
+- Support for various platforms like ARMv8
+- Supporting the C Data Interface from Rust, to better support 
interoperability with other Arrow implementations
+
+# How to Get Involved
+
+If you are interested in contributing to the Rust subproject in Apache Arrow 
you can find a list of open issues 
+suitable for beginners here and the full list here.
+
+Other ways to get involved include trying out Arrow on some of your own data 
and filing bug reports and helping to 

Review comment:
       ```suggestion
   Other ways to get involved include trying out Arrow on some of your data and 
filing bug reports, and helping to 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 
+and dynamically typed arguments.
+
+### User-defined Aggregate Functions (UDAFs)
+DataFusion now supports user-defined aggregate functions that can be used to 
perform operations than span multiple 
+rows, batches and partitions. UDAFs have the same generality of DataFusion’s 
own functions and support both row updates 
+and batch updates. You can check out [this example][5] to learn how to declare 
and use a UDAF.
+
+### User-defined Constants
+DataFusion now supports registering constants (e.g. “@version”), that live for 
the duration of the execution context 
+and can be accessed from SQL.
+
+## Query Planning & Optimization
+
+### User-defined logical plans
+
+The Logical Plan enum is now extensible through an Extension variant which 
accepts a UserDefinedLogicalPlan trait using 
+dynamic dispatch. Consequently, DataFusion now supports user-defined logical 
nodes, thereby allowing complex nodes to 
+be planned and executed. You can check this example to learn how to declare a 
new node.
+
+### Predicate push-down
+
+DataFusion now has a Predicate push-down optimizer rule that pushes filter 
operations as close as possible to scans, 
+thereby speeding up the physical execution of suboptimal queries created via 
the DataFrame API.
+SQL
+
+DataFusion now uses a more recent release of the sqlparser crate, which has 
much more comprehensive support for SQL 
+syntax and also supports multiple dialects (Postgres, MS SQL, and MySQL).
+
+It is now possible to see the query plan for a SQL statement using EXPLAIN 
syntax.
+
+# Benchmarks
+    
+The benchmark crate now contains a new benchmark based on TPC-H that can 
execute TPC-H query 1 against CSV, Parquet, 
+and memory data sources. This is useful for running benchmarks against larger 
data sets.
+
+# Integration Testing / IPC
+
+Arrow IPC is the format for serialization and interprocess communication. It 
is described in arrow.apache.org, and is 
+the format used for file and stream I/O between applications wishing to 
interchange Arrow data.
+
+The Arrow project released IPC version 5 of the Arrow IPC format in version 
1.0.0. Before that, a message padding 
+change was made in version 0.15.0 to change the default padding to 8 bytes, 
while remaining in IPC version 4. Arrow 
+release 0.14.1 and earlier were the last releases to use the legacy 4 byte 
alignment.
+As part of 2.0.0, the Rust implementation was updated to comply with the 
changes up to release 0.15.0 of Arrow. 
+Work on supporting IPC version 5 is underway, and is expected to be completed 
in time for 3.0.0.
+
+As part of the conformance work, Rust is being added to the Arrow integration 
suite, which tests that supported 
+language implementations (ARROW-3690):
+
+- Comply with the Arrow IPC format
+- Can read and write each other’s generated data
+- IPC version 4 is being verified through the above work.
+
+# Roadmap for 3.0.0 and Beyond
+
+Here are some of the initiatives that contributors are currently working on 
for future releases:
+
+- Support stable Rust
+- Improved DictionaryArray support and performance
+- Implement inner equijoins
+- Support for various platforms like ARMv8
+- Supporting the C Data Interface from Rust, to better support 
interoperability with other Arrow implementations
+
+# How to Get Involved
+
+If you are interested in contributing to the Rust subproject in Apache Arrow 
you can find a list of open issues 

Review comment:
       ```suggestion
   If you are interested in contributing to the Rust subproject in Apache 
Arrow, you can find a list of open issues 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 

Review comment:
       ```suggestion
   Unary mathematical functions (such as sqrt) now support both 32 and 64-bit 
floats and return the corresponding type, 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 

Review comment:
       ```suggestion
   - Many kernels were optimized in the number of memory copies that are needed 
to apply them and also on their 
   ```

##########
File path: _posts/2020-10-19-rust-2.0.0-release.md
##########
@@ -0,0 +1,203 @@
+---
+layout: post
+title: "Apache Arrow 2.0.0 (Rust)"
+date: "2020-10-19 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+Apache Arrow 2.0.0 is a significant release for the Apache Arrow project in 
general, and the Rust subproject in 
+particular with almost 200 issues resolved by 15 contributors. In this blog 
post we will go through the main changes 
+affecting core Arrow, Parquet support and DataFusion query engine. The full 
list of resolved issues can be found 
+[here][1].
+
+While the Java and C/C++ (used by Python and R) Arrow implementations likely 
remain the most feature rich, with the 
+2.0.0 release, the Rust implementation is closing the feature gap quickly. 
Here are some of the highlights for this 
+release.
+
+# Core Arrow Crate
+
+## Iterator Trait
+
+- Primitive arrays (e.g. array of integers) can now be converted to, and 
initialized from, an iterator. This exposes a 
+very popular API - iterators - to arrow arrays. Work for other types will 
continue throughout 3.0.0.
+
+## Improved Variable-sized Arrays
+
+- Variable sized arrays (e.g. array of strings) have been internally 
refactored to more easily support their larger (64 
+bit size offset) version. This allowed us to generalize some of the kernels to 
both (32 and 64) versions, and also 
+perform type checks when building them.
+
+## Kernels
+
+There have been numerous improvements in the Arrow compute kernels, including:
+
+- New kernels have been added for string operations, including substring, min, 
max, concat, and length.
+- Aggregate sum is now implemented for SIMD with a 5x improvement over the 
non-SIMD operation
+- Many kernels have been improved to support dictionary-encoded arrays
+- Some kernels were optimized for arrays without nulls, making them 
significantly faster in that case.
+- Many kernels were optimized in the number of memory copies that are needed 
to apply them, and also on their 
+implementation.
+
+## Other Improvements
+
+The Array trait now has `get_buffer_memory_size` and `get_array_memory_size` 
methods for determining the amount of 
+memory allocated for the array.
+
+# Parquet
+
+Significant effort is underway to create a Parquet writer for Arrow data. This 
work has not been released as part of 
+2.0.0, and is planned for the 3.0.0 release. The development of this writer is 
being carried out on the 
+[rust-parquet-arrow-writer][2] branch, and the branch is regularly 
synchronized with the main branch.
+As part of the writer, the necessary improvements and features are being added 
to the reader.
+
+The main focus areas are:
+- Supporting nested Arrow types, such as `List<Struct<[Dictionary, String]>>`
+- Ensuring correct round-trip between the reader and writer by encoding Arrow 
schemas in the Parquet metadata
+- Improve null value writing for Parquet
+
+A new `parquet_derive` crate has been created, which allows users to derive 
Parquet records for simple structs. Refer to 
+the [parquet_derive crate][3] for usage examples.
+
+# DataFusion
+
+DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on 
top of base Arrow support.
+
+## DataFrame API
+
+DataFusion now has a richer [DataFrame API][4] with improved documentation 
showing example usage, 
+supporting the following operations:
+
+- select_columns
+- select
+- filter
+- aggregate
+- limit
+- sort
+- collect
+- explain
+
+## Performance & Scalability
+
+DataFusion query execution now uses `async`/`await` with the tokio threaded 
runtime rather than launching dedicated 
+threads, making queries scale much better across available cores.
+
+The hash aggregate physical operator has been largely re-written, resulting in 
significant performance improvements.
+
+## Expressions and Compute
+
+### Improved Scalar Functions
+
+DataFusion has many new functions, both in the SQL and the DataFrame API:
+- Length of an string
+- COUNT(DISTINCT column)
+- to_timestamp
+- IsNull and IsNotNull
+- Min/Max for strings (lexicographic order)
+- Array of columns
+- Concatenation of strings
+- Aliases of aggregate expressions
+
+Many existing expressions were also significantly optimized (2-3x speedups) by 
avoiding memory copies and leveraging 
+Arrow format’s invariants.
+
+Unary mathematical functions (such as sqrt) now support both 32 and 64 bit 
floats, and return the corresponding type, 
+thereby allowing faster operations when higher precision is not needed.
+
+### Improved User-defined Functions (UDFs)
+The API to use and register UDFs has been significantly improved, allowing 
users to register UDFs and call them both 
+via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s own functions, including variadic 

Review comment:
       ```suggestion
   via SQL and the DataFrame API. UDFs now also have the same generality as 
DataFusion’s functions, including variadic 
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow-site] vertexclique commented on a change in pull request #81: [Website] Rust release notes / blog post

Reply via email to