tustvold commented on code in PR #264:
URL: https://github.com/apache/arrow-site/pull/264#discussion_r1010901509
##########
_posts/2022-10-30-multi-column-sorts-in-arrow-rust-part-1.md:
##########
@@ -0,0 +1,232 @@
+---
+layout: post
+title: "Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1"
+date: "2022-10-30 00:00:00"
+author: "tustvold and alamb"
+categories: [arrow]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+## Introduction
+
+Sorting is one of the most fundamental operations in modern databases and
other analytic systems, underpinning important operators such as aggregates,
joins, window functions, merge, and more. By some estimates, more than half of
the execution time in data processing systems is spent sorting. Optimizing
sorts is therefore vital to improving query performance and overall system
efficiency.
+
+Sorting is also one of the most well studied topics in computer science. The
classic survey paper for databases is [Implementing Sorting in Database
Systems](https://dl.acm.org/doi/10.1145/1132960.1132964) by Goetz Graefe which
provides a thorough academic treatment and is still very applicable today.
However, it may not be obvious how to apply the wisdom and advanced techniques
described in that paper to modern systems. In addition, the excellent [DuckDB
blog on sorting](https://duckdb.org/2021/08/27/external-sorting.html)
highlights many sorting techniques, and mentions a comparable row format, but
it does not explain how to efficiently sort variable length strings or
dictionary encoded data.
+
+In this blog post we explain in detail the new [row
format](https://docs.rs/arrow/25.0.0/arrow/row/index.html) in the [Rust
implementation](https://github.com/apache/arrow-rs) of [Apache
Arrow](https://arrow.apache.org/), and how we used it to make sorting more than
[3x](https://github.com/apache/arrow-rs/pull/2929) faster than an alternate
comparator based approach. The benefits are especially pronounced for strings,
dictionary encoded data, and sorts with large numbers of columns.
Review Comment:
```suggestion
In this series we explain in detail the new [row
format](https://docs.rs/arrow/25.0.0/arrow/row/index.html) in the [Rust
implementation](https://github.com/apache/arrow-rs) of [Apache
Arrow](https://arrow.apache.org/), and how we used it to make sorting more than
[3x](https://github.com/apache/arrow-rs/pull/2929) faster than an alternate
comparator based approach. The benefits are especially pronounced for strings,
dictionary encoded data, and sorts with large numbers of columns.
```