This is an automated email from the ASF dual-hosted git repository.
raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/main by this push:
new 6deffca8d3b Website: Add blog post for 18.0.0 (#547)
6deffca8d3b is described below
commit 6deffca8d3bcfa93a336e1bd9ccb96dba4dc7372
Author: Raúl Cumplido <[email protected]>
AuthorDate: Tue Nov 5 12:58:34 2024 +0100
Website: Add blog post for 18.0.0 (#547)
Release blog post for 18.0.0.
Issues on the milestone are here:
https://github.com/apache/arrow/milestone/64
---------
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Rossi Sun <[email protected]>
Co-authored-by: Curt Hagenlocher <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
---
_posts/2024-10-28-18.0.0-release.md | 238 ++++++++++++++++++++++++++++++++++++
1 file changed, 238 insertions(+)
diff --git a/_posts/2024-10-28-18.0.0-release.md
b/_posts/2024-10-28-18.0.0-release.md
new file mode 100644
index 00000000000..943c772eb14
--- /dev/null
+++ b/_posts/2024-10-28-18.0.0-release.md
@@ -0,0 +1,238 @@
+---
+layout: post
+title: "Apache Arrow 18.0.0 Release"
+date: "2024-10-28 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 18.0.0 release. This covers
+over 3 months of development work and includes [**334 resolved issues**][1]
+on [**530 distinct commits**][2] from [**89 distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 17.0.0 release, Will Ayd and Rossi Sun have been invited to be
committer.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Columnar format
+
+The Arrow columnar format now allows 32-bit and 64-bit decimal data, in
+addition to the already existing 128-bit and 256-bit decimal data types
+(GH-43956).
+
+
+## Arrow Flight RPC notes
+
+The Java implementation now transparently handles compressed Arrow data when
reading, instead of requiring explicit configuration. (GH-43469)
+The Ruby bindings now support implementing DoPut on the server. (GH-43814)
+
+## C++ notes
+
+The default memory pool has changed to mimalloc on all platforms (GH-43254).
+Previously, jemalloc was used by default on Linux. Using mimalloc by default
+provides a more consistent experience across different platforms, and
+makes configuration easier. It is expected that this might either increase
+or decrease performance on user workloads that use the default memory pool;
+please benchmark accordingly. Jemalloc can still be selected by setting
+the
[`ARROW_DEFAULT_MEMORY_POOL`](https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_DEFAULT_MEMORY_POOL)
environment variable to "jemalloc".
+
+A new class `arrow::ArrayStatistics` has been added to encode basic statistics
+about an Arrow array. It provides a source-agnostic representation for
statistics
+provided by third-party sources such as Parquet files (GH-41909).
+
+The new Decimal32 and Decimal64 types have been made available (GH-43956).
+
+Several canonical extension types have been implemented:
+- the
[Opaque](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#opaque)
extension type (GH-43454);
+- the [8-bit
boolean](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#bit-boolean)
extension type (GH-17682);
+- the
[UUID](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#uuid)
extension type (GH-15058);
+- the
[JSON](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#json)
extension type (GH-32538).
+
+**Flight UCX is deprecated.** We plan to remove this experiment in the next
couple of releases.
+
+### Acero
+
+- Enhanced the row-oriented representation by widening the offset type from
32-bit to 64-bit, resolving crashes and data corruption in aggregation and hash
join on large datasets due to offset overflow (GH-43495).
+- Improved ordered aggregation performance by reducing complexity from
`O(n*m)` to `O(n)`, where `n` is the number of rows and `m` the number of
segments in the batch (GH-44052).
+
+### Compute
+
+Casting between string-like and string-view-like types has been implemented
(GH-42247).
+
+### Dataset
+
+
+### Filesystems
+
+Writing small files to S3 can use a single S3 API call instead of three,
+provided the new option `allow_delayed_open` is enabled (GH-40557).
+Files larger than 5 MB still go through the regular multipart
+upload mechanism.
+
+Background writes are now implemented and enabled by default for the Azure
+filesystem, dramatically improving the performance of writing to remote files
+(GH-40036).
+
+Finalization of the S3 filesystem layer should hopefully be more robust
(GH-44071).
+
+### Gandiva
+
+LLVM 19.1 is now supported (GH-44222).
+
+### GPU
+
+
+### IPC
+
+The seed corpus used for fuzzing the IPC reader has been improved, hopefully
+helping make the IPC reader even more robust against corrupt or malicious
+IPC streams (GH-38041).
+
+### Parquet
+
+A new command line utility `parquet-dump-footer` allows dumping the
Thrift-encoded
+footer metadata of a Parquet file, optionally scrubbing confidential data
+(GH-42102). This is part of the effort to collect real-world Parquet metadata
+so as to evaluate the efficiency of future improvements to the Parquet format.
+Please see https://github.com/apache/parquet-benchmark for instructions to
submit
+footers representative of your own workloads.
+
+
+## C# notes
+
+- Partial support has been added for LargeBinary, LargeString and LargeList.
The column sizes cannot exceed 2 GB in length. (GH-43266).
+- Changes to Flight support were made for better control and compatibility,
and to allow Flight Server to be hosted in pre-Kestrel versions of .NET
(GH-43907, GH-43672, GH-41347).
+- Support has been added for newly-defined types decimal32 and decimal64
(GH-44271).
+- The import of sliced arrays through the C Data interface now works
correctly. (GH-43267)
+## Java notes
+
+**Java 8 is no longer supported.** (GH-38051)
+
+**Gandiva may not work in this release.** For details, please see
[GH-43576](https://github.com/apache/arrow/issues/43576).
+
+Basic support for RunEndEncoded was added (GH-39982). The ListView/StringView
vector implementations are now more complete, including C Data support
(multiple issues).
+
+Several APIs have been updated to accept `long` for addresses in preparation
for FFM/large buffer support (GH-43902). We no longer expose `sun.misc.Unsafe`
(GH-43479). We no longer ship the `shaded` flight-core JARs (GH-43217).
+
+More options were added to the Dataset ScanOptions API (GH-28866).
+
+## JavaScript notes
+- Accessing individual rows in Tables or Structs should now be more performant
([GH-30863](https://github.com/apache/arrow/issues/30863)).
+
+## Python notes
+Compatibility notes:
+* NumPy required dependency has been removed from pyarrow packaging
+ [GH-43846](https://github.com/apache/arrow/issues/43846) and has been
+ made an optional runtime dependency
[GH-25118](https://github.com/apache/arrow/issues/25118).
+* Support for Python 3.8 has been dropped
[GH-43518](https://github.com/apache/arrow/issues/43518)
+* No longer used serialize/deserialize Pyarrow C++ functions have been
+ deprecated [GH-44063](https://github.com/apache/arrow/issues/44063).
+* Passing of build flags to setup.py (e.g. `setup.py --with-parquet`) has been
+ deprecated [GH-43514](https://github.com/apache/arrow/issues/43514)
+
+New features:
+* Non-cpu work has continued with
[GH-43973](https://github.com/apache/arrow/issues/43973),
+ [GH-43728](https://github.com/apache/arrow/issues/43728),
[GH-43727](https://github.com/apache/arrow/issues/43727),
+ [GH-43391](https://github.com/apache/arrow/issues/43391),
+ [GH-42222](https://github.com/apache/arrow/issues/42222) and
+ [GH-41665](https://github.com/apache/arrow/issues/41665).
+* Arrow C++ ``arrow::dataset::Partitioning::Format`` method has been exposed in
+ Python [GH-43684](https://github.com/apache/arrow/issues/43684).
+* UUID canonical extension type is now supported in Python
+ [GH-15058](https://github.com/apache/arrow/issues/15058).
+* Opaque canonical extension type has been implemented
+ [GH-43454](https://github.com/apache/arrow/issues/43454).
+* ``StructArray.from_array`` now accepts a type in addition to names or fields
+ [GH-42014](https://github.com/apache/arrow/issues/42014).
+* New attributes have been added to ``StructType`` in order to access all its
fields
+ [GH-30058](https://github.com/apache/arrow/issues/30058).
+
+Other improvements:
+* In order to support free-threaded build of CPython 3.13 additional work has
been made:
+ [GH-44046](https://github.com/apache/arrow/issues/44046),
+ [GH-44355](https://github.com/apache/arrow/issues/44355) and
+ [GH-43964](https://github.com/apache/arrow/issues/43964). Umbrella issue
+ [GH-43536](https://github.com/apache/arrow/issues/43536).
+* PyCapsule interface now has precedence over others in pa.schema(..)
+ [GH-43388](https://github.com/apache/arrow/issues/43388).
+* Usage of deprecated ``pkg_resources`` in setup.py has been replaced with
+ ``numpy.get_include()``
[GH-43532](https://github.com/apache/arrow/issues/43532).
+* Conversion from Arrow to JAX via dlpack as added to the documentation
examples
+ [GH-44229](https://github.com/apache/arrow/issues/44229).
+
+Relevant bug fixes:
+* ``pyarrow.Table.rename_columns`` has been updated and should have accepted
``tuples``,
+ not only ``list`` or ``dict``. This has been fixed
+ [GH-43588](https://github.com/apache/arrow/issues/43588).
+* Python reference handling in UDF implementation has been sanitized
+ [GH-43487](https://github.com/apache/arrow/issues/43487).
+* Files included when building wheels have been cleaned (unnecessary files
removed)
+ [GH-43299](https://github.com/apache/arrow/issues/43299).
+
+## R notes
+
+For more on what’s in the 18.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+* Add workaround for install failure due to `re2.pc` on Ubuntu 20.04:
[GH-41396](https://github.com/apache/arrow/issues/41396)
+* Add support for `0` decimal value:
[GH-43877](https://github.com/apache/arrow/issues/43877)
+
+C GLib related improvements are also available in Ruby.
+
+### C GLib
+
+* Add support for Azure file system:
[GH-43738](https://github.com/apache/arrow/issues/43738)
+* FlightRPC: Add support for DoPut:
[GH-41056](https://github.com/apache/arrow/issues/41056)
+* FlightRPC: Add support for timeout:
[GH-44178](https://github.com/apache/arrow/issues/44178)
+* Parquet: Add support for writing a record batch:
[GH-40860](https://github.com/apache/arrow/issues/40860)
+* Add support for pull style IPC stream format decoder:
[GH-40493](https://github.com/apache/arrow/issues/40493)
+
+## Rust notes and Go notes
+
+The Rust and Go projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the latest release of the Rust
+implementation, see the latest [Arrow Rust changelog][5].
+For notes on the latest release of the Go implementation, see the latest
+[Arrow Go changelog][6]
+
+## Linux packages notes
+
+The Azure file system is now enabled.
+
+[1]: https://github.com/apache/arrow/milestone/64?closed=1
+[2]: {{ site.baseurl }}/release/18.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/18.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/tags
+[6]: https://github.com/apache/arrow-go/tags