This is an automated email from the ASF dual-hosted git repository.
brycemecum pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/main by this push:
new 88e172d14d1 Website: Add blog post for 19.0.0 (#580)
88e172d14d1 is described below
commit 88e172d14d1a3ea8a6ef59ef3f327cd515d17c29
Author: Bryce Mecum <[email protected]>
AuthorDate: Thu Jan 23 16:29:56 2025 -0800
Website: Add blog post for 19.0.0 (#580)
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Co-authored-by: Rossi Sun <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
---
_posts/2025-01-16-19.0.0-release.md | 239 ++++++++++++++++++++++++++++++++++++
1 file changed, 239 insertions(+)
diff --git a/_posts/2025-01-16-19.0.0-release.md
b/_posts/2025-01-16-19.0.0-release.md
new file mode 100644
index 00000000000..c6eee12ae8f
--- /dev/null
+++ b/_posts/2025-01-16-19.0.0-release.md
@@ -0,0 +1,239 @@
+---
+layout: post
+title: "Apache Arrow 19.0.0 Release"
+date: "2025-01-16 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 19.0.0 release. This release
+covers over 2 months of development work and includes [**202 resolved
+issues**][1] on [**330 distinct commits**][2] from [**67 distinct
+contributors**][2]. See the [Install Page](https://arrow.apache.org/install/)
to
+learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 18.1.0 release, Adam Reeve and Laurent Goujon have been invited to
+become committers. Gang Wu has been invited to join the Project Management
+Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Release Highlights
+
+A [bug](https://github.com/apache/arrow/issues/45283) has been identified in
the
+19.0.0 versions of the C++ and Python libraries which prevents reading Parquet
+files written by Arrow Rust v53.0.0 or higher. The files written by Arrow Rust
+are correct and the bug was in the patch adding support for Parquet's
+[SizeStatistics](https://github.com/apache/parquet-format/pull/197) feature to
+Arrow C++ and Python. See
[#45293](https://github.com/apache/arrow/issues/45283)
+for more details including a potential workaround.
+
+As a result, we plan to create a 19.0.1 release to include a fix for this which
+should be available in next few weeks.
+
+## Columnar Format
+
+We've added a new experimental specification for representing statistics on
+Arrow Arrays as Arrow Arrays. This is useful for preserving and exchanging
+statistics between systems such as when converting Parquet data to Arrow. See
+[the statistics schema
+documentation](https://arrow.apache.org/docs/format/StatisticsSchema.html) for
+details.
+
+We've expanded the Arrow C Device Data Interface to include an experimental
+Async Device Stream Interface. While the existing Arrow C Device Data Interface
+is a pull-oriented API, the Async interface provides a push-oriented design for
+other workflows. See the
+[documentation](https://arrow.apache.org/docs/format/CDeviceDataInterface.html#async-device-stream-interface)
+for more information. It currently has implementations in the C++ and Go
+libraries.
+
+## Arrow Flight RPC Notes
+
+The precision of a Timestamp (used for timeouts) is now nanoseconds on all
+platforms; previously it was platform-dependent. This may be a breaking change
+depending on your use case.
+([#44679](https://github.com/apache/arrow/issues/44679))
+
+The Python bindings now support various new fields that were added to
+FlightEndpoint/FlightInfo (like `expiration_time`).
+([#36954](https://github.com/apache/arrow/issues/36954))
+
+## C++ Notes
+
+### Compute
+
+- It is now possible to cast from a struct type to another struct type with
+additional columns, provided the additional columns are nullable
+([#44555)](https://github.com/apache/arrow/issues/44555).
+- The compute function `expm1` has been added to compute `exp(x) - 1` with
better
+accuracy when the input value is close to 0
+([#44903](https://github.com/apache/arrow/issues/44903)).
+- Hyperbolic trigonometric functions and their reciprocals have also been
added.
+([#44952](https://github.com/apache/arrow/issues/44952)).
+- The new Decimal32 and Decimal64 types have been further supported by allowing
+casting between numeric, string, and other decimal types
+([#43956](https://github.com/apache/arrow/issues/43956)).
+
+### Acero
+
+- Added AVX2 support for decoding row tables in the Swiss join specialization
of
+hash joins, enabling up to 40% performance improvement for build-heavy
+workloads. ([#43693](https://github.com/apache/arrow/issues/43693))
+
+### Filesystems
+
+- The S3 filesystem has gained support for server-side encryption with customer
+provided keys, aka SSE-C.
+([#43535](https://github.com/apache/arrow/issues/43535))
+- The S3 filesystem also gained an option to disable the SIGPIPE signals that
+may be emitted on some network events.
+([#44695](https://github.com/apache/arrow/issues/44695))
+- The Azure filesystem now supports SAS token authentication.
+([#44308](https://github.com/apache/arrow/issues/44308)).
+
+### Flight RPC
+
+- The precision of a Timestamp (used for timeouts) is now nanoseconds on all
+ platforms; previously it was platform-dependent. This may be a breaking
change
+depending on your use case.
+ ([#44679](https://github.com/apache/arrow/issues/44679))
+- The Python bindings now support various new fields that were added to
+ FlightEndpoint/FlightInfo (like `expiration_time`).
+ ([#36954](https://github.com/apache/arrow/issues/36954))
+- The UCX backend has been deprecated and is scheduled for removal.
+ ([#45079](https://github.com/apache/arrow/issues/45079))
+
+### Parquet
+
+- The initial footer read size can now be configured to reduce the number of
+potential round-trips on hi-latency filesystems such as S3.
+([#45015](https://github.com/apache/arrow/issues/45015))
+- The new `SizeStatistics` format feature has been implemented, though it is
+disabled by default when writing.
+([#40592](https://github.com/apache/arrow/issues/40592))
+- We've added a new method to the ParquetFileReader class,
+[GetReadRanges](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet17ParquetFileReader13GetReadRangesERKNSt6vectorIiEERKNSt6vectorIiEE7int64_t7int64_t),
+which can calculate the byte ranges necessary to read a given set of columns
and
+row groups. This may be useful to pre-buffer file data via caching mechanisms.
+([#45092](https://github.com/apache/arrow/issues/45092))
+- We've added `arrow::Result`-returning variants for
+`parquet::arrow::OpenFile()` and
+`parquet::arrow::FileReader::GetRecordBatchReader()`.
+([#44784](https://github.com/apache/arrow/issues/44784),
+[#44808](https://github.com/apache/arrow/issues/44808))
+
+## C# Notes
+
+- The `PrimitiveArrayBuilder` constructor has been made public to allow writing
+ custom builders. ([#23995](https://github.com/apache/arrow/issues/23995))
+- Improved the performance of looking up schema fields by name.
+ ([#44575](https://github.com/apache/arrow/issues/44575))
+
+## Java, Go, and Rust Notes
+
+The Java, Go, and Rust Go projects have moved to separate repositories outside
+the main Arrow [monorepo](https://github.com/apache/arrow).
+
+- For notes on the latest release of the [Java
+implementation](https://github.com/apache/arrow-java), see the latest [Arrow
+Java changelog][7].
+- For notes on the latest release of the [Rust
+ implementation](https://github.com/apache/arrow-rs) see the latest [Arrow
Rust
+ changelog][5].
+- For notes on the latest release of the [Go
+implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go
+changelog][6].
+
+## Linux Packaging Notes
+
+- Debian: Fixed keyring format to support newer libapt (e.g., used by
+ Trixie). ([#45118](https://github.com/apache/arrow/issues/45118))
+
+## Python Notes
+
+New features:
+
+- The upcoming pandas 3.0 [string
+ dtype](https://pandas.pydata.org/pdeps/0014-string-dtype.html) is now
+ supported by PyArrow's `to_pandas` routine. In the future, when using pandas
>=3.0,
+ the new pandas behavior will be enabled by default. You can opt into
+ the new behavior under pandas >=2.3 by setting
`pd.options.future.infer_string
+ = True`. This may be considered a breaking change.
+ ([#43683](https://github.com/apache/arrow/issues/43683))
+- Support for 32-bit and 64-bit decimal types was added.
+ ([#44713](https://github.com/apache/arrow/issues/44713))
+- Arrow PyCapsule stream objects are supported in `write_dataset`.
+ ([#43410](https://github.com/apache/arrow/issues/43410))
+- New Flight features have been exposed.
+ ([#36954](https://github.com/apache/arrow/issues/36954))
+- Bindings for `JsonExtensionType` and `JsonArray` were added.
+ ([#44066](https://github.com/apache/arrow/issues/44066))
+- Hyperbolic trigonometry functions added to the Arrow C++ compute kernels are
+ also available in PyArrow.
+ ([#44952](https://github.com/apache/arrow/issues/44952))
+
+Other improvements:
+
+- `strings_to_categorical` keyword in `to_pandas` can now be used for string
+ view type. ([#45175](https://github.com/apache/arrow/issues/45175))
+- `from_buffers` is updated to work with `StringView`.
+ ([#44651](https://github.com/apache/arrow/issues/44651))
+- Version suffixes are also set for Arrow Python C++ (`libarrow_python*`)
+ libraries. ([#44614](https://github.com/apache/arrow/issues/44614))
+
+## Ruby and C GLib Notes
+
+### Ruby
+
+- Added basic support for JRuby with an implementation based on Arrow Java
+ ([#44346](https://github.com/apache/arrow/pull/44346)). The plan is to
release
+ this as a gem once it covers a base set of features. See
+ [#45324](https://github.com/apache/arrow/issues/45324) for more information.
+- Added support for 32bit and 64bit decimal, binary view, and string view. See
+ [issues
+
listed](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22)
+ in the 19.0.0 milestone for more details.
+- Fixed a bug that empty struct list can't be built.
+ ([#44742](https://github.com/apache/arrow/issues/44742))
+- Fixed a bug that `record_batch[:column].size` raises an exception.
+ ([#45119](https://github.com/apache/arrow/issues/45119))
+
+### C GLib
+
+- Added support for 32bit and 64bit decimal, binary view, and string view. See
+ [issues listed in the 19.0.0
+
milestone](https://github.com/apache/arrow/issues?q=is%3Aclosed%20milestone%3A19.0.0%20label%3A%22Component%3A%20GLib%22)
+ for more details.
+
+[1]: https://github.com/apache/arrow/milestone/66?closed=1
+[2]: {{ site.baseurl }}/release/19.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/19.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md
+[6]: https://github.com/apache/arrow-go/releases
+[7]: https://github.com/apache/arrow-java/releases