This is an automated email from the ASF dual-hosted git repository.
raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/main by this push:
new ab4042b720b Website: Add blog post for 17.0.0 (#537)
ab4042b720b is described below
commit ab4042b720b2d80d3b4e8a778ad94c5c1bb57eed
Author: Raúl Cumplido <[email protected]>
AuthorDate: Fri Jul 19 11:07:37 2024 +0200
Website: Add blog post for 17.0.0 (#537)
Release blog post for 17.0.0.
Release notes are here: https://arrow.apache.org/release/17.0.0.html
Issues on the milestone are here:
https://github.com/apache/arrow/milestone/62?closed=1
---------
Co-authored-by: David Li <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Joel Lubinitsky <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Rossi Sun <[email protected]>
---
_posts/2024-07-16-17.0.0-release.md | 248 ++++++++++++++++++++++++++++++++++++
1 file changed, 248 insertions(+)
diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md
new file mode 100644
index 00000000000..ffc29694b7f
--- /dev/null
+++ b/_posts/2024-07-16-17.0.0-release.md
@@ -0,0 +1,248 @@
+---
+layout: post
+title: "Apache Arrow 17.0.0 Release"
+date: "2024-07-16 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 17.0.0 release. This covers
+over 3 months of development work and includes [**331 resolved issues**][1]
+on [**529 distinct commits**][2] from [**92 distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 16.0.0 release, Dane Pitkin has been invited to become a committer.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Linux packages notes
+
+- We dropped support for Debian GNU/Linux bullseye.
+
+## C Data Interface notes
+
+- `ArrowDeviceArrayStream` can now be imported and exported (GH-40078).
+
+## Arrow Flight RPC notes
+
+- Flight SQL was formally stabilized (GH-39204).
+- Flight SQL added a bulk ingestion command (GH-38255).
+- The JDBC Flight SQL driver now accepts "catalog" as a connection parameter (GH-41947).
+- "Stateless" prepared statements are now supported (GH-37220, GH-41262).
+- Java added `FlightStatusCode.RESOURCE_EXHAUSTED` (GH-35888).
+- C++ now has basic support for logging with OpenTelemetry (GH-39898).
+
+## C++ notes
+
+For a complete list of C++ changes, refer to the [full changelog][3].
+
+### Highlights
+
+- Half-float values can now be parsed and formatted correctly (GH-41089).
+- Record batches can now be converted to row-major tensors, in addition to column-major ones (GH-40866).
+- The CSV writer can now write large string arrays larger than 2 GiB (GH-40270).
+- A possible invalid memory access in `BooleanArray::true_count()` has been fixed (GH-41016).
+- A new method `FlattenRecursively` allows recursively flattening nested list and fixed-size list arrays (GH-41055).
+- The scratch space in some `Scalar` subclasses is now immutable. This is required for proper concurrent access to `Scalar` instances (GH-40069).
+- Calling the `bit_width` or `byte_width` method of an extension type now defers to the underlying storage type (GH-41353).
+- Fixed a bug where `MapArray::FromArrays` would behave incorrectly if the given offsets array has a non-zero offset (GH-40750).
+- `MapArray::FromArrays` now accepts an optional null bitmap argument (GH-41684).
+- The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed (GH-41343).
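
For readers less familiar with the terminology in GH-40866: a column-major tensor stores each column contiguously, while a row-major tensor interleaves the columns row by row. The plain-Python sketch below illustrates the two memory orders for a hypothetical two-column batch; it models the layout only and does not call any Arrow API.

```python
# Illustration of tensor memory order only; this does not call Arrow.
# The column names and values are hypothetical.
columns = {"a": [1, 2, 3], "b": [10, 20, 30]}


def column_major(cols):
    """Store each column contiguously: all of "a", then all of "b"."""
    flat = []
    for values in cols.values():
        flat.extend(values)
    return flat


def row_major(cols):
    """Interleave the columns row by row: a[0], b[0], a[1], b[1], ..."""
    return [value for row in zip(*cols.values()) for value in row]


print(column_major(columns))  # [1, 2, 3, 10, 20, 30]
print(row_major(columns))     # [1, 10, 2, 20, 3, 30]
```

For an `n`-row, `m`-column batch, the element at row `i`, column `j` sits at offset `j * n + i` in the column-major buffer and at `i * m + j` in the row-major one.
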
+
+### Acero
+
+- The left anti join filter no longer crashes when the filter rows are empty (GH-41121).
+- A race condition was fixed in the asof join (GH-41149).
+- A potential stack overflow has been fixed (GH-41334, GH-41738).
+- A potential crash on very large data has been fixed (GH-41813).
+- Asof join and sort merge join now support single-threaded mode (GH-41190).
+
+### Compute
+
+- List views and maps are now supported by the `if_else`, `case_when` and
+ `coalesce` functions (GH-41418).
+- List views are now supported by the functions `list_slice` (GH-42065),
+ `list_parent_indices` (GH-42235), `take` and `filter` (GH-42116).
+- `list_flatten` can now flatten recursively, controlled by a new optional argument (GH-41183, GH-41055).
+- The `take` and `filter` functions have been made significantly faster on fixed-width types, including fixed-size lists of fixed-width types (GH-39798).
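
The difference between one-level and recursive flattening (`list_flatten` with its new option, and `FlattenRecursively`) can be modeled in plain Python. This sketch illustrates the semantics only, assuming that null list entries contribute no values, as with the existing one-level flatten. It is not the Arrow kernel, and the function names are hypothetical. Because Python lists carry no type, the sketch passes the nesting depth explicitly, whereas Arrow knows it from the list data type.

```python
# Plain-Python model of list flattening semantics; not the Arrow kernel.
# None stands in for a null list entry.
def flatten_once(lists):
    """Remove one level of nesting; null list entries contribute no values."""
    out = []
    for item in lists:
        if item is not None:
            out.extend(item)
    return out


def flatten_recursive(lists, depth):
    """Apply flatten_once `depth` times, once per nested list level in the type."""
    out = lists
    for _ in range(depth):
        out = flatten_once(out)
    return out


# A hypothetical list<list<int64>> value with a null entry and an empty inner list.
nested = [[[1, 2], [3]], None, [[4], []]]
print(flatten_once(nested))                # [[1, 2], [3], [4], []]
print(flatten_recursive(nested, depth=2))  # [1, 2, 3, 4]
```
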
+
+### Dataset
+
+- Repeated scanning of an encrypted Parquet dataset now works correctly (GH-41431).
+
+### Filesystems
+
+- Standard filesystem implementations are now tracked in a global registry, which
+ also allows loading third-party filesystem implementations, for example from
+ runtime-loaded DLLs (GH-40342,
+- Directory metadata operations on Azure filesystems are now more aligned with
+ the common expectations for filesystems (GH-41034).
+- `CopyFile` is now supported for Azure filesystems with hierarchical namespace
+ enabled (GH-41095).
+- Azure credentials can now be loaded explicitly from the environment (GH-39345) or using the Azure CLI (GH-39344).
+- A potential deadlock was fixed when closing an S3 output stream (GH-41862).
+
+### GPU
+
+- Non-CPU data can now be pretty-printed (GH-41664).
+- Non-CPU data with offsets, such as list and binary data, can now be properly
+ sent over IPC (GH-42198).
+
+### IPC
+
+- Flatbuffers serialization is now more deterministic (GH-40361).
+
+### Parquet
+
+- A crash was fixed when reading an invalid Parquet file where columns claim to
+ be of different lengths (GH-41317).
+- Definition and repetition levels are now more strictly checked, avoiding later crashes when reading an invalid Parquet file (GH-41321).
+- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070).
+- Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` could return an invalid estimate in some situations (GH-41545).
+- Delimiting records is now faster for columns with nested repetition (GH-41361).
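
As background for the level-checking and record-delimiting items above: in Parquet, each value in a nested column carries a definition level and a repetition level, each bounded by a maximum derived from the schema, and a repetition level of 0 marks the start of a new record. The plain-Python sketch below illustrates a bounds check and record counting on hypothetical level arrays. It is not Arrow's C++ reader, and the helper names are made up for illustration.

```python
# Plain-Python sketch of two Parquet level concepts; not Arrow's C++ reader.
def check_levels(levels, max_level, kind):
    """Raise if any level falls outside [0, max_level]."""
    for i, level in enumerate(levels):
        if not 0 <= level <= max_level:
            raise ValueError(
                f"invalid {kind} level {level} at index {i} (max is {max_level})"
            )


def count_records(rep_levels):
    """A repetition level of 0 starts a new record, so count the zeros."""
    return sum(1 for level in rep_levels if level == 0)


# Hypothetical levels for a singly repeated column holding three records:
# [a, b], [c], [d, e, f]
rep_levels = [0, 1, 0, 0, 1, 1]
def_levels = [1, 1, 1, 1, 1, 1]
check_levels(rep_levels, max_level=1, kind="repetition")
check_levels(def_levels, max_level=1, kind="definition")
print(count_records(rep_levels))  # 3
```
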
+
+### Substrait
+
+- Support for more Arrow data types was added: some temporal types, half floats, large string and large binary (GH-40695).
+
+## C# notes
+
+- The performance of building Decimal arrays using SqlDecimal values was improved for .NET 7+ (GH-41349).
+- Scalar arrays now implement `ICollection<T?>` (GH-38692).
+- Concatenating arrays with a non-zero offset with `ArrowArrayConcatenator` was fixed (GH-41164).
+- Concatenating union arrays with `ArrowArrayConcatenator` was fixed (GH-41198).
+- Accessing values of decimal arrays with a non-zero offset was fixed (GH-41199).
+
+## Go Notes
+
+### Bug Fixes
+
+#### Arrow
+
+- Prevent exposure of invalid Go pointers in CGO code ([GH-43062](https://github.com/apache/arrow/issues/43062))
+- Fix memory leak for 0-length C array imports ([GH-41534](https://github.com/apache/arrow/issues/41534))
+- Ensure statement handle is updated so stateless prepared statements work properly ([GH-41427](https://github.com/apache/arrow/issues/41427))
+
+#### Parquet
+
+- Fix memory leak in `BufferedPageWriter` ([GH-41697](https://github.com/apache/arrow/issues/41697))
+- Fix performance regression in `PooledBufferWriter` ([GH-41541](https://github.com/apache/arrow/issues/41541))
+
+### Enhancements
+
+#### Arrow
+
+- Arrow Schemas and Records can now be created from Protobuf messages ([GH-40494](https://github.com/apache/arrow/issues/40494))
+
+#### Parquet
+
+- Performance improvement for BitWriter VlqInt ([GH-41160](https://github.com/apache/arrow/pull/41160))
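
Assuming `VlqInt` refers to the unsigned variable-length integer encoding (ULEB128) that Parquet's RLE/bit-packing hybrid encoding uses for its run headers, the plain-Python sketch below shows that format. Each byte carries 7 bits of the value, low bits first, and the high bit signals that more bytes follow. This is not the Go `BitWriter` code, and the function names are hypothetical.

```python
# Plain-Python sketch of unsigned VLQ (ULEB128) encoding; not the Go BitWriter.
def encode_vlq(value: int) -> bytes:
    if value < 0:
        raise ValueError("unsigned VLQ cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: this is the last byte
            return bytes(out)


def decode_vlq(data: bytes) -> int:
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            return result
    raise ValueError("truncated VLQ value")


print(encode_vlq(300).hex())  # ac02
```
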
+
+## Java notes
+
+**Some changes are coming in the next version, Arrow 18: Java 8 support will be removed, and the version of the flight-core artifact with shaded gRPC will no longer be distributed.**
+
+- Basic support for ListView (GH-41287) and StringView (GH-40339) has been added. These types should still be considered experimental.
+
+## JavaScript notes
+
+- General maintenance: cleaned up packaging ([GH-39722](https://github.com/apache/arrow/issues/39722)) and updated dependencies ([GH-41905](https://github.com/apache/arrow/issues/41905)).
+
+## Python notes
+
+### Compatibility notes:
+* To ensure Python 3.13 compatibility, usage of the private `_Py_IsFinalizing` API has been replaced with a public API (GH-41475).
+* The C Data Interface now supports CUDA devices (GH-40384).
+
+### New features:
+* Added support for Emscripten via Pyodide (GH-41910).
+
+### Other improvements:
+* `ParquetWriter` gained a `store_decimal_as_integer` option (GH-42168).
+* The Float16 logical type is now supported in Parquet (GH-42016).
+* `bit_width` and `byte_width` are now exposed on extension types (GH-41389).
+* Added bindings for the `Device` and `MemoryManager` classes (GH-41126).
+* The PyCapsule interface now exposes the device interface (GH-38325).
+* Various PyArrow APIs have been updated to work gracefully with non-CPU architectures (GH-42112, GH-41664, GH-41662,
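
The Parquet format allows a DECIMAL column to be stored as INT32 when its precision is at most 9 and as INT64 when its precision is at most 18, since the largest unscaled value then fits in the integer. The sketch below models how a writer option like `store_decimal_as_integer` might choose a physical type, assuming it follows those format limits and otherwise falls back to FIXED_LEN_BYTE_ARRAY storage. The function is hypothetical and does not call PyArrow.

```python
# Sketch of a precision-based type choice for Parquet DECIMAL columns.
# Assumption: the option uses the integer types the Parquet format allows.
def decimal_physical_type(precision, store_decimal_as_integer):
    """Return the Parquet physical type name for a DECIMAL of `precision` digits."""
    if precision < 1:
        raise ValueError("decimal precision must be at least 1")
    if store_decimal_as_integer:
        if precision <= 9:
            return "INT32"  # 10**9 - 1 fits in a signed 32-bit integer
        if precision <= 18:
            return "INT64"  # 10**18 - 1 fits in a signed 64-bit integer
    return "FIXED_LEN_BYTE_ARRAY"


for p in (5, 12, 30):
    print(p, decimal_physical_type(p, store_decimal_as_integer=True))
# 5 INT32
# 12 INT64
# 30 FIXED_LEN_BYTE_ARRAY
```
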
+
+### Relevant bug fixes:
+* Fixed NumPy 2.0 compatibility issues (GH-42170, GH-41924).
+* Fixed sporadic asof join failures (GH-41149).
+* Fixed a segfault in `RecordBatch.filter()` when passing a `ChunkedArray` (GH-38770).
+* Fixed a segfault in `RecordBatch.from_arrays()` when passing a storage array (GH-37669).
+* Fixed a bug where constructing a `MapArray` from arrays could drop nulls (GH-41684).
+* Fixed a bug where `RunEndEncodedArray.from_arrays()` failed if `run_ends` was a `pyarrow.Array` (GH-40560).
+* Fixed a regression in `RecordBatchReader.cast()` introduced in PyArrow 16 (GH-41884).
+
+## R notes
+
+* User-written R functions that call functions Arrow supports in dataset queries can now be used in queries too. Previously, only functions that used arithmetic operators worked. For example, `time_hours <- function(mins) mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` did not; now both work. These are automatic translations rather than true user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. [GH-41223](https://github.com/apac [...]
+* `mutate()` expressions can now include aggregations, such as `x - mean(x)`. [GH-41350](https://github.com/apache/arrow/pull/41350)
+* `summarize()` now supports more complex expressions and correctly handles cases where column names are reused in expressions. [GH-41323](https://github.com/apache/arrow/issues/41323)
+* The `na_matches` argument to the `dplyr::*_join()` functions is now supported. This argument controls whether `NA` values are considered equal when joining. [GH-41358](https://github.com/apache/arrow/issues/41358)
+* R metadata, stored in the Arrow schema to support round-tripping data between R and Arrow/Parquet, is now serialized and deserialized more strictly. This makes it safer to load data from files of unknown origin into R data.frames. [GH-41969](https://github.com/apache/arrow/issues/41969)
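
In dplyr, `na_matches = "na"` (the default) treats two `NA` join keys as equal, while `na_matches = "never"` treats them as never matching. The plain-Python model below shows that rule for an inner join, with `None` standing in for `NA`. It is an illustration of the semantics, not the arrow package's implementation, and the function and data are hypothetical.

```python
# Plain-Python model of dplyr's na_matches rule for an inner join; not arrow's code.
# None stands in for R's NA.
def inner_join(left, right, key, na_matches="na"):
    """Join two lists of dicts on `key`, treating None keys per `na_matches`."""
    if na_matches not in ("na", "never"):
        raise ValueError('na_matches must be "na" or "never"')
    joined = []
    for lrow in left:
        for rrow in right:
            lkey, rkey = lrow[key], rrow[key]
            if lkey is None or rkey is None:
                # NA only ever matches NA, and only when na_matches == "na".
                match = na_matches == "na" and lkey is None and rkey is None
            else:
                match = lkey == rkey
            if match:
                joined.append({**lrow, **rrow})
    return joined


left = [{"id": 1, "x": "a"}, {"id": None, "x": "b"}]
right = [{"id": 1, "y": "c"}, {"id": None, "y": "d"}]
print(len(inner_join(left, right, "id")))                      # 2
print(len(inner_join(left, right, "id", na_matches="never")))  # 1
```
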
+
+For more on what’s in the 17.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+- Improved `Arrow::Table#to_s` format
+ - This is a breaking change
+
+### C GLib
+
+- Added support for Microsoft Visual C++
+- Added `gadataset_dataset_to_record_batch_reader()`
+
+## Rust notes
+
+The Rust projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the latest release of the Rust
+implementation, see the latest [Arrow Rust changelog][5].
+
+[1]: https://github.com/apache/arrow/milestone/62?closed=1
+[2]: {{ site.baseurl }}/release/17.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/17.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/tags