This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/main by this push:
     new ab4042b720b Website: Add blog post for 17.0.0 (#537)
ab4042b720b is described below

commit ab4042b720b2d80d3b4e8a778ad94c5c1bb57eed
Author: Raúl Cumplido <[email protected]>
AuthorDate: Fri Jul 19 11:07:37 2024 +0200

    Website: Add blog post for 17.0.0 (#537)
    
    Release blog post for 17.0.0.
    
    Release notes are here: https://arrow.apache.org/release/17.0.0.html
    Issues on the milestone are here:
    https://github.com/apache/arrow/milestone/62?closed=1
    
    ---------
    
    Co-authored-by: David Li <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Co-authored-by: Bryce Mecum <[email protected]>
    Co-authored-by: Adam Reeve <[email protected]>
    Co-authored-by: Joel Lubinitsky <[email protected]>
    Co-authored-by: Dane Pitkin <[email protected]>
    Co-authored-by: Rossi Sun <[email protected]>
---
 _posts/2024-07-16-17.0.0-release.md | 248 ++++++++++++++++++++++++++++++++++++
 1 file changed, 248 insertions(+)

diff --git a/_posts/2024-07-16-17.0.0-release.md 
b/_posts/2024-07-16-17.0.0-release.md
new file mode 100644
index 00000000000..ffc29694b7f
--- /dev/null
+++ b/_posts/2024-07-16-17.0.0-release.md
@@ -0,0 +1,248 @@
+---
+layout: post
+title: "Apache Arrow 17.0.0 Release"
+date: "2024-07-16 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 17.0.0 release. This covers
+over 3 months of development work and includes [**331 resolved issues**][1]
+on [**529 distinct commits**][2] from [**92 distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 16.0.0 release, Dane Pitkin has been invited to be committer.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Linux packages notes
+
+- We dropped support for Debian GNU/Linux bullseye
+
+## C Data Interface notes
+
+- `ArrowDeviceArrayStream` can now be imported and exported (GH-40078)
+
+## Arrow Flight RPC notes
+
+- Flight SQL was formally stabilized (GH-39204).
+- Flight SQL added a bulk ingestion command (GH-38255).
+- The JDBC Flight SQL driver now accepts "catalog" as a connection parameter 
(GH-41947).
+- "Stateless" prepared statements are now supported (GH-37220, GH-41262).
+- Java added `FlightStatusCode.RESOURCE_EXHAUSTED` (GH-35888).
+- C++ has some basic support for logging with OpenTelemetry (GH-39898).
+
+## C++ notes
+
+For C++ notes refer to the full changelog.
+
+### Highlights
+
+- Half-float values can now be parsed and formatted correctly (GH-41089).
+- Record batches can now be converted to row-major tensors, not only 
column-major (GH-40866).
+- The CSV writer is now able to write large string arrays that are larger than
+  2 GiB (GH-40270).
+- A possible invalid memory access in `BooleanArray.true_count()` has been 
fixed (GH-41016).
+- A new method `FlattenRecursively` allows recursive nesting of list and
+  fixed-size list arrays (GH-41055).
+- The scratch space in some `Scalar` subclasses is now immutable. This is 
required
+  for proper concurrent access to `Scalar` instances (GH-40069).
+- Calling the `bit_width` or `byte_width` method of an extension type now 
defers
+  to the underlying storage type (GH-41353).
+- Fixed a bug where `MapArray::FromArrays` would behave incorrectly if the 
given
+  offsets array has a non-zero offset (GH-40750).
+- `MapArray::FromArrays` now accepts an optional null bitmap argument
+  (GH-41684).
+- The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed 
(GH-41343).
+
+### Acero
+
+- The left anti join filter no longer crashes when the filter rows are empty 
(GH-41121).
+- A race condition was fixed in the asof join (GH-41149).
+- A potential stack overflow has been fixed (GH-41334, GH-41738).
+- A potential crash on very large data has been fixed (GH-41813).
+- Asof join and sort merge join now support single threaded mode (GH-41190).
+
+### Compute
+
+- List views and maps are now supported by the `if_else`, `case_when` and
+  `coalesce` functions (GH-41418).
+- List views are now supported by the functions `list_slice` (GH-42065),
+  `list_parent_indices` (GH-42235), `take` and `filter` (GH-42116).
+- `list_flatten` can now be recursive based on new optional argument
+  (GH-41183, GH-41055)
+- The `take` and `filter` functions have been made significantly faster on 
fixed-width
+  types, including fixed-size lists of fixed-width types (GH-39798).
+
+### Dataset
+
+- Repeated scanning of an encrypted Parquet dataset now works correctly 
(GH-41431).
+
+### Filesystems
+
+- Standard filesystem implementations are now tracked in a global registry 
which
+  also allows loading third-party filesystem implementations, for example from
+  runtime-loaded DLLs (GH-40342,
+- Directory metadata operations on Azure filesystems are now more aligned with
+  the common expectations for filesystems (GH-41034).
+- `CopyFile` is now supported for Azure filesystems with hierarchical namespace
+  enabled (GH-41095).
+- Azure credentials can now be loaded explicitly from the environment 
(GH-39345),
+  or using the Azure CLI (GH-39344).
+- A potential deadlock was fixed when closing an S3 output stream (GH-41862).
+
+### GPU
+
+- Non-CPU data can now be pretty-printed (GH-41664).
+- Non-CPU data with offsets, such as list and binary data, can now be properly
+  sent over IPC (GH-42198).
+
+### IPC
+
+- Flatbuffers serialization is now more deterministic (GH-40361).
+
+### Parquet
+
+- A crash was fixed when reading an invalid Parquet file where columns claim to
+  be of different lengths (GH-41317).
+- Definition and repetition levels are now more strictly checked, avoiding 
later
+  crashes when reading an invalid Parquet file (GH-41321).
+- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070).
+- Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` 
could
+  return an invalid estimate in some situations (GH-41545).
+- Delimiting records is now faster for columns with nested repeating 
(GH-41361).
+
+### Substrait
+
+- Support for more Arrow data types was added: some temporal types, half 
floats,
+  large string and large binary (GH-40695).
+
+## C# notes
+
+- The performance of building Decimal arrays using SqlDecimal values was 
improved for .NET 7+ (GH-41349)
+- Scalar arrays now implement `ICollection<T?>` (GH-38692)
+- Concatenating arrays with a non-zero offset with ArrowArrayConcatenator was 
fixed (GH-41164)
+- Concatenating union arrays with ArrowArrayConcatenator was fixed (GH-41198)
+- Accessing values of decimal arrays with a non-zero offset was fixed 
(GH-41199)
+
+## Go Notes
+
+### Bug Fixes
+
+#### Arrow
+
+- Prevent exposure of invalid Go pointers in CGO code 
([GH-43062](https://github.com/apache/arrow/issues/43062))
+- Fix memory leak for 0-length C array imports 
([GH=41534](https://github.com/apache/arrow/issues/41534))
+- Ensure statement handle is updated so stateless prepared statements work 
properly ([GH-41427](https://github.com/apache/arrow/issues/41427))
+
+#### Parquet
+
+- Fix memory leak in BufferedPageWriter 
([GH-41697](https://github.com/apache/arrow/issues/41697))
+- Fix performance regression in PooledBufferWriter 
([GH-41541](https://github.com/apache/arrow/issues/41541))
+
+### Enhancements
+
+#### Arrow
+
+- Arrow Schemas and Records can now be created from Protobuf messages 
([GH-40494](https://github.com/apache/arrow/issues/40494))
+
+#### Parquet
+
+- Performance improvement for BitWriter VlqInt 
([GH-41160](https://github.com/apache/arrow/pull/41160))
+
+## Java notes
+
+**Some changes are coming up in the next version, Arrow 18. Java 8 support 
will be removed. The version of the flight-core artifact with shaded gRPC will 
no longer be distributed.**
+
+- Basic support for ListView (GH-41287) and StringView (GH-40339) has been 
added. These types should still be considered experimental. 
+
+## JavaScript notes
+
+- General maintenance. Clean up packaging 
([GH-39722](https://github.com/apache/arrow/issues/39722)), update dependencies 
([GH-41905](https://github.com/apache/arrow/issues/41905)).
+
+## Python notes
+
+### Compatibility notes:
+* To ensure Python 3.13 compatibility, _Py_IsFinalizing has been replaced with 
a public API (GH-41475).
+* The C Data Interface now supports CUDA devices (GH-40384).
+
+### New features:
+* Added support for Emscripten via Pyodide (GH-41910).
+
+### Other improvements:
+* The ParquetWriter added the store_decimal_as_integer option (GH-42168).
+* The Float16 logical type is supported in Parquet (GH-42016).
+* Exposed bit_width and byte_width to extension types (GH-41389).
+* Added bindings for Device and MemoryManager classes (GH-41126).
+* The PyCapsule interface now exposes the device interface (GH-38325).
+* Various PyArrow APIs have been updated to work with non-CPU architectures 
gracefully. (GH-42112, GH-41664, GH-41662, 
+
+### Relevant bug fixes:
+* Fixed Numpy 2.0 compatibility issues (GH-42170, GH-41924).
+* Fixed sporadic as_of join failures (GH-41149).
+* Fixed a bug in RecordBatch.filter() when passing a ChunkedArray, which would 
cause a segfault (GH-38770).
+* Fixed a bug in RecordBatch.from_arrays() when passing a storage array, which 
would cause a segfault (GH-37669).
+* Fixed a bug where constructing a MapArray from Array could drop nulls 
(GH-41684).
+* FIxed a bug where RunEndEncodedArray.from_arrays fails if run_ends are 
pyarrow.Array (GH-40560).
+* FIxed a regression introduced in PyArrow v16 in RecordBatchReader.cast() 
(GH-41884).
+
+## R notes
+
+* R functions that users write that use functions that Arrow supports in 
dataset queries now can be used in queries too. Previously, only functions that 
used arithmetic operators worked. For example, `time_hours <- function(mins) 
mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` 
did not; now both work. These are automatic translations rather than true 
user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. 
[GH-41223](https://github.com/apac [...]
+* `mutate()` expressions can now include aggregations, such as `x - mean(x)`. 
[GH-41350](https://github.com/apache/arrow/pull/41350)
+* `summarize()` supports more complex expressions, and correctly handles cases 
where column names are reused in expressions. 
[GH-41323](https://github.com/apache/arrow/issues/41323)
+* The `na_matches` argument to the `dplyr::*_join()` functions is now 
supported. This argument controls whether `NA` values are considered equal when 
joining. [GH-41223](https://github.com/apache/arrow/issues/41358)
+* R metadata, stored in the Arrow schema to support round-tripping data 
between R and Arrow/Parquet, is now serialized and deserialized more strictly. 
This makes it safer to load data from files from unknown sources into R 
data.frames. [GH-41223](https://github.com/apache/arrow/issues/41969)
+
+For more on what’s in the 17.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+- Improved `Arrow::Table#to_s` format
+  - This is a breaking change
+
+### C GLib
+
+- Added support for Microsoft Visual C++
+- Added `gadataset_dataset_to_record_batch_reader()`
+
+## Rust notes
+
+The Rust projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the latest release of the Rust
+implementation, see the latest [Arrow Rust changelog][5].
+
+[1]: https://github.com/apache/arrow/milestone/62?closed=1
+[2]: {{ site.baseurl }}/release/17.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/17.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/tags

Reply via email to