This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/main by this push:
     new 28c3cd4ffc7 Website: Add blog post for 16.0.0 (#500)
28c3cd4ffc7 is described below

commit 28c3cd4ffc7552c973e59eebd365962db72fe77a
Author: Raúl Cumplido <[email protected]>
AuthorDate: Mon Apr 29 15:31:34 2024 +0200

    Website: Add blog post for 16.0.0 (#500)
    
    Co-authored-by: David Li <[email protected]>
    Co-authored-by: Curt Hagenlocher <[email protected]>
    Co-authored-by: Matt Topol <[email protected]>
    Co-authored-by: Alenka Frim <[email protected]>
    Co-authored-by: Dane Pitkin <[email protected]>
    Co-authored-by: Bryce Mecum <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
---
 _posts/2024-04-20-16.0.0-release.md | 299 ++++++++++++++++++++++++++++++++++++
 1 file changed, 299 insertions(+)

diff --git a/_posts/2024-04-20-16.0.0-release.md 
b/_posts/2024-04-20-16.0.0-release.md
new file mode 100644
index 00000000000..44fe02c6290
--- /dev/null
+++ b/_posts/2024-04-20-16.0.0-release.md
@@ -0,0 +1,299 @@
+---
+layout: post
+title: "Apache Arrow 16.0.0 Release"
+date: "2024-04-20 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 16.0.0 release. This covers
+over 3 months of development work and includes [**385 resolved issues**][1]
+on [**586 distinct commits**][2] from [**119 distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 15.0.0 release, Jeffrey Vo, Jay Zhan, Bryce Mecum, Joel Lubinitsky,
+and Sarah Gilmore have been invited to be committers.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## C Data Interface notes
+
+- Added `RegisterDeviceMemoryManager` and `GetDeviceMemoryManager` for managing 
mappings from a device type and id to a memory manager 
([GH-40698](https://github.com/apache/arrow/issues/40698)).
+- Added `RegisterCUDADevice` to register CUDA devices 
([GH-40698](https://github.com/apache/arrow/issues/40698)).
+- Added `ImportFromChunkedArray` and `ExportChunkedArray` for handling Chunked 
Arrays in the C Stream Interface 
([GH-38717](https://github.com/apache/arrow/issues/38717)).
+- Fixed an issue where string and nested types weren’t being correctly 
imported with DeviceArray 
([GH-39769](https://github.com/apache/arrow/issues/39769)).
+- Added support for copying Arrays and RecordBatches between memory types 
([GH-39771](https://github.com/apache/arrow/issues/39771)).
+
+## Arrow Flight RPC notes
+
+- Session variable RPCs were added 
([GH-34865](https://github.com/apache/arrow/issues/34865))
+- Go: cookies can be copied to another connection to reuse existing 
credentials ([GH-39837](https://github.com/apache/arrow/issues/39837))
+- Go: enable PollFlightInfo for Flight SQL clients/servers 
([GH-39574](https://github.com/apache/arrow/issues/39574))
+- Java: the JDBC driver now tries all locations the server sends it 
([GH-38573](https://github.com/apache/arrow/issues/38573))
+- Java: tweak some options to give better performance 
([GH-40475](https://github.com/apache/arrow/issues/40745), 
[GH-40039](https://github.com/apache/arrow/issues/40039))
+
+## C++ notes
+For C++ notes refer to the full changelog.
+
+## Highlights
+
+- Initial support for Azure Blob Storage has been added 
([GH-18014](https://github.com/apache/arrow/issues/18014)).
+- Arrow C++ can now be built with Emscripten 
([GH-37821](https://github.com/apache/arrow/pull/37821)) which lays the 
foundation for running Arrow C++ under WASM runtimes and eventually 
[PyArrow](https://github.com/apache/arrow/pull/37822) as well.
+- Arrow's filesystem modules have been separated out into individual libraries 
and this change enables writing and registering custom filesystem 
implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
+- Conversion from `Table` and `RecordBatch` to a `Tensor` (not the same as
+[tensor extension 
array](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#official-list))
+is being developed. An umbrella issue has been created 
([GH-40058](https://github.com/apache/arrow/issues/40058)),
+and the issues connected to the `RecordBatch` conversion are included in this 
release
+([GH-40059](https://github.com/apache/arrow/issues/40059),
+[GH-40357](https://github.com/apache/arrow/issues/40357),
+[GH-40297](https://github.com/apache/arrow/issues/40297),
+[GH-40060](https://github.com/apache/arrow/issues/40060),
+[GH-40061](https://github.com/apache/arrow/issues/40061) and
+[GH-40866](https://github.com/apache/arrow/issues/40866)), which means that a 
`RecordBatch` can now be
+converted to a column-major or row-major two-dimensional structure.
+
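The difference between the row-major and column-major results can be illustrated without Arrow at all. The following is a minimal pure-Python sketch, not the Arrow C++ or PyArrow implementation; `batch_to_flat` and its arguments are hypothetical names chosen for illustration.

```python
def batch_to_flat(columns, row_major=True):
    """Flatten equal-length columns into one buffer plus its (rows, cols) shape."""
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    if row_major:
        # Row-major: the values of one row are adjacent in the buffer.
        flat = [col[r] for r in range(n_rows) for col in columns]
    else:
        # Column-major: the values of one column are adjacent in the buffer.
        flat = [value for col in columns for value in col]
    return flat, (n_rows, len(columns))


columns = [[1, 2, 3], [10, 20, 30]]  # two columns, three rows
row_major_buf, shape = batch_to_flat(columns, row_major=True)
col_major_buf, _ = batch_to_flat(columns, row_major=False)
print(shape, row_major_buf, col_major_buf)
```

Both buffers describe the same 3x2 structure; only the order of the values in memory differs.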
+## Breaking Changes
+
+- `Function::is_impure` has been renamed to `is_pure` 
([GH-40607](https://github.com/apache/arrow/issues/40607)).
+
+## Compute
+
+### Bug Fixes
+
+- Fixed a potential crash when accessing the `true_count` property on a 
BooleanArray ([GH-41016](https://github.com/apache/arrow/issues/41016)).
+
+### Performance improvements
+
+- Significantly improved performance of the take kernel on certain types of 
inputs ([GH-40207](https://github.com/apache/arrow/issues/40207)).
+
+### Enhancements
+
+- Support for casting to and from half-float (float16) has been added 
([GH-20213](https://github.com/apache/arrow/issues/20213)).
+- Added support for residual predicates to Swiss Join implementation 
([GH-20339](https://github.com/apache/arrow/issues/20339)).
+- Expanded the primitive filter implementation to support all fixed-width 
primitive types and the take filter implementation to support all well-known 
fixed-width types ([GH-39740](https://github.com/apache/arrow/issues/39740)).
+- Added support for calling the `binary_slice` kernel on Fixed-Size Binary 
Arrays ([GH-39231](https://github.com/apache/arrow/issues/39231)).
+- The cast kernel now supports casting from LargeString, Binary, and 
LargeBinary to Dictionary 
([GH-39463](https://github.com/apache/arrow/issues/39463)).
+- Fields of different decimal precision can now be used together in arithmetic 
operations without an explicit cast beforehand 
([GH-40126](https://github.com/apache/arrow/issues/40126)).
+
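Casting to half-float narrows values to 16 bits, so some precision is lost. The sketch below illustrates the effect with Python's standard `struct` module (format code `e` is IEEE 754 binary16). It does not call Arrow's cast kernel, and `to_half_and_back` is a hypothetical helper.

```python
import struct


def to_half_and_back(value):
    """Round-trip a Python float through IEEE 754 binary16 (half-float)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


exact = to_half_and_back(1.5)    # representable exactly in 16 bits
rounded = to_half_and_back(0.1)  # rounded to the nearest half-float
print(exact, rounded)
```

Values such as 1.5 survive the round trip unchanged, while 0.1 comes back slightly different, which is the behavior to expect from any float16 cast.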
+## Datasets
+
+- Improved backpressure handling in the Dataset Writer which can significantly 
reduce memory usage for some use cases 
([https://github.com/apache/arrow/pull/40722](https://github.com/apache/arrow/pull/40722)).
+
+## Parquet
+
+- Byte stream split encoding support has been added for FIXED_LEN_BYTE_ARRAY, 
INT32, and INT64 which enables this encoding for half-float (float16) and 
fixed-width decimal ([GH-39978](https://github.com/apache/arrow/issues/39978)).
+- Decoding boolean values has been made faster for a variety of cases 
([GH-40872](https://github.com/apache/arrow/issues/40872)).
+
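As a rough illustration of what byte stream split does to fixed-width values, here is a minimal pure-Python sketch for little-endian INT32. It is not the Arrow or Parquet C++ code; `byte_stream_split` and `byte_stream_merge` are hypothetical names. Byte `k` of every value is written to stream `k`, and the streams are concatenated.

```python
import struct


def byte_stream_split(values, width=4, fmt="<i"):
    """Scatter byte k of each fixed-width value into stream k, then concatenate."""
    raw = [struct.pack(fmt, v) for v in values]
    streams = [bytes(item[k] for item in raw) for k in range(width)]
    return b"".join(streams)


def byte_stream_merge(data, width=4, fmt="<i"):
    """Inverse of byte_stream_split: gather one byte per stream for each value."""
    count = len(data) // width
    values = []
    for i in range(count):
        item = bytes(data[k * count + i] for k in range(width))
        values.append(struct.unpack(fmt, item)[0])
    return values


values = [1, 2, 256]
encoded = byte_stream_split(values)
decoded = byte_stream_merge(encoded)
print(encoded.hex(), decoded)
```

The encoding does not shrink the data by itself. Grouping bytes of the same significance tends to produce longer repeated runs, which a general-purpose compressor applied afterwards can exploit.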
+## Filesystems
+
+### New Features
+
+- In addition to building the individual filesystem implementations as 
separate modules, users can now write and register custom filesystem 
implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
+- A new environment variable, `AWS_ENDPOINT_URL_S3`, has been added which 
allows separately overriding the endpoint for S3 operations alone 
([GH-38663](https://github.com/apache/arrow/issues/38663)).
+
+### Bug Fixes
+
+- Fixed a bug in the S3 filesystem implementation that could cause a crash 
when deleting an object having duplicate forward slashes in its name 
([GH-38821](https://github.com/apache/arrow/issues/38821)).
+- Fixed a bug where `hash_mean` could silently overflow 
([GH-38833](https://github.com/apache/arrow/issues/38833)).
+
+### Improvements
+
+- The S3 implementation now sets the content-type of directory-like objects to 
application/x-directory to improve compatibility with other S3 tools 
([GH-38794](https://github.com/apache/arrow/issues/38794)).
+- Repeated S3Client initialization is now roughly an order of magnitude faster 
([GH-40299](https://github.com/apache/arrow/pull/40299)).
+- The MemoryPoolStats implementation has been reworked to re-order loads and 
stores which may be an improvement for some allocation-heavy, multi-threaded 
applications ([GH-40783](https://github.com/apache/arrow/issues/40783)).
+
+### Substrait
+
+- Support has been added to Substrait for a variety of Arrow types 
([GH-40695](https://github.com/apache/arrow/issues/40695)).
+- substrait-cpp has been upgraded to 0.44 
([GH-40695](https://github.com/apache/arrow/issues/40695)).
+
+## Development
+
+- Added support for the mold and lld linkers for building Arrow C++ 
([GH-40394](https://github.com/apache/arrow/issues/40394), 
[GH-40400](https://github.com/apache/arrow/issues/40400)).
+
+### Miscellaneous
+
+- Upgraded ORC to 2.0.0 
([GH-40507](https://github.com/apache/arrow/issues/40507)).
+- Upgraded zstd to 1.5.6 
([GH-40837](https://github.com/apache/arrow/pull/40837)).
+- Upgraded google benchmark to 1.8.3 
([GH-39863](https://github.com/apache/arrow/issues/39863)).
+- Upgraded zlib to 1.3.1 
([GH-39876](https://github.com/apache/arrow/issues/39876)).
+- Various ToString methods now support an optional `show_metadata` argument 
which will print metadata that may exist in nested types 
([GH-39864](https://github.com/apache/arrow/issues/39864)).
+
+## C# notes
+- IPC record batch compression has been implemented 
[GH-24834](https://github.com/apache/arrow/issues/24834)
+- Optional materialization of C# string arrays is now supported 
[GH-41047](https://github.com/apache/arrow/issues/41047)
+- A memory leak in the C Data interface has been fixed 
[GH-40898](https://github.com/apache/arrow/issues/40898)
+- Various other bug fixes and improvements.
+
+
+## Go Notes
+
+* The Golang Arrow and Parquet libraries now require Go 1.21+ 
([GH-40733](https://github.com/apache/arrow/issues/40733)) 
+
+### Bug Fixes
+
+#### Arrow
+
+* FlightSQL Driver will now properly handle concurrent result sets instead of 
pulling the entire result into memory 
([GH-40089](https://github.com/apache/arrow/issues/40089))
+* FlightSQL driver will now correctly respect the `DriverConfig.TLSEnabled` 
field ([GH-40097](https://github.com/apache/arrow/issues/40097))
+* Fixed a panic on 32-bit architectures 
([GH-40672](https://github.com/apache/arrow/issues/40672))
+* Corrected a precision loss for Decimal types when converting to JSON 
([GH-40693](https://github.com/apache/arrow/issues/40693))
+* Fixed an issue with `array.RecordBuilder` when using a NullType column 
([GH-40719](https://github.com/apache/arrow/issues/40719))
+
+#### Parquet
+
+* Fixed panic when writing a DeltaBinaryPacked column containing only nulls 
([GH-35718](https://github.com/apache/arrow/issues/35718))
+* Fixed a panic when writing a ListOf(DeltaBinaryPacked) field with no data 
([GH-39309](https://github.com/apache/arrow/issues/39309))
+* Arrow DATE64 types will now be properly coerced into Parquet DATE[32-bit] 
logical type ([GH-39456](https://github.com/apache/arrow/issues/39456))
+* Fixed the timezone semantics for timestamp conversion from Arrow to Parquet 
([GH-39466](https://github.com/apache/arrow/issues/39466))
+* Corrected an inaccuracy with `RowGroupTotalCompressedBytes` and 
`RowGroupTotalBytesWritten` for Parquet file writer 
([GH-39870](https://github.com/apache/arrow/issues/39870))
+* Fixed the `TotalCompressedBytes` count when falling back to plain encoding 
if a dictionary is too large 
([GH-39921](https://github.com/apache/arrow/issues/39921))
+* Fixed a bug when reslicing a nullable dictionary in the chunked writer 
([GH-39925](https://github.com/apache/arrow/issues/39925))
+
+### Enhancements
+
+#### Arrow
+
+* Users can now access the underlying `MemoTable` of a dictionary builder 
([GH-38988](https://github.com/apache/arrow/issues/38988))
+* Added an option to provide a string replacer for CSV writing 
([GH-39552](https://github.com/apache/arrow/issues/39552))
+* Flight: Cookies can be copied to another connection to reuse existing 
credentials ([GH-39837](https://github.com/apache/arrow/issues/39837))
+* Flight: enable PollFlightInfo for Flight SQL clients/servers 
([GH-39574](https://github.com/apache/arrow/issues/39574))
+* Added the ability to create a PreparedStatement from persisted data and 
provided access for FlightSQL users to the PreparedStatement handle property 
([GH-39774](https://github.com/apache/arrow/issues/39774) 
[GH-39910](https://github.com/apache/arrow/issues/39910))
+* FlightRPC Session management extensions have been implemented 
([GH-40155](https://github.com/apache/arrow/issues/40155))
+
+#### Parquet
+
+* New compression codecs can now be registered for Parquet 
([GH-40113](https://github.com/apache/arrow/issues/40113))
+* Parquet footers can be incrementally written without closing the file 
([GH-40630](https://github.com/apache/arrow/issues/40630))
+
+## Java notes
+- A breaking change to support Java 9 modules has been implemented in this 
release. [GH-39001](https://github.com/apache/arrow/issues/39001)
+- A new Float16 type has been added. 
[GH-39680](https://github.com/apache/arrow/issues/39680)
+- Java 22 is supported. 
[GH-40680](https://github.com/apache/arrow/issues/40680)
+- Various bug fixes and improvements.
+
+
+## JavaScript notes
+
+* Dates are now stored as TimestampMillisecond
+  ([GH-40892](https://github.com/apache/arrow/pull/40892))
+* Vectors created from typed arrays are now correctly not nullable and null
+  counts are now correct
+  ([GH-40852](https://github.com/apache/arrow/pull/40852))
+
+## Python notes
+
+Compatibility notes:
+* The umbrella issue for ensuring PyArrow compatibility with NumPy 2.0 has been 
closed ([GH-39532](https://github.com/apache/arrow/issues/39532)), with the last 
issues included in the Arrow 16.0.0 release 
([GH-41098](https://github.com/apache/arrow/issues/41098), 
[GH-39848](https://github.com/apache/arrow/issues/39848) and 
[GH-40376](https://github.com/apache/arrow/issues/40376)).
+* We no longer use pandas internals to create Block objects and instead use the 
new pandas API with pandas version 3 
[GH-35081](https://github.com/apache/arrow/issues/35081)
+* Pandas compatibility code has been simplified as old pandas and Python 
versions are not supported anymore 
[GH-40720](https://github.com/apache/arrow/issues/40720)
+* Deprecated `pyarrow.filesystem` legacy implementations have been removed 
[GH-20127](https://github.com/apache/arrow/issues/20127)
+
+New features:
+* Converting Arrow `Table` and `RecordBatch` to a `Tensor` (not the same as 
[tensor extension 
array](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#official-list))
 is being developed in Arrow C++ with bindings in Python. Umbrella issue: 
([GH-40058](https://github.com/apache/arrow/issues/40058)). In the current 
release, the option to convert a `RecordBatch` to a `Tensor` with 
`pyarrow.RecordBatch.to_tensor(...)` has been added, returning a row-major or 
column-major tensor with an option of [...]
+* `ListView` and `LargeListView` array formats are now supported by PyArrow 
([GH-39812](https://github.com/apache/arrow/issues/39812), 
[GH-39855](https://github.com/apache/arrow/issues/39855), 
[GH-40205](https://github.com/apache/arrow/issues/40205), 
[GH-41039](https://github.com/apache/arrow/issues/41039), 
[GH-40266](https://github.com/apache/arrow/issues/40266))
+* `BinaryView` and `StringView` are now supported in PyArrow 
([GH-39651](https://github.com/apache/arrow/issues/39651), 
[GH-39852](https://github.com/apache/arrow/issues/39852), 
[GH-40092](https://github.com/apache/arrow/issues/40092))
+* Final support for Run-End Encoded arrays in PyArrow has been included 
(conversion to numpy and pandas 
[GH-40659](https://github.com/apache/arrow/issues/40659), construction in 
`pa.array(...)` [GH-40273](https://github.com/apache/arrow/issues/40273))
+* `AsofJoinNode` C++ functionality is now exposed in Python as `join_asof` 
[GH-34235](https://github.com/apache/arrow/issues/34235)
+* Minimal Python bindings have been added for AzureFilesystem 
[GH-39968](https://github.com/apache/arrow/issues/39968)
+* `FixedSizeTensorScalar` class is added 
[GH-37484](https://github.com/apache/arrow/issues/37484)
+
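For readers unfamiliar with the run-end encoded layout mentioned above, the idea can be sketched in plain Python. This is an illustration only, not PyArrow code; `run_end_encode`, `run_end_decode`, and `ree_get` are hypothetical helpers. Each run stores its value once together with the exclusive logical index at which the run ends.

```python
from bisect import bisect_right


def run_end_encode(values):
    """Return (run_ends, run_values); each run end is an exclusive logical index."""
    run_ends, run_values = [], []
    for i, value in enumerate(values):
        if run_values and run_values[-1] == value:
            run_ends[-1] = i + 1  # extend the current run
        else:
            run_ends.append(i + 1)  # start a new run ending after position i
            run_values.append(value)
    return run_ends, run_values


def run_end_decode(run_ends, run_values):
    """Expand the runs back into the original sequence."""
    out, start = [], 0
    for end, value in zip(run_ends, run_values):
        out.extend([value] * (end - start))
        start = end
    return out


def ree_get(run_ends, run_values, index):
    """Look up one logical element with a binary search over the run ends."""
    if not 0 <= index < (run_ends[-1] if run_ends else 0):
        raise IndexError(index)
    return run_values[bisect_right(run_ends, index)]


data = ["a", "a", "a", "b", "c", "c"]
run_ends, run_values = run_end_encode(data)
print(run_ends, run_values, ree_get(run_ends, run_values, 3))
```

Because the run ends are cumulative, random access needs only a binary search rather than a full decode.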
+Other improvements:
+* Add ChunkedArray import/export to/from C 
[GH-39984](https://github.com/apache/arrow/issues/39984)
+* `pyarrow.Field` and `pyarrow.ChunkedArray` can now be constructed from 
objects supporting the PyCapsule Arrow C Data Interface 
[GH-38010](https://github.com/apache/arrow/issues/38010)
+* `requested_schema` is now supported in `__arrow_c_stream__` implementations 
[GH-40066](https://github.com/apache/arrow/issues/40066)
+* Add low-level bindings for exporting/importing the C Device Interface
+ [GH-39979](https://github.com/apache/arrow/issues/39979)
+* A function to download and extract the timezone database on a Windows machine 
has been added [GH-37328](https://github.com/apache/arrow/issues/37328)
+* Missing methods are added to `pyarrow.RecordBatch` 
[GH-30915](https://github.com/apache/arrow/issues/30915)
+* Dictionary is now also accepted in `pyarrow.record_batch` factory function 
(as in `pyarrow.table`) [GH-40291](https://github.com/apache/arrow/issues/40291)
+* Usage of scalar legacy cast has been removed 
[GH-40023](https://github.com/apache/arrow/issues/40023)
+* The missing `byte_width` attribute has been added to all DataType classes 
[GH-39277](https://github.com/apache/arrow/issues/39277)
+* `FileInfo` instances can now be used to construct Dataset objects 
[GH-40142](https://github.com/apache/arrow/issues/40142)
+* Support hashing for `FileMetaData` and `ParquetSchema` 
[GH-39780](https://github.com/apache/arrow/issues/39780)
+* `force_virtual_addressing` is exposed in PyArrow 
[GH-39779](https://github.com/apache/arrow/issues/39779)
+
+Relevant bug fixes:
+* Calling `pyarrow.dataset.ParquetFileFormat.make_write_options` as a class 
method now emits a warning 
[GH-39440](https://github.com/apache/arrow/issues/39440)
+* `ScalarMemoTable` is now initialized only when deduplication is enabled, which 
avoids large memory consumption when it is disabled 
[GH-40316](https://github.com/apache/arrow/issues/40316)
+* Slicing an array backwards beyond the start now includes the first item 
([GH-38768](https://github.com/apache/arrow/issues/38768) and 
[GH-40642](https://github.com/apache/arrow/issues/40642))
+* A memory leak when creating an Arrow array from a Python list of dicts has been fixed 
[GH-37989](https://github.com/apache/arrow/issues/37989)
+* `FixedSizeListType` was not considered a nested type and has now been 
added to `_NESTED_TYPES` 
[GH-40171](https://github.com/apache/arrow/issues/40171)
+* `max_chunksize` is now validated in `Table.to_batches` 
[GH-39788](https://github.com/apache/arrow/issues/39788)
+* Raising `ValueError` in `_ensure_partitioning` in Dataset has been fixed 
[GH-39579](https://github.com/apache/arrow/issues/39579)
+
+* Python stacktrace is now attached to errors in `ConvertPyError` 
[GH-37164](https://github.com/apache/arrow/issues/37164)
+
+## R notes
+
+* Arrow IPC streams (i.e., `write_ipc_stream`) can now be written to socket
+connections ([GH-38897](https://github.com/apache/arrow/pull/38897))
+* The `print()` output for `Dataset` and `Table` objects has been improved so 
it
+now shows dimensions and truncates its output in the case of wide schemas
+([GH-38917](https://github.com/apache/arrow/pull/38917))
+* Various improvements and fixes to documentation, package build, and CI 
systems
+
+For more on what’s in the 16.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+- Added support for customizing timestamp parsers.
+  [GH-40590](https://github.com/apache/arrow/issues/40590)
+
+### C GLib
+
+- Added support for time zone in `GArrowTimestampDataType`.
+  [GH-39702](https://github.com/apache/arrow/issues/39702)
+- Added missing compute function options.
+  [GH-40402](https://github.com/apache/arrow/issues/40402)
+  - `GArrowSplitPatternOptions`
+  - `GArrowStrftimeOptions`
+  - `GArrowStrptimeOptions`
+  - `GArrowStructFieldOptions`
+- Changed documentation generator to GI-DocGen from GTK-Doc.
+  [GH-39935](https://github.com/apache/arrow/issues/39935)
+- Added `GArrowTimestampParser`.
+  [GH-40438](https://github.com/apache/arrow/issues/40438)
+- Added support for customizing timestamp parsers.
+  [GH-40590](https://github.com/apache/arrow/issues/40590)
+
+## Rust notes
+
+The Rust projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the latest release of the Rust
+implementation, see the latest [Arrow Rust changelog][5].
+
+[1]: https://github.com/apache/arrow/milestone/59?closed=1
+[2]: {{ site.baseurl }}/release/16.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/16.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/tags
