pitrou commented on code in PR #500:
URL: https://github.com/apache/arrow-site/pull/500#discussion_r1601703998
########## _posts/2024-04-20-16.0.0-release.md: ########## @@ -0,0 +1,299 @@ +--- +layout: post +title: "Apache Arrow 16.0.0 Release" +date: "2024-04-20 00:00:00" +author: pmc +categories: [release] +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + + +The Apache Arrow team is pleased to announce the 16.0.0 release. This covers +over 3 months of development work and includes [**385 resolved issues**][1] +on [**586 distinct commits**][2] from [**119 distinct contributors**][2]. +See the [Install Page](https://arrow.apache.org/install/) +to learn how to get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 15.0.0 release, Jeffrey Vo, Jay Zhan, Bryce Mecum, Joel Lubinitsky, +and Sarah Gilmore have been invited to be committers. +No new members have joined the Project Management Committee (PMC). + +Thanks for your contributions and participation in the project! 
+ +## C Data Interface notes + +- Added `RegisterDeviceMemoryManager` and `GetDeviceMemoryManage` for managing mappings between a device type and id to a memory manager ([GH-40698](https://github.com/apache/arrow/issues/40698)). +- Added `RegisterCUDADevice` to register CUDA devices ([GH-40698](https://github.com/apache/arrow/issues/40698)). +- Added `ImportFromChunkedArray` and `ExportChunkedArray` for handling Chunked Arrays in the C Stream Interface ([GH-38717](https://github.com/apache/arrow/issues/38717)). +- Fixed an issue where string and nested types weren’t being correctly imported with DeviceArray ([GH-39769](https://github.com/apache/arrow/issues/39769)). +- Added support for copying Arrays and RecordBatches between memory types ([GH-39771](https://github.com/apache/arrow/issues/39771)). + +## Arrow Flight RPC notes + +- Session variable RPCs were added ([GH-34865](https://github.com/apache/arrow/issues/34865)) +- Go: cookies can be copied to another connection to reuse existing credentials ([GH-39837](https://github.com/apache/arrow/issues/39837)) +- Go: enable PollFlightInfo for Flight SQL clients/servers ([GH-39574](https://github.com/apache/arrow/issues/39574)) +- Java: the JDBC driver now tries all locations the server sends it ([GH-38573](https://github.com/apache/arrow/issues/38573)) +- Java: tweak some options to give better performance ([GH-40475](https://github.com/apache/arrow/issues/40745), [GH-40039](https://github.com/apache/arrow/issues/40039)) + +## C++ notes +For C++ notes refer to the full changelog. + +## Highlights + +- Initial support for the Azure Blob Storage has been added ([GH-18014](https://github.com/apache/arrow/issues/18014)). +- Arrow C++ can now be built with Emscripten ([GH-37821](https://github.com/apache/arrow/pull/37821)) which lays the foundation for running Arrow C++ under WASM runtimes and eventually [PyArrow](https://github.com/apache/arrow/pull/37822) as well. 
+- Arrow's filesystem modules have been separated out into individual libraries and this change enables writing and registering custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)). +- Conversion from `Table` and `RecordBatch` to a `Tensor` (not the same as +[tensor extension array](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#official-list)) +is being developed. Umbrella issue is created ([GH-40058](https://github.com/apache/arrow/issues/40058)) +and issues connected to the `RecordBatch` conversion are included in this release +([GH-40059](https://github.com/apache/arrow/issues/40059), +[GH-40357](https://github.com/apache/arrow/issues/40357), +[GH-40297](https://github.com/apache/arrow/issues/40297), +[GH-40060](https://github.com/apache/arrow/issues/40060), +[GH-40061](https://github.com/apache/arrow/issues/40061) and +[GH-40866](https://github.com/apache/arrow/issues/40866)) which means `RecordBatch` can now be +converted to a column or row-major two-dimensional structure. + +## Breaking Changes + +- `Function::is_impure` has been renamed to `is_pure` ([GH-40607](https://github.com/apache/arrow/issues/40607)). + +## Compute + +### Bug Fixes + +- Fixed a potential crash when accessing the `true_count` property on a BooleanArray ([GH-41016](https://github.com/apache/arrow/issues/41016)). + +### Performance improvements + +- Significantly improved performance of the take kernel on certain types of inputs ([GH-40207](https://github.com/apache/arrow/issues/40207)). + +### Enhancements + +- Support for casting to and from half-float (float16) has been added ([GH-20213](https://github.com/apache/arrow/issues/20213)). +- Added support for residual predicates to Swiss Join implementation ([GH-20339](https://github.com/apache/arrow/issues/20339)). 
+- Expanded support to primitive filter implementation for all fixed-width primitive types and take filter implementation for all well-known fixed-width types ([GH-39740](https://github.com/apache/arrow/issues/39740)). +- Added support for calling the `binary_slice` kernel on Fixed-Size Binary Arrays ([GH-39231](https://github.com/apache/arrow/issues/39231)). +- The cast kernel now supports casting from LargeString, Binary, and LargeBinary to Dictionary ([GH-39463](https://github.com/apache/arrow/issues/39463)). +- Fields of different decimal precision can now be used together in arithmetic operations without an explicit cast beforehand. ([GH-40126](https://github.com/apache/arrow/issues/40126)). + +## Datasets + +- Improved backpressure handling in the Dataset Writer which can significantly reduce memory usage for some use cases ([https://github.com/apache/arrow/pull/40722](https://github.com/apache/arrow/pull/40722)). + +## Parquet + +- Byte stream split encoding support has been added for FIXED_LEN_BYTE_ARRAY, INT32, and INT64 which enables this encoding for half-float (float16) and fixed-width decimal ([GH-39978](https://github.com/apache/arrow/issues/39978)). +- Decoding boolean values has been made faster for a variety of cases ([GH-40872](https://github.com/apache/arrow/issues/40872)). + +## Filesystems + +### New Features + +- In addition to building the individual filesystem implementations as separate modules, users can now write and register custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)). +- A new environment variable, `AWS_ENDPOINT_URL_S3`, has been added which allows separately overriding the endpoint for S3 operations alone ([GH-38663](https://github.com/apache/arrow/issues/38663)). + +### Bug Fixes + +- Fixed a bug in the S3 filesystem implementation that could cause a crash when deleting an object having duplicate forward slashes in its name ([GH-38821](https://github.com/apache/arrow/issues/38821)). 
+- Fixed a bug where `hash_mean` could silently overflow ([GH-38833](https://github.com/apache/arrow/issues/38833)). + +### Improvements + +- The S3 implementation now sets the content-type of directory-like objects to application/x-directory to improve compatibility with other S3 tools ([GH-38794](https://github.com/apache/arrow/issues/38794)). +- Repeated S3Client initialization is now roughly an order of magnitude faster ([GH-40299](https://github.com/apache/arrow/pull/40299)). +- The MemoryPoolStats implementation has been reworked to re-order loads and stores which may be an improvement for some allocation-heavy, multi-threaded applications ([GH-40783](https://github.com/apache/arrow/issues/40783)). + +### Substrait

Review Comment:
"Substrait" should not be under "Filesystems" but at the same level.