amoeba commented on code in PR #500:
URL: https://github.com/apache/arrow-site/pull/500#discussion_r1581706506

##########
_posts/2024-04-20-16.0.0-release.md:
##########
@@ -0,0 +1,109 @@
+---
+layout: post
+title: "Apache Arrow 16.0.0 Release"
+date: "2024-04-20 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 16.0.0 release. This covers
+over 3 months of development work and includes [**385 resolved issues**][1]
+on [**586 distinct commits**][2] from [**119 distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 15.0.0 release, Jeffrey Vo, Jay Zhan, Bryce Mecum and Sarah Gilmore
+have been invited to be committers.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## C Data Interface notes
+
+
+## Arrow Flight RPC notes
+
+
+## C++ notes
+

Review Comment:
I worked through this quickly today but (1) need to finish three todos at the top and (2) proof-read it before we merge. I'll do that tomorrow.

```suggestion
For C++ notes refer to the full changelog.

## Highlights

- Initial support for Azure Blob Storage has been added ([GH-18014](https://github.com/apache/arrow/issues/18014))
- Arrow C++ can now be built with Emscripten ([GH-37821](https://github.com/apache/arrow/pull/37821)), which lays the foundation for running Arrow C++ under WASM runtimes and eventually [PyArrow](https://github.com/apache/arrow/pull/37822)
- Arrow's filesystem modules have been separated out into individual libraries and this change enables writing and registering custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309))

## Breaking Changes

- Function::is_impure has been renamed to is_pure (GH-40607).

## Compute

### Bug Fixes

- Fixed a potential crash when accessing the true_count property on a BooleanArray ([GH-41016](https://github.com/apache/arrow/issues/41016)).

### Performance improvements

- Significantly improved performance of the take kernel on certain types of inputs ([GH-40207](https://github.com/apache/arrow/issues/40207)).

### Enhancements

- Casting to and from half-float (float16) has been added ([GH-20213](https://github.com/apache/arrow/issues/20213)).
- Added support for residual predicates to the Swiss Join implementation ([GH-20339](https://github.com/apache/arrow/issues/20339)).
- Expanded support to primitive filter implementation for all fixed-width primitive types and take filter implementation for all well-known fixed-width types ([GH-39740](https://github.com/apache/arrow/issues/39740)).
- Added support for calling the binary_slice kernel on Fixed-Size Binary Arrays ([GH-39231](https://github.com/apache/arrow/issues/39231)).
- The cast kernel now supports casting from LargeString, Binary, and LargeBinary to Dictionary ([GH-39463](https://github.com/apache/arrow/issues/39463)).
- Fields of different decimal precision can now be used with arithmetic operations without an explicit cast beforehand ([GH-40126](https://github.com/apache/arrow/issues/40126)).

## C Data Interface

- Added RegisterDeviceMemoryManager and GetDeviceMemoryManager for managing mappings from a device type and id to a memory manager ([GH-40698](https://github.com/apache/arrow/issues/40698)).
- Added RegisterCUDADevice to register CUDA devices ([GH-40698](https://github.com/apache/arrow/issues/40698)).
- Added ImportFromChunkedArray and ExportChunkedArray for handling ChunkedArrays in the C Stream Interface ([GH-38717](https://github.com/apache/arrow/issues/38717)).
- Fixed an issue where string and nested types weren’t being correctly imported with DeviceArray ([GH-39769](https://github.com/apache/arrow/issues/39769)).
- Added support for copying Arrays and RecordBatches between memory types ([GH-39771](https://github.com/apache/arrow/issues/39771)).

## Datasets

- Improved backpressure handling in the Dataset Writer, which can significantly reduce memory usage ([GH-40722](https://github.com/apache/arrow/pull/40722)).

## Parquet

- Byte stream split encoding support has been added for FIXED_LEN_BYTE_ARRAY, INT32, and INT64, which enables this encoding for half-float (float16) and fixed-width decimal ([GH-39978](https://github.com/apache/arrow/issues/39978)).
- Decoding boolean values has been made faster for a variety of cases ([GH-40872](https://github.com/apache/arrow/issues/40872)).

## Filesystems

### New Features

- In addition to building the individual filesystem implementations as separate modules, users can now write and register custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
- A new environment variable, AWS_ENDPOINT_URL_S3, has been added which allows separately overriding the endpoint for S3 operations ([GH-38663](https://github.com/apache/arrow/issues/38663)).

### Bug Fixes

- Fixed a bug in the S3 filesystem implementation that could cause a crash when deleting an object having duplicate forward slashes in its name ([GH-38821](https://github.com/apache/arrow/issues/38821)).
- Fixed a bug where hash_mean could silently overflow ([GH-38833](https://github.com/apache/arrow/issues/38833)).

### Improvements

- The S3 implementation now sets the content-type of directory-like objects to application/x-directory to improve compatibility with other S3 tools ([GH-38794](https://github.com/apache/arrow/issues/38794)).
- Repeated S3Client initialization is now roughly an order of magnitude faster ([GH-40299](https://github.com/apache/arrow/pull/40299)).
- The MemoryPoolStats implementation has been improved and it may be faster in multi-threaded applications on certain hardware ([GH-40783](https://github.com/apache/arrow/issues/40783)).

## Substrait

- Support has been added to Substrait for a variety of Arrow types ([GH-40695](https://github.com/apache/arrow/issues/40695)).
- Substrait has been upgraded to 0.44 ([GH-40695](https://github.com/apache/arrow/issues/40695)).

## Development

- Added support for the mold and lld linkers for building Arrow C++ ([GH-40394](https://github.com/apache/arrow/issues/40394), [GH-40400](https://github.com/apache/arrow/issues/40400)).

### Miscellaneous

- Upgraded ORC to 2.0.0 ([GH-40507](https://github.com/apache/arrow/issues/40507)).
- Upgraded zstd to 1.5.6 ([GH-40837](https://github.com/apache/arrow/pull/40837)).
- Upgraded Google Benchmark to 1.8.3 ([GH-39863](https://github.com/apache/arrow/issues/39863)).
- Upgraded zlib to 1.3.1 ([GH-39876](https://github.com/apache/arrow/issues/39876)).
- Various ToString methods now support an optional show_metadata argument which will print metadata that may exist in nested types ([GH-39864](https://github.com/apache/arrow/issues/39864)).
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org