Re: [PR] Website: Add blog post for 18.0.0 [arrow-site]

via GitHub Sat, 19 Oct 2024 11:11:44 -0700


jonkeane commented on code in PR #547:
URL: https://github.com/apache/arrow-site/pull/547#discussion_r1807438589



##########
_posts/2024-10-16-18.0.0-release.md:
##########
@@ -0,0 +1,195 @@
+---
+layout: post
+title: "Apache Arrow 18.0.0 Release"
+date: "2024-10-16 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 18.0.0 release. This covers
+over 3 months of development work and includes [**XXX resolved issues**][1]
+on [**YYY distinct commits**][2] from [**ZZZ distinct contributors**][2].
+See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 17.0.0 release, JJJJJ has been invited to be committer.
+No new members have joined the Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Columnar format
+
+The Arrow columnar format now allows 32-bit and 64-bit decimal data, in
+addition to the already existing 128-bit and 256-bit decimal data types
+(GH-43956).
+
+## Linux packages notes
+
+Azure file system is enabled.
+
+## C Data Interface notes
+
+
+## Arrow Flight RPC notes
+
+**Flight UCX is deprecated.** We plan to remove this experiment in the next couple of releases.
+
+The Java implementation now transparently handles compressed Arrow data when reading, instead of requiring explicit configuration. (GH-43469)
+
+The Ruby bindings now support implementing DoPut on the server. (GH-43814)
+
+## C++ notes
+
+The default memory pool has changed to mimalloc on all platforms (GH-43254).
+Previously, jemalloc was used by default on Linux. Using mimalloc by default
+provides a more consistent experience across different platforms, and
+makes configuration easier. It is expected that this might either increase
+or decrease performance on user workloads that use the default memory pool;
+please benchmark accordingly. Jemalloc can still be selected by setting
+the `ARROW_DEFAULT_MEMORY_POOL` environment variable to "jemalloc".
+
+A new class `arrow::ArrayStatistics` has been added to encode basic statistics
+about an Arrow array. It provides a source-agnostic representation for statistics
+provided by third-party sources such as Parquet files (GH-41909).
+
+The new Decimal32 and Decimal64 types have been made available (GH-43956).
+
+Several canonical extension types have been implemented:
+- the [Opaque](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#opaque) extension type (GH-43454);
+- the [8-bit boolean](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#bit-boolean) extension type (GH-17682);
+- the [UUID](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#uuid) extension type (GH-15058);
+- the [JSON](https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#json) extension type (GH-32538).
+
+### Acero
+
+- Enhanced the row-oriented representation by widening the offset type from 32-bit to 64-bit, resolving crashes and data corruption in aggregation and hash join on large datasets due to offset overflow (GH-43495).
+- Improved ordered aggregation performance by reducing complexity from `O(n*m)` to `O(n)`, where `n` is the number of rows and `m` the number of segments in the batch (GH-44052).
+
+### Compute
+
+Casting between string-like and string-view-like types has been implemented (GH-42247).
+
+### Dataset
+
+
+### Filesystems
+
+Writing small files to S3 now uses a single S3 API call instead of three
+(GH-40557). Files larger than 5 MB still go through the regular multipart
+upload mechanism.
+
+Background writes are now implemented and enabled by default for the Azure
+filesystem, dramatically improving the performance of writing to remote files
+(GH-40036).
+
+Finalization of the S3 filesystem layer should hopefully be more robust (GH-44071).
+
+### Gandiva
+
+LLVM 19.1 is now supported (GH-44222).
+
+### GPU
+
+
+### IPC
+
+The seed corpus used for fuzzing the IPC reader has been improved, hopefully
+helping make the IPC reader even more robust against corrupt or malicious
+IPC streams (GH-38041).
+
+### Parquet
+
+A new command line utility `parquet-dump-footer` allows dumping the Thrift-encoded
+footer metadata of a Parquet file, optionally scrubbing confidential data
+(GH-42102). This is part of the effort to collect real-world Parquet metadata
+so as to evaluate the efficiency of future improvements to the Parquet format.
+Please see https://github.com/apache/parquet-benchmark for instructions to submit
+footers representative of your own workloads.
+
+### Substrait
+
+
+## C# notes
+
+- Partial support has been added for LargeBinary, LargeString and LargeList. The column sizes cannot exceed 2 GB in length (GH-43266).
+- Changes to Flight support were made for better control and compatibility, and to allow Flight Server to be hosted in pre-Kestrel versions of .NET (GH-43907, GH-43672, GH-41347).
+- Support has been added for newly-defined types decimal32 and decimal64 (GH-44271).
+- The import of sliced arrays through the C Data interface now works correctly (GH-43267).
+## Java notes
+
+**Java 8 is no longer supported.** (GH-38051)
+
+**Gandiva may not work in this release.** For details, please see [GH-43576](https://github.com/apache/arrow/issues/43576).
+
+Basic support for RunEndEncoded was added (GH-39982). The ListView/StringView vector implementations are now more complete, including C Data support (multiple issues).
+
+Several APIs have been updated to accept `long` for addresses in preparation for FFM/large buffer support (GH-43902). We no longer expose `sun.misc.Unsafe` (GH-43479). We no longer ship the `shaded` flight-core JARs (GH-43217).
+
+More options were added to the Dataset ScanOptions API (GH-28866).
+
+## JavaScript notes
+
+
+## Python notes
+
+
+## R notes
+
+For more on what’s in the 18.0.0 R package, see the [R changelog][4].

Review Comment:
   ```suggestion
   ## R notes
   
   * R functions that users write that use functions that Arrow supports in dataset
     queries now can be used in queries too. Previously, only functions that used
     arithmetic operators worked.
     For example, `time_hours <- function(mins) mins / 60` worked,
     but `time_hours_rounded <- function(mins) round(mins / 60)` did not;
     now both work. These are automatic translations rather than true user-defined
     functions (UDFs); for UDFs, see `register_scalar_function()`. [GH-41223](https://github.com/apache/arrow/issues/41223)
   * `mutate()` expressions can now include aggregations, such as `x - mean(x)`. [GH-41350](https://github.com/apache/arrow/issues/41350)
   * `summarize()` supports more complex expressions, and correctly handles cases
     where column names are reused in expressions. [GH-41223](https://github.com/apache/arrow/issues/41223)
   * The `na_matches` argument to the `dplyr::*_join()` functions is now supported.
     This argument controls whether `NA` values are considered equal when joining. [GH-41358](https://github.com/apache/arrow/issues/41358)
   * R metadata, stored in the Arrow schema to support round-tripping data between
     R and Arrow/Parquet, is now serialized and deserialized more strictly.
     This makes it safer to load data from files from unknown sources into R data.frames. [GH-41969](https://github.com/apache/arrow/issues/41969)
   * Turn on the S3 and ZSTD features by default for macOS. [GH-42210](https://github.com/apache/arrow/issues/42210)
   * Fixed a bug in our implementation of `pull` on grouped datasets; it now
     returns the expected column. [GH-43172](https://github.com/apache/arrow/issues/43172)
     
   For full details of what’s in the 18.0.0 R package, see the [R changelog][4].
   ```
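
For readers unfamiliar with the `na_matches` semantics mentioned in the suggestion, here is a minimal Python sketch of the idea. It is not the Arrow or dplyr implementation: the `inner_join` helper and the sample rows are hypothetical, and Python's `None` stands in for R's `NA`. It only illustrates the two documented settings, `"na"` (missing keys match each other, dplyr's default) and `"never"` (missing keys match nothing).

```python
def inner_join(left, right, key, na_matches="na"):
    """Nested-loop inner join of two lists of dict rows on `key`.

    Hypothetical helper for illustration only.
    na_matches="na":    None keys are treated as equal to each other.
    na_matches="never": a None key never matches any row.
    """
    if na_matches not in ("na", "never"):
        raise ValueError("na_matches must be 'na' or 'never'")
    out = []
    for l_row in left:
        for r_row in right:
            lk, rk = l_row[key], r_row[key]
            if lk is None or rk is None:
                # A missing key matches only another missing key, and only in "na" mode.
                matched = na_matches == "na" and lk is None and rk is None
            else:
                matched = lk == rk
            if matched:
                # Left-hand values win if both sides share a column name.
                out.append({**r_row, **l_row})
    return out


# Sample data (made up for this sketch).
left = [{"id": 1, "x": "a"}, {"id": None, "x": "b"}]
right = [{"id": 1, "y": "p"}, {"id": None, "y": "q"}]

rows_na = inner_join(left, right, "id", na_matches="na")
rows_never = inner_join(left, right, "id", na_matches="never")
```

With `na_matches="na"` the row with the missing `id` is joined to its counterpart, so two rows come back; with `"never"` only the `id == 1` row survives.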



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
