This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/master by this push:
new 06412519eb Version 9.0.0 release blog post (#227)
06412519eb is described below
commit 06412519ebed0b74f6c8bc44532fc9293244edff
Author: Raúl Cumplido <[email protected]>
AuthorDate: Tue Aug 16 09:07:57 2022 +0200
Version 9.0.0 release blog post (#227)
Co-authored-by: David Li <[email protected]>
Co-authored-by: Eric Erhardt <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Co-authored-by: Dominik Moritz <[email protected]>
Co-authored-by: Larry White <[email protected]>
Co-authored-by: Neal Richardson <[email protected]>
Co-authored-by: Ian Cook <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
---
_posts/2022-08-16-9.0.0-release.md | 311 +++++++++++++++++++++++++++++++++++++
1 file changed, 311 insertions(+)
diff --git a/_posts/2022-08-16-9.0.0-release.md
b/_posts/2022-08-16-9.0.0-release.md
new file mode 100644
index 0000000000..f69e3edfc2
--- /dev/null
+++ b/_posts/2022-08-16-9.0.0-release.md
@@ -0,0 +1,311 @@
+---
+layout: post
+title: "Apache Arrow 9.0.0 Release"
+date: "2022-08-16 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 9.0.0 release. This covers
+over 3 months of development work and includes [**509 resolved issues**][1]
+from [**114 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bug fixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 8.0.0 release, Dewey Dunnington, Alenka Frim and Rok Mihevc
+have been invited to be committers.
+Thanks for your contributions and participation in the project!
+
+## Columnar Format Notes
+
+## Arrow Flight RPC notes
+
+Arrow Flight is now available in MacOS M1 Python wheels
([ARROW-16779](https://issues.apache.org/jira/browse/ARROW-16779)).
+Arrow Flight SQL is now buildable on Windows
([ARROW-16902](https://issues.apache.org/jira/browse/ARROW-16902)).
+Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs).
+
+## Linux packages notes
+
+AlmaLinux 9 is now supported.
([ARROW-16745](https://issues.apache.org/jira/browse/ARROW-16745))
+
+AmazonLinux 2 aarch64 is now supported.
([ARROW-16477](https://issues.apache.org/jira/browse/ARROW-16477))
+
+## C++ notes
+
+STL-like iteration is now provided over chunked arrays
([ARROW-602](https://issues.apache.org/jira/browse/ARROW-602)).
+
+### Compute
+
+The C++ compute and execution engine is now officially named "Acero", though
+its C++ namespaces have not changed.
+
+New light-weight data holder abstractions have been introduced in order
+to reduce the overhead of invoking compute functions and kernels, especially
+at the small data sizes desirable for efficient parallelization (typically
+L1- or L2-sized). Specifically, the non-owning `ArraySpan` and `ExecSpan`
+structures have internally superseded the much heavier `ExecBatch`, which
+is still supported for compatibility at the API level
+([ARROW-16756](https://issues.apache.org/jira/browse/ARROW-16756),
[ARROW-16824](https://issues.apache.org/jira/browse/ARROW-16824),
[ARROW-16852](https://issues.apache.org/jira/browse/ARROW-16852)).
+
+In a similar vein, the `ValueDescr` class was removed and `ScalarKernel`
+implementations now always receive at least one non-scalar input, removing
+the special case where a `ScalarKernel` needs to output a scalar rather than
+an array. The higher-level compute APIs still allow executing a scalar function
+over all-scalar inputs; but those scalars are internally broadcasted to
+1-element arrays so as to simplify kernel implementation
([ARROW-16757](https://issues.apache.org/jira/browse/ARROW-16757)).
+
+Some performance improvements were made to the hash join node. These changes
+do not require additional configuration. The hash join exec node has been
+improved to more efficiently use CPU cache and make better use of available
+vectorization hardware
([ARROW-14182](https://issues.apache.org/jira/browse/ARROW-14182)).
+
+Some plans containing a sequence of hash join operators will now use bloom
+filters to eliminate rows earlier in the plan, reducing the overall CPU
+cost of the plan
([ARROW-15498](https://issues.apache.org/jira/browse/ARROW-15498)).
+
+Timestamp comparison is now supported
([ARROW-16425](https://issues.apache.org/jira/browse/ARROW-16425)).
+
+A cumulative sum function is implemented over numeric inputs
([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). Note that
this is a vector
+function so cannot be used in an Acero ExecPlan.
+
+A rank vector kernel has been added
([ARROW-16234](https://issues.apache.org/jira/browse/ARROW-16234)).
+
+Temporal rounding functions received additional options to control how
+rounding is done
([ARROW-14821](https://issues.apache.org/jira/browse/ARROW-14821)).
+
+Improper computation of the "mode" function on boolean input was fixed
+([ARROW-17096](https://issues.apache.org/jira/browse/ARROW-17096)).
+
+Function registries can now be nested
([ARROW-16677](https://issues.apache.org/jira/browse/ARROW-16677)).
+
+### Dataset
+
+The `autogenerate_column_names` option for CSV reading is now handled correctly
+([ARROW-16436](https://issues.apache.org/jira/browse/ARROW-16436)).
+
+Fix `InMemoryDataset::ReplaceSchema` to actually replace the schema
+([ARROW-16085](https://issues.apache.org/jira/browse/ARROW-16085)).
+
+Fix `FilenamePartitioning` to properly support null values
([ARROW-16302](https://issues.apache.org/jira/browse/ARROW-16302)).
+
+### Filesystem
+
+A number of bug fixes and improvements were made to the Google Cloud Storage
+filesystem implementation
([ARROW-14892](https://issues.apache.org/jira/browse/ARROW-14892)).
+
+By default, the S3 filesystem implementation does not create or drop buckets
+anymore ([ARROW-15906](https://issues.apache.org/jira/browse/ARROW-15906)).
This is a compatibility-breaking change intended
+to prevent user errors from having potentially catastrophic consequences.
+Options have been added to restore the previous behavior if necessary.
+
+### Parquet
+
+The default Parquet version is now 2.4 for writing, enabling use of
+more recent logical types by default
([ARROW-12203](https://issues.apache.org/jira/browse/ARROW-12203)).
+
+Non-nullable fields are now handled correctly by the Parquet reader
+([ARROW-16116](https://issues.apache.org/jira/browse/ARROW-16116)).
+
+Reading encrypted files should now be thread-safe
([ARROW-14114](https://issues.apache.org/jira/browse/ARROW-14114)).
+
+Statistics equality now works correctly with minmax
([ARROW-16487](https://issues.apache.org/jira/browse/ARROW-16487)).
+
+The minimum Thrift version required for building is now 0.13
([ARROW-16721](https://issues.apache.org/jira/browse/ARROW-16721)).
+
+The Thrift deserialization limits can now be configured to accommodate for
+data files with very large metadata
([ARROW-16546](https://issues.apache.org/jira/browse/ARROW-16546)).
+
+### Substrait
+
+The Substrait spec has been updated to 0.6.0
([ARROW-16816](https://issues.apache.org/jira/browse/ARROW-16816)). In
addition, a
+larger subset of the Substrait specification is now supported
([ARROW-15587](https://issues.apache.org/jira/browse/ARROW-15587),
+[ARROW-15590](https://issues.apache.org/jira/browse/ARROW-15590),
+[ARROW-15901](https://issues.apache.org/jira/browse/ARROW-15901),
+[ARROW-16657](https://issues.apache.org/jira/browse/ARROW-16657),
+[ARROW-15591](https://issues.apache.org/jira/browse/ARROW-15591)).
+
+## C# notes
+
+#### New Features
+
+* Added support for Time32Array and Time64Array
([ARROW-16660](https://github.com/apache/arrow/pull/13279))
+
+#### Bug Fixes
+
+* When using TableFromRecordBatches, the resulting table columns have no data
array. ([ARROW-13129](https://github.com/apache/arrow/pull/10562))
+* Fix intermittent test failures due to async memory management bug.
([ARROW-16978](https://github.com/apache/arrow/pull/13573))
+
+## Go notes
+
+### Security
+
+* Updated testify dependency to address CVE-2022-28948.
([ARROW-16759](https://issues.apache.org/jira/browse/ARROW-16759)) (This was
also backported to previous versions and released as patch versions v6.0.2,
v7.0.1, and v8.0.1)
+
+### Arrow
+
+#### New Features
+
+* Dictionary Scalars are now available
([ARROW-16323](https://issues.apache.org/jira/browse/ARROW-16323))
+* Introduced a DictionaryUnifier object along with functions for unifying
Chunked Arrays and Tables
([ARROW-16324](https://issues.apache.org/jira/browse/ARROW-16324))
+* New CSV examples added to documentation to demonstrate error handling
([ARROW-16450](https://issues.apache.org/jira/browse/ARROW-16450))
+* CSV Reader now supports arrow.TimestampType
([ARROW-16504](https://issues.apache.org/jira/browse/ARROW-16504))
+* JSON parsing for Temporal Types now allow passing numeric values in addition
to strings for parsing. Timezones will be properly parsed if they exist in the
string and a function was added to retrieve a time.Location object from a
TimestampType ([ARROW-16551](https://issues.apache.org/jira/browse/ARROW-16551))
+* New utilities added to decimal128 for rescaling and easy conversion to and
from float32/float64
([ARROW-16552](https://issues.apache.org/jira/browse/ARROW-16552))
+* Arrow DataType interface now has a LayoutMethod which returns the physical
layout of the given datatype such as the number of buffers, types, etc. This
matches the behavior of the layout() methods in C++ for data types.
([ARROW-16556](https://issues.apache.org/jira/browse/ARROW-16556))
+* Added a SliceBuffer function to the memory package to allow better re-using
of memory across buffer objects
([ARROW-16557](https://issues.apache.org/jira/browse/ARROW-16557))
+* Dictionary Arrays can now be concatenated using array.Concatenate
([ARROW-17095](https://issues.apache.org/jira/browse/ARROW-17095))
+
+#### Bug Fixes
+
+* ipc.FileReader now properly uses the memory.Allocator interface
([ARROW-16002](https://issues.apache.org/jira/browse/ARROW-16002))
+* Addressed issue with Integration tests between Go and Java
([ARROW-16441](https://issues.apache.org/jira/browse/ARROW-16441))
+* RecordBuilder.UnmarshalJSON now properly ignores extra unknown fields rather
than panicking
([ARROW-16456](https://issues.apache.org/jira/browse/ARROW-16456))
+* StructBuilder.UnmarshalJSON will no longer fail and panic when Nullable
fields are missing
([ARROW-16502](https://issues.apache.org/jira/browse/ARROW-16502))
+* ipc.Reader no longer silently accepts string columns with invalid offsets,
preventing unexpected panics later when writing or accessing the resulting
arrays. ([ARROW-16831](https://issues.apache.org/jira/browse/ARROW-16831))
+* Arrow CSV reader no longer clobbers its reported errors and properly
surfaces them ([ARROW-16926](https://issues.apache.org/jira/browse/ARROW-16926))
+
+### Parquet
+
+#### New Features
+
+* The CreatedBy version string for the Parquet writer will now correctly
reflect the library version, and will be updated by the release scripts
([ARROW-16484](https://issues.apache.org/jira/browse/ARROW-16484))
+* Parquet bit_packing functions now have ARM64 NEON implementations for
performance ([ARROW-16486](https://issues.apache.org/jira/browse/ARROW-16486))
+* It is now possible to customize the root node in the Parquet writer instead
of hardcoding it to be named "schema" with a repetition type of Repeated. This
was needed to allow producing files similar to Apache Spark where the root node
has a repetition type of Required. It still defaults to the spec definition of
Repeated. ([ARROW-16561](https://issues.apache.org/jira/browse/ARROW-16561))
+* parquet_reader CLI mainprog has been enhanced to dump values out as JSON and
CSV along with setting an output file instead of just dumping to the terminal.
([ARROW-16934](https://issues.apache.org/jira/browse/ARROW-16934))
+
+#### Bug Fixes
+
+* Fixed a memory leak with Parquet page reading
([ARROW-16473](https://issues.apache.org/jira/browse/ARROW-16473))
+* Parquet Reader properly parallelizes column reads when the parallel option
is set to true.
([ARROW-16530](https://issues.apache.org/jira/browse/ARROW-16530))
+* Fixed bug in the Bool decoder for plain encoding
([ARROW-16563](https://issues.apache.org/jira/browse/ARROW-16563))
+* Fixed a bug in the Parquet bool column reader where it failed to properly
skip rows ([ARROW-16638](https://issues.apache.org/jira/browse/ARROW-16638))
+* Fixed the flakey travis ARM64 builds by reducing the size of a test case in
the pqarrow unit tests to reduce the memory usage for the tests.
([ARROW-16669](https://issues.apache.org/jira/browse/ARROW-16669))
+* Parquet writer now properly handles writing arrow.NULL type arrays
([ARROW-16749](https://issues.apache.org/jira/browse/ARROW-16749))
+* Column level dictionary encoding configuration for Parquet writing now
correctly respects the input value
([ARROW-16813](https://issues.apache.org/jira/browse/ARROW-16813))
+* Memory leak in DeltaByteArray encoding fixed
([ARROW-16983](https://issues.apache.org/jira/browse/ARROW-16983))
+
+
+## Java notes
+#### New Features
+* Allow overriding column nullability in arrow-jdbc
([#13558](https://github.com/apache/arrow/pull/13558))
+* Enable skip BOUNDS_CHECKING with setBytes and getBytes of ArrowBuf
([#13161](https://github.com/apache/arrow/pull/13161))
+* Initialize JNI components on use instead of statically
([#13146](https://github.com/apache/arrow/pull/13146))
+* Provide explicit JDBC column type mapping
([#13166](https://github.com/apache/arrow/pull/13166))
+* Allow duplicated field names in Java C data interface
([#13247](https://github.com/apache/arrow/pull/13247))
+* Improve and document StackTrace
([#12656](https://github.com/apache/arrow/pull/12656))
+* Keep more context when marshaling errors through JNI
([#13246](https://github.com/apache/arrow/pull/13246))
+* Make RoundingMode configurable to handle inconsistent scale in BigDecimals
([#13433](https://github.com/apache/arrow/pull/13433))
+* Improve Java dev experience with IntelliJ
([#13017](https://github.com/apache/arrow/pull/13017))
+* Implement ArrowArrayStream
([#13465](https://github.com/apache/arrow/pull/13465)))
+
+#### Bug Fixes
+* Fix variable-width vectors in integration JSON writer
([#13676](https://github.com/apache/arrow/pull/13676))
+* Handle empty JDBC ResultSet
([#13049](https://github.com/apache/arrow/pull/13049))
+* Fix hasNext() in ArrowVectorIterator
([#13107](https://github.com/apache/arrow/pull/13107))
+* Fix ArrayConsumer when using ArrowVectorIterator
([#12692](https://github.com/apache/arrow/pull/12692))
+* Update Gandiva Protobuf library to enable builds on Apple M1
([#13121](https://github.com/apache/arrow/pull/13121))
+* Patch dataset module testing failure with JSE11+
([#13200](https://github.com/apache/arrow/pull/13200))
+* Don't duplicate generated Protobuf classes between flight-core and
flight-sql ([#13596](https://github.com/apache/arrow/pull/13596))
+
+## JavaScript notes
+
+* Fix error iterating tables with no batches
([ARROW-16371](https://issues.apache.org/jira/browse/ARROW-16371))
+* Handle case where `tableFromIPC` input is an async `RecordBatchReader`
([ARROW-16704](https://issues.apache.org/jira/browse/ARROW-16704))
+
+## Python notes
+
+Compatibility notes:
+
+* PyArrow now requires Python >= 3.7
([ARROW-16474](https://issues.apache.org/jira/browse/ARROW-16474)).
+
+* The default behaviour regarding memory mapping has changed in several APIs
(reading of Feather or Parquet files, IPC RecordBatchFileReader and
RecordBatchStreamReader) to disable memory mapping by default
([ARROW-16382](https://issues.apache.org/jira/browse/ARROW-16382)).
+
+* The default Parquet version is now 2.4 for writing, enabling use of
+more recent logical types by default such as unsigned integers
([ARROW-12203](https://issues.apache.org/jira/browse/ARROW-12203)). One can
specify `version="2.6"` to also enable support for nanosecond timestamps. Use
`version="1.0"` to restore the old behaviour and maximizes file compatibility.
+
+* Some deprecated APIs (deprecated at least since pyarrow 1.0.0) have been
removed: IPC methods in the top-level namespace, the `Value` scalar classes and
the `pyarrow.compat` module
([ARROW-17010](https://issues.apache.org/jira/browse/ARROW-17010)).
+
+New features:
+
+* Google Cloud Storage (GCS) File System support is now available in the
Python bindings
([ARROW-14892](https://issues.apache.org/jira/browse/ARROW-14892)).
+
+* The `Table.filter()` method now supports passing an expression in addition
to a boolean array
([ARROW-16469](https://issues.apache.org/jira/browse/ARROW-16469)).
+
+* When implementing extension types in Python, it is now possible to also
customize which Python scalar gets returned (in `Array.to_pylist()` or
`Scalar.as_py()`) by subclassing `ExtensionScalar`
([ARROW-13612](https://issues.apache.org/jira/browse/ARROW-13612),
([ARROW-17065](https://issues.apache.org/jira/browse/ARROW-17065))).
+
+* It is now possible to register User Defined Functions (UDF) for scalar
functions using `register_scalar_function`
([ARROW-15639](https://issues.apache.org/jira/browse/ARROW-15639)).
+
+* Basic support for consuming a Substrait plan has been exposed in Python as
`pyarrow.substrait.run_query`
([ARROW-15779](https://issues.apache.org/jira/browse/ARROW-15779)).
+
+* The `cast` method and compute kernel now exposes the fine grained options in
addition to safe/unsafe casting
([ARROW-15365](https://issues.apache.org/jira/browse/ARROW-15365)).
+
+In addition, this release includes several bug fixes and documention
improvements (such as expanded examples in docstrings
([ARROW-16091](https://issues.apache.org/jira/browse/ARROW-16091))).
+
+Further, the Python bindings benefit from improvements in the C++ library
+(e.g. new compute functions); see the C++ notes above for additional details.
+
+
+## R notes
+
+Highlights include several new `dplyr` verbs, including `glimpse()` and
`union_all()`, as well as many more datetime functions from `lubridate`. There
is also experimental support for user-defined scalar functions in the query
engine, and most packages include native support for datasets in Google Cloud
Storage (opt-in in the Linux full source build).
+
+For more on what’s in the 9.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+FlightSQL is now supported but there are minimum features for now.
+
+More Flight features are now supported.
+
+### Ruby
+
+`Enumerable` compatible methods such as `#min` and `#max` on `Arrow::Array`,
`Arrow::ChunkedArray` and `Arrow::Column` are implemented by C++'s [compute
functions]({{ site.baseurl }}/docs/cpp/compute.html). This improves
performance. ([ARROW-15222](https://issues.apache.org/jira/browse/ARROW-15222))
+
+This release fixed some memory leaks.
([ARROW-14790](https://issues.apache.org/jira/browse/ARROW-14790))
+
+This release improved support for interval type arrays such as
`Arrow::MonthIntervalArray`.
([ARROW-16206](https://issues.apache.org/jira/browse/ARROW-16206))
+
+This release improved auto data type conversion.
([ARROW-16874](https://issues.apache.org/jira/browse/ARROW-16874))
+
+### C GLib
+
+Vala is now supported.
([ARROW-15671](https://issues.apache.org/jira/browse/ARROW-15671)). See
[`c_glib/example/vala/`](https://github.com/apache/arrow/tree/apache-arrow-9.0.0/c_glib/example/vala)
for examples.
+
+`GArrowQuantil
+eOptions` is added.
([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623))
+
+## Rust notes
+
+The Rust projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the 19.0.0 release of the Rust
+implementation, see the [Arrow Rust changelog][5].
+
+[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%209.0.0
+[2]: {{ site.baseurl }}/release/9.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/9.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/blob/19.0.0/CHANGELOG.md