[GitHub] [arrow-site] AlenkaF commented on a diff in pull request #382: [Website] Version 13.0.0 blog post

via GitHub Mon, 17 Jul 2023 21:19:49 -0700


AlenkaF commented on code in PR #382:
URL: https://github.com/apache/arrow-site/pull/382#discussion_r1266192895



##########
_posts/2023-07-17-13.0.0-release.md:
##########
@@ -0,0 +1,215 @@
+---
+layout: post
+title: "Apache Arrow 13.0.0 Release"
+date: "2023-07-17 00:00:00"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 13.0.0 release. This covers
+over 3 months of development work and includes [**XXX resolved issues**][1]
+from [**YYY distinct contributors**][2]. See the [Install Page](https://arrow.apache.org/install/)
+to learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 12.0.0 release, Marco Neumann, Gang Wu, Mehmet Ozan Kabak and Kevin Gurney
+have been invited to be committers.
+Matt Topol, Jie Wen, Ben Baumgold and Dewey Dunnington have joined the
+Project Management Committee (PMC).
+
+Thanks for your contributions and participation in the project!
+
+## Columnar Format Notes
+
+### C Device Data Interface
+
+An **experimental** new specification, the
+[C Device Data Interface](https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html),
+has been accepted for inclusion (GH-34971). It builds on the existing
+C Data Interface to provide a runtime-agnostic zero-copy sharing mechanism
+for Arrow data residing on non-CPU devices.
+
+Reference implementations of the C Device Data Interface will progressively
+be added to the standard Arrow libraries after the 13.0.0 release.
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+### Building
+
+CMake 3.16 or later is now required for building Arrow C++ (GH-34921).
+
+Optimizations are no longer disabled when the `RelWithDebInfo` build type
+is selected (GH-35850). Furthermore, compiler flags can now be properly
+customized per build type using `ARROW_C_FLAGS_DEBUG`, `ARROW_CXX_FLAGS_DEBUG`
+and related variables (GH-35870).
+
+### Acero
+
+Handling of unaligned buffers in input nodes can be configured programmatically
+or by setting the environment variable `ACERO_ALIGNMENT_HANDLING`. The default
+behavior is to warn when an unaligned buffer is detected (GH-35498).
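The decision an input node has to make can be sketched in plain Python. This is not Acero's code: the mode spellings "warn", "ignore", "reallocate" and "error", the fallback to "warn" for unrecognized values, and the rule that a buffer must be aligned to its value width are assumptions of this sketch, and `default_mode` / `handle_buffer` are hypothetical helpers.

```python
import os
import warnings

# Assumed spellings of the handling modes (hypothetical for this sketch).
MODES = ("warn", "ignore", "reallocate", "error")


def default_mode() -> str:
    """Read ACERO_ALIGNMENT_HANDLING; unset or unknown values fall back to "warn"."""
    value = os.environ.get("ACERO_ALIGNMENT_HANDLING", "warn").lower()
    return value if value in MODES else "warn"


def handle_buffer(address: int, alignment: int, mode: str) -> str:
    """Decide what to do with a buffer starting at `address` whose values need
    `alignment`-byte alignment. Returns "use" (zero-copy) or "copy"."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "ignore" or address % alignment == 0:
        return "use"  # no check requested, or already aligned
    if mode == "error":
        raise ValueError(f"buffer at {address:#x} is not {alignment}-byte aligned")
    if mode == "reallocate":
        return "copy"  # copy into a freshly allocated, aligned buffer
    warnings.warn(f"unaligned buffer at {address:#x}")  # mode == "warn"
    return "use"
```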
+
+### Compute
+
+Several new functions have been added:
+* aggregate functions "first", "last", "first_last" (GH-34911);
+* vector functions "cumulative_prod", "cumulative_min", "cumulative_max" (GH-32190);
+* vector function "pairwise_diff" (GH-35786).
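The semantics of these new functions can be pictured with a plain-Python sketch over lists, where `None` stands for null. It is not Arrow's implementation, and it assumes the default options as we understand them: "first"/"last" skip nulls, a null in a cumulative function propagates to the rest of the output unless `skip_nulls` is set, and "pairwise_diff" uses a `period` of 1 with nulls where no partner exists.

```python
import operator
from typing import Callable, List, Optional, Sequence, Tuple

Value = Optional[float]  # None stands for a null slot


def first_last(values: Sequence[Value], skip_nulls: bool = True) -> Tuple[Value, Value]:
    """Sketch of "first_last"; "first" and "last" are its two halves."""
    candidates = [v for v in values if v is not None] if skip_nulls else list(values)
    if not candidates:
        return (None, None)
    return (candidates[0], candidates[-1])


def cumulative(values: Sequence[Value], op: Callable[[float, float], float],
               skip_nulls: bool = False) -> List[Value]:
    """Running fold of `op`; a null poisons the rest unless skip_nulls is set."""
    out: List[Value] = []
    acc: Value = None
    poisoned = False
    for v in values:
        if v is None or poisoned:
            out.append(None)
            poisoned = poisoned or not skip_nulls
            continue
        acc = v if acc is None else op(acc, v)
        out.append(acc)
    return out


def cumulative_prod(values: Sequence[Value], skip_nulls: bool = False) -> List[Value]:
    return cumulative(values, operator.mul, skip_nulls)


def cumulative_min(values: Sequence[Value], skip_nulls: bool = False) -> List[Value]:
    return cumulative(values, min, skip_nulls)


def cumulative_max(values: Sequence[Value], skip_nulls: bool = False) -> List[Value]:
    return cumulative(values, max, skip_nulls)


def pairwise_diff(values: Sequence[Value], period: int = 1) -> List[Value]:
    """out[i] = values[i] - values[i - period], null when out of range or null."""
    n = len(values)
    out: List[Value] = []
    for i, v in enumerate(values):
        j = i - period
        if 0 <= j < n and v is not None and values[j] is not None:
            out.append(v - values[j])
        else:
            out.append(None)
    return out
```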
+
+Sorting now works on dictionary arrays, with much better performance than
+the naive approach of decoding the array and sorting the decoded values (GH-29887).
+Sorting also works on struct arrays, and nested sort keys are supported using
+`FieldRef` (GH-33206).
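One reason dictionary sorting can beat decoding is that the dictionary is usually much smaller than the array: it can be ranked once, and the array then sorted by small integer ranks. The plain-Python sketch below illustrates that idea (one plausible strategy, not necessarily Arrow's exact algorithm) with nulls placed at the end. The nested struct key is modeled as a hypothetical tuple of field names standing in for a `FieldRef` path.

```python
from typing import List, Optional, Sequence, Tuple


def dictionary_sort_indices(indices: Sequence[Optional[int]],
                            dictionary: Sequence[str]) -> List[int]:
    """Positions that sort a dictionary-encoded array ascending, nulls at the end."""
    # Rank each distinct dictionary value once; duplicate entries share a rank.
    distinct = sorted(set(dictionary))
    rank_of_value = {value: r for r, value in enumerate(distinct)}
    rank = [rank_of_value[value] for value in dictionary]
    # Sort the (usually much longer) array by integer rank instead of by string.
    valid = [i for i, code in enumerate(indices) if code is not None]
    nulls = [i for i, code in enumerate(indices) if code is None]
    valid.sort(key=lambda i: rank[indices[i]])  # stable: ties keep input order
    return valid + nulls


def struct_sort_indices(rows: Sequence[dict], path: Tuple[str, ...]) -> List[int]:
    """Positions that sort struct rows by a nested key such as ("point", "x")."""
    def key(i: int):
        value = rows[i]
        for name in path:  # walk down the nested field path
            value = value[name]
        return value
    return sorted(range(len(rows)), key=key)
```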
+
+The `check_overflow` option has been removed from `CumulativeSumOptions` as
+it was redundant with the availability of two different functions:
+"cumulative_sum" and "cumulative_sum_checked" (GH-35789).
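The difference between the two functions can be sketched for an int32 column in plain Python: "cumulative_sum" wraps around on overflow like fixed-width integer addition, while "cumulative_sum_checked" reports an error. The helper names are hypothetical, and nulls and the `start` option are ignored for brevity.

```python
from typing import List, Sequence

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def wrap_int32(x: int) -> int:
    """Two's-complement wrap-around, as fixed-width int32 addition behaves."""
    return (x + 2**31) % 2**32 - 2**31


def cumulative_sum_int32(values: Sequence[int], checked: bool = False) -> List[int]:
    """checked=False mimics "cumulative_sum", checked=True "cumulative_sum_checked"."""
    out: List[int] = []
    acc = 0
    for v in values:
        exact = acc + v
        if checked and not INT32_MIN <= exact <= INT32_MAX:
            raise OverflowError(f"int32 overflow: {acc} + {v}")
        acc = wrap_int32(exact)
        out.append(acc)
    return out
```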
+
+Run-end encoded filters are efficiently supported (GH-35749).
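A run-end encoded filter stores one boolean per run plus the run's end offset, so whole runs can be kept or dropped at once instead of testing every row. A plain-Python sketch of the idea follows; `filter_by_ree_mask` is hypothetical, and treating a null run as "drop" is our assumption about the default null selection behavior.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def filter_by_ree_mask(values: Sequence[T], run_ends: Sequence[int],
                       run_values: Sequence[Optional[bool]]) -> List[T]:
    """Keep values[start:end] for every run whose mask value is True."""
    if len(run_ends) != len(run_values):
        raise ValueError("run_ends and run_values must have the same length")
    if run_ends and run_ends[-1] != len(values):
        raise ValueError("the last run end must equal the array length")
    out: List[T] = []
    start = 0
    for end, keep in zip(run_ends, run_values):
        if keep:  # a null (None) run is dropped, like a False run
            out.extend(values[start:end])  # one slice per run, not per row
        start = end
    return out
```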
+
+Duration types are supported with the "is_in" and "index_in" functions (GH-36047).
+Durations can also be multiplied by all integer types (GH-36128).
+
+"is_in" and "index_in" now cast their inputs more flexibly: they first attempt
+to cast the value set to the input type, then, if that fails, attempt to cast
+the input to the value set's type (GH-36203).
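The two-step cast can be sketched in plain Python by modeling a few integer types as value ranges and "safe" casts as range checks. The type table and `is_in` helper are hypothetical, and matching a null input against a null in the value set is our reading of the default `skip_nulls=false` behavior.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Value ranges of a few integer types, enough to model safe casts.
RANGES: Dict[str, Tuple[int, int]] = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def fits(values: Sequence[Optional[int]], type_name: str) -> bool:
    """True if every non-null value survives a safe cast to `type_name`."""
    lo, hi = RANGES[type_name]
    return all(v is None or lo <= v <= hi for v in values)


def is_in(values: Sequence[Optional[int]], values_type: str,
          value_set: Sequence[Optional[int]], value_set_type: str
          ) -> Tuple[str, List[bool]]:
    """Return (type used for the comparison, membership flags)."""
    if fits(value_set, values_type):
        common = values_type  # 1. cast the value set to the input type
    elif fits(values, value_set_type):
        common = value_set_type  # 2. otherwise cast the input the other way
    else:
        raise TypeError(f"cannot cast between {values_type} and {value_set_type}")
    members = set(value_set)  # a null in the value set matches null inputs
    return common, [v in members for v in values]
```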
+
+Multiple bugs have been fixed in "utf8_slice_codeunits" when the `stop` option
+is omitted (GH-36311).
+
+### Dataset
+
+A custom schema can now be passed when writing a dataset (GH-35730). The custom
+schema can alter nullability or metadata information, but is not allowed to
+change the datatypes written.
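The rule can be expressed as a small validation step, sketched here in plain Python with a hypothetical `Field` record and `check_custom_schema` helper (not Arrow's API). Whether field names may differ is not stated above, so this sketch only compares field counts and types.

```python
from typing import NamedTuple, Sequence, Tuple


class Field(NamedTuple):
    name: str
    type: str
    nullable: bool = True
    metadata: Tuple[Tuple[str, str], ...] = ()


def check_custom_schema(data: Sequence[Field], custom: Sequence[Field]) -> None:
    """Raise if `custom` would change the written data types; nullability and
    metadata are allowed to differ."""
    if len(data) != len(custom):
        raise ValueError("custom schema must have the same number of fields")
    for written, declared in zip(data, custom):
        if written.type != declared.type:
            raise TypeError(
                f"field {declared.name!r}: declared type {declared.type} "
                f"does not match written type {written.type}")
```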
+
+### Filesystems
+
+The S3 filesystem now writes files in equal-sized chunks, for compatibility with
+Cloudflare's "R2" Storage (GH-34363).
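Writing in equal-sized chunks amounts to re-slicing arbitrarily sized writes into multipart-upload parts of one fixed size, with only the final part allowed to be shorter. A plain-Python sketch of that re-slicing, with a hypothetical `equal_size_parts` helper and a toy part size:

```python
from typing import Iterable, Iterator


def equal_size_parts(chunks: Iterable[bytes], part_size: int) -> Iterator[bytes]:
    """Re-slice writes of any size into parts of exactly `part_size` bytes;
    only the final part may be shorter."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    pending = bytearray()
    for chunk in chunks:
        pending += chunk
        while len(pending) >= part_size:
            yield bytes(pending[:part_size])
            del pending[:part_size]
    if pending:
        yield bytes(pending)
```

Real uploads use far larger parts (S3 requires at least 5 MiB for every part except the last); 4 bytes just keeps the example readable.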
+
+A long-standing issue, where S3 support could crash at shutdown because some resources
+were still alive after S3 finalization, has been fixed (GH-36346). Now, attempts
+to use S3 resources (such as making filesystem calls) after S3 finalization should
+result in a clean error.
+
+The GCS filesystem accepts a new option to set the project ID (GH-36227).
+
+### IPC
+
+Nullability and metadata information for sub-fields of map types is now preserved
+when deserializing Arrow IPC (GH-35297).
+
+### Orc
+
+The Orc adapter now maps Arrow field metadata to Orc type attributes when writing,
+and vice-versa when reading (GH-35304).
+
+### Parquet
+
+It is now possible to write additional metadata while a `ParquetFileWriter` is
+open (GH-34888).
+
+Writing a page index can be enabled selectively per column (GH-34949).
+In addition, page header statistics are no longer written if the page
+index is enabled for the given column (GH-34375), as the information would
+be redundant and less efficiently accessed.
+
+Parquet writer properties allow specifying the sorting columns (GH-35331).
+The user is responsible for ensuring that the data written to the file
+actually complies with the given sorting.
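Since compliance is the writer's responsibility, a pre-write check can help. The plain-Python sketch below mirrors the three fields of Parquet's `SortingColumn` metadata (`column_idx`, `descending`, `nulls_first`); the `complies` helper is hypothetical and operates on rows represented as tuples.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class SortingColumn:
    """Mirrors the fields of Parquet's SortingColumn metadata."""
    column_idx: int
    descending: bool = False
    nulls_first: bool = False


def _compare(a: Any, b: Any, col: SortingColumn) -> int:
    """-1 if a must come before b, 1 if after, 0 if tied under `col`."""
    if a is None or b is None:
        if a is None and b is None:
            return 0
        a_first = (a is None) == col.nulls_first  # null placement ignores direction
        return -1 if a_first else 1
    if a == b:
        return 0
    order = -1 if a < b else 1
    return -order if col.descending else order


def complies(rows: Sequence[Tuple[Any, ...]], sorting: Sequence[SortingColumn]) -> bool:
    """True if consecutive rows respect the declared sorting columns."""
    for prev, cur in zip(rows, rows[1:]):
        for col in sorting:
            c = _compare(prev[col.column_idx], cur[col.column_idx], col)
            if c < 0:
                break  # strictly ordered on this key; later keys do not matter
            if c > 0:
                return False
    return True
```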
+
+CRC computation has been implemented for v2 data pages (GH-35171).
+It was already implemented for v1 data pages.
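Per the Parquet format, the page CRC is the standard CRC-32 (the gzip polynomial) computed over the page bytes exactly as written, excluding the page header. For a v2 data page that body is the uncompressed repetition levels, then the definition levels, then the (possibly compressed) values. A minimal sketch with a hypothetical `page_crc` helper:

```python
import zlib


def page_crc(rep_levels: bytes, def_levels: bytes, values: bytes) -> int:
    """CRC-32 over a v2 data page body, returned as the signed 32-bit
    integer held by the Thrift `crc` field of the page header."""
    body = rep_levels + def_levels + values  # bytes exactly as written to disk
    crc = zlib.crc32(body)  # standard CRC-32, unsigned in Python
    return crc - 2**32 if crc >= 2**31 else crc
```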
+
+Writing Parquet-spec-compliant nested types is now enabled by default (GH-29781).
+This should not have any negative implications.
+
+Attempting to load a subset of an Arrow extension type is now forbidden
+(GH-20385). Previously, if an extension type's storage was nested (for example
+a "Point" extension type backed by a `struct<x: float64, y: float64>`),
+it was possible to selectively load only some of the storage type's child columns.
+
+### Substrait
+
+Support for various functions has been added: "stddev", "variance", "first",
+"last" (GH-35247, GH-35506).
+
+Deserializing sorts is now supported (GH-32763). However, some features,
+such as clustered sort direction or custom sort functions, are not
+implemented.
+
+### Miscellaneous
+
+`FieldRef` sports additional methods to get a flattened version of nested
+fields (GH-14946). Compared to their non-flattened counterparts,
+the methods `GetFlattened`, `GetAllFlattened`, `GetOneFlattened` and
+`GetOneOrNoneFlattened` combine a child's null bitmap with its ancestors'
+null bitmaps so as to compute the field's overall logical validity bitmap.
+
+In other words, given the struct array `[null, {'x': null}, {'x': 5}]`,
+`FieldRef("x")::Get` might return `[0, null, 5]`
+while `FieldRef("x")::GetFlattened` will *always* return `[null, null, 5]`.
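The flattening rule is a logical AND of the child's validity with every ancestor's validity. A plain-Python sketch using the struct array above (the `0` stored under the null parent is an arbitrary placeholder, as the word "might" suggests); `get` and `get_flattened` are hypothetical stand-ins, not the C++ methods.

```python
from typing import List, Optional, Sequence

Value = Optional[int]  # None stands for a null slot


def get(child_values: Sequence[Value]) -> List[Value]:
    """Non-flattened access: the child's own slots, whatever the parents say."""
    return list(child_values)


def get_flattened(ancestor_validity: Sequence[Sequence[bool]],
                  child_values: Sequence[Value]) -> List[Value]:
    """A slot is valid only if the child and every ancestor are valid there."""
    return [
        value if all(valid[i] for valid in ancestor_validity) else None
        for i, value in enumerate(child_values)
    ]
```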
+
+`Scalar::hash()` has been fixed for sliced nested arrays (GH-35360).
+
+A new floating-point to decimal conversion algorithm exhibits much better
+precision (GH-35576).
+
+It is now possible to cast between scalars of different list-like types
+(GH-36309).
+
+## C# notes
+
+## Go notes
+
+## Java notes
+
+## JavaScript notes
+
+## Python notes

Review Comment:
   ```suggestion
   ## Python notes
   
   Compatibility notes:
   
   * The default format version for Parquet has been bumped from 2.4 to 2.6 [GH-35746](https://github.com/apache/arrow/issues/35746)
   * Support for Python 3.7 has been dropped [GH-34788](https://github.com/apache/arrow/issues/34788)
   
   New features:
   
   * Conversion to non-nanosecond datetime64 for pandas >= 2.0 is now supported [GH-33321](https://github.com/apache/arrow/issues/33321)
   * Writing a page index is now supported [GH-36284](https://github.com/apache/arrow/issues/36284)
   * Bindings for reading the JSON format in Dataset have been added [GH-34216](https://github.com/apache/arrow/issues/34216)
   * The `keys_sorted` property of MapType is now exposed [GH-35112](https://github.com/apache/arrow/issues/35112)
   
   Other improvements:
   
   * Common Python functionality between the `Table` and `RecordBatch` classes has been consolidated ([GH-36129](https://github.com/apache/arrow/issues/36129), [GH-35415](https://github.com/apache/arrow/issues/35415), [GH-35390](https://github.com/apache/arrow/issues/35390), [GH-34979](https://github.com/apache/arrow/issues/34979), [GH-34868](https://github.com/apache/arrow/issues/34868), [GH-31868](https://github.com/apache/arrow/issues/31868))
   * Some functionality for `FixedShapeTensorType` has been improved (`__reduce__` [GH-36038](https://github.com/apache/arrow/issues/36038), picklability [GH-35599](https://github.com/apache/arrow/issues/35599))
   * PyArrow scalars are now accepted in the `array` constructor [GH-21761](https://github.com/apache/arrow/issues/21761)
   * The DataFrame Interchange Protocol implementation and its usage are now documented [GH-33980](https://github.com/apache/arrow/issues/33980)
   * Support for map/pydict conversion between Arrow and pandas has been enhanced [GH-34729](https://github.com/apache/arrow/issues/34729)
   * The usability of `pc.map_lookup` / `MapLookupOptions` has been improved [GH-36045](https://github.com/apache/arrow/issues/36045)
   * The `zero_copy_only` keyword is now also accepted in `ChunkedArray.to_numpy()` [GH-34787](https://github.com/apache/arrow/issues/34787)
   * The Python C++ codebase now has linter support in Archery and the CI [GH-35485](https://github.com/apache/arrow/issues/35485)
   
   Relevant bug fixes:
   
   * `__array__` NumPy conversion for Table and RecordBatch has been corrected so that `np.asarray(pa.Table)` doesn't return a transposed result [GH-34886](https://github.com/apache/arrow/issues/34886)
   * `parquet.write_to_dataset` no longer creates empty files for non-observed dictionary (category) values [GH-23870](https://github.com/apache/arrow/issues/23870)
   * The Dataset writer now also correctly follows the default Parquet version of 2.6 [GH-36537](https://github.com/apache/arrow/issues/36537)
   * Comparing `pyarrow.dataset.Partitioning` with other types is now handled correctly [GH-36659](https://github.com/apache/arrow/issues/36659)
   * Pickling of `pyarrow.dataset` PartitioningFactory objects is now supported [GH-34884](https://github.com/apache/arrow/issues/34884)
   * A `None` schema is now disallowed in the Parquet writer [GH-35858](https://github.com/apache/arrow/issues/35858)
   * `pa.FixedShapeTensorArray.to_numpy_ndarray` no longer fails on sliced arrays [GH-35573](https://github.com/apache/arrow/issues/35573)
   * The half-float type is now supported in the conversion from Arrow list to pandas [GH-36168](https://github.com/apache/arrow/issues/36168)
   * `__from_arrow__` is now also implemented for `Array.to_pandas` for pandas extension data types [GH-36096](https://github.com/apache/arrow/issues/36096)
   ```
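To make the GH-34886 item concrete, the expected orientation after the fix is row-major: one row per table row and one column per table column, i.e. shape `(num_rows, num_columns)`, like `DataFrame.to_numpy()` in pandas. The plain-Python sketch below only illustrates that layout with a hypothetical `table_to_rows` helper; it is not PyArrow's implementation.

```python
from typing import Dict, List, Sequence


def table_to_rows(columns: Dict[str, Sequence]) -> List[list]:
    """Row-major 2-D layout with shape (num_rows, num_columns)."""
    names = list(columns)
    num_rows = len(columns[names[0]]) if names else 0
    return [[columns[name][i] for name in names] for i in range(num_rows)]
```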



