[GitHub] [arrow-site] amol- commented on a change in pull request #178: Version 7.0.0 release blog post

GitBox Mon, 07 Feb 2022 07:09:01 -0800


amol- commented on a change in pull request #178:
URL: https://github.com/apache/arrow-site/pull/178#discussion_r800752242




##########
File path: _posts/2022-01-19-7.0.0-release.md
##########
@@ -0,0 +1,288 @@
+---
+layout: post
+title: "Apache Arrow 7.0.0 Release"
+date: "2022-02-09 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 7.0.0 release. This covers
+over 3 months of development work and includes [**474 resolved issues**][1]
+from [**?? distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited 
to be committers.
+Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC).
+Thanks for your contributions and participation in the project!
+
+## Columnar Format Notes
+
+TBD
+
+## Arrow Flight RPC notes
+
+The Flight specification has been clarified to note that schemas are expected 
to be IPC-encapsulated on the wire.
+
+Documentation has been generally improved; see the [Arrow 
Cookbook](https://arrow.apache.org/cookbook/) for recipes on how to use Flight 
in Python and R, and a new 
[example](https://github.com/apache/arrow/blob/master/cpp/examples/arrow/flight_grpc_example.cc)
 on how to use Flight and gRPC services on the same port.
+
+This release includes Arrow Flight SQL, a protocol for using Arrow Flight to 
execute queries against and fetch metadata from SQL databases. Support is 
included for C++ and Java (but *not* languages that bind to C++, like Python or 
R). A more detailed blog post is forthcoming. Note that development is ongoing 
and the specification is currently experimental.
+
+## C++ notes
+
+A set of CMake presets has been added to ease building Arrow in a number
+of cases (ARROW-14678, ARROW-14714).
+
+The `arrow::BitUtil` namespace has been renamed to `arrow::bit_util`
+(ARROW-13494).
+
+Concatenation of union arrays is now supported (ARROW-4975).
+
+`StructType` gained three convenience methods to add, change and remove
+a given field (ARROW-11424).
+
+The `Datum` kind `COLLECTION` has been removed as it was entirely unused
+in the codebase (ARROW-13598).
+
+### Compute Layer
+
+A number of compute functions have been added:
+
+- functions operating on strings: "binary_reverse" (ARROW-14306),
+  "string_repeat" (ARROW-12712), "utf8_normalize" (ARROW-14205);
+- "fill_null_forward", "fill_null_backward" (ARROW-1699);
+- "ceil_temporal", "floor_temporal", "round_temporal" to adjust temporal input
+  to an integral multiple of a given unit (ARROW-14822);
+- "year_month_day" to extract the calendar components of the input 
(ARROW-15032);
+- "random" to generate random floating-point values between 0 and 1 
(ARROW-12404);
+- "indices_nonzero" to return the indices in the input where there are
+  non-zero, non-null values (ARROW-13035).
+
+Decimal data is now supported as input of the arithmetic kernels
+(ARROW-13130).
+
+Dictionary data is now supported as input of the hash join execution node
+(ARROW-14181).
+
+Residual predicates have been implemented in the hash join node
+(ARROW-13643).
+
+The "list_parent_indices" function now always returns int64 data
+regardless of the input type (ARROW-14592).
+
+Month-day-nano interval data is now supported as input of the same functions
+as other interval types (ARROW-13989).
+
+### CSV
+
+The CSV writer got additional configuration options:
+- the string representation of null values (ARROW-14905);
+- the quoting strategy: always / never / as needed (ARROW-14905);
+- the end-of-line character(s) (ARROW-14907).
+
+### Dataset Layer
+
+[Skyhook]({% link 
_posts/2022-01-31-skyhook-bringing-computation-to-storage-with-apache-arrow.md 
%}),
+a dataset addition that offloads fragment scan operations to a
+Ceph distributed storage cluster, was contributed (ARROW-13607).
+
+The dataset writer now exposes options `min_rows_per_group` and
+`max_rows_per_group` to control the size of row groups created (ARROW-14426).
+
+### IO and Filesystem Layer
+
+A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart
+uploads has been circumvented (ARROW-14523).
+
+The Google Cloud Storage filesystem is now featureful enough to pass all
+generic filesystem tests (ARROW-14924).
+
+The OpenAppendStream method of filesystems has been un-deprecated; however,
+it still cannot be implemented for all filesystem backends (ARROW-14969).
+
+A new function `arrow::fs::ResolveS3BucketRegion` allows resolving the
+region where a particular S3 bucket resides (ARROW-15165).
+
+The S3 filesystem now sets the Content-Type of output files to
+"application/octet-stream" (instead of "application/xml" previously)
+if not explicitly specified by the caller (ARROW-15306).
+
+### IPC
+
+Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683)
+and asynchronous (ARROW-14577) IPC reader.
+
+It is now possible to set the compression level when using LZ4 compression
+(ARROW-9648).
+### ORC
+
+The ORC adapters have been significantly improved. A lot more properties of 
the ORC reader as well as ORC writer options are now available. Moreover, API 
docs for both the ORC reader and the ORC writer have been generated.  
(ARROW-11297)
+### Parquet
+
+DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to)
+bytearray columns in Parquet files (PARQUET-492).
+
+## C# notes
+
+TBD
+
+## Go notes
+
+### Arrow
+
+#### Bug Fixes
+
+* License lifted up a level so that it is properly detected for the 
github.com/apache/arrow/go/v7 module for pkg.go.dev 
[ARROW-14728](https://github.com/apache/arrow/pull/11715). Documentation on 
pkg.go.dev will look correct with complete major version handling as of the 
v7.0.0 release.
+* Errors from `MessageReader.Message` get properly surfaced by `Reader.Read` 
[ARROW-14769](https://github.com/apache/arrow/pull/11739)
+* `ipc.Reader` properly uses the allocator it is initialized with instead of 
making native byte slices 
[ARROW-14717](https://github.com/apache/arrow/pull/11712)
+* Fixed a CI issue where the CGO tests were crashing on Windows 
[ARROW-14589](https://github.com/apache/arrow/pull/11611)
+* Various fixes for internal usages of `Release` and `Retain` to maintain 
proper management of reference counting.
+
+#### Enhancements
+
+* Continuous Integration for Go library now uses Go1.16 as the version being 
tested [ARROW-14985](https://github.com/apache/arrow/pull/11860)
+* `ValueOffsets` function added to `array.String` to return the entire slice 
of offsets [ARROW-14645](https://github.com/apache/arrow/pull/11653)
+* `array.Interface` has been lifted to `arrow.Array`, 
`array.{Record,Column,Chunked,Table}` have been lifted to 
`arrow.{Record,Column,Chunked,Table}`. Interface `arrow.ArrayData` has been 
created to be used instead of `array.Data`. Aliases have been provided for the 
`array` package so existing code that doesn't directly use `array.Data` 
shouldn't be affected. The aliases will be removed in v8. 
[ARROW-5599](https://github.com/apache/arrow/pull/11832). The 
`Chunked.NewSlice` method has been removed and is replaced by the 
`array.NewChunkedSlice` function.
+* Arrays and Records now support marshalling to JSON via the `json.Marshaler` 
interface. Builders support adding values to them by unmarshalling from JSON 
via the `json.Unmarshaler` interface. `array.FromJSON` function added to 
create Arrays from JSON directly. 
[ARROW-9630](https://github.com/apache/arrow/pull/11359)
+* Basic handling of field referencing and expression building similar to the 
C++ Compute APIs added through the new `compute` package in preparation for 
adding compute interfaces. Does not yet allow *executing* expressions. 
[ARROW-14430](https://github.com/apache/arrow/pull/11514)
+
+### Parquet
+
+#### Enhancements
+
+* Updated dependency versions 
[ARROW-14462](https://github.com/apache/arrow/pull/11537)
+* `file` module added, Go Parquet library now supports full file reading and 
writing. [ARROW-13984](https://github.com/apache/arrow/pull/11146) 
[ARROW-13986](https://github.com/apache/arrow/pull/11538). Does not yet provide 
direct Parquet <--> Arrow conversions.
+* Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, 
gaining a 4x - 6x performance improvement. 
[ARROW-15536](https://github.com/apache/arrow/pull/12163)
+### Bug Fixes
+
+TBD
+
+### Enhancements
+
+TBD
+
+## Java notes
+
+* Flight SQL support is now available in the Java library, with integration 
tests to verify it against the C++ reference implementation.
+* `GeneralOutOfPlaceVectorSorter` is now available for sorting any kind of 
vector. In general if dedicated sorters can be used (like 
`FixedWidthInPlaceVectorSorter`) they should be preferred as they will generally 
perform better.
+* `log4j2` dependency was removed as it was unused and a possible vector for 
attacks 
+* `VectorSchemaRootAppender` now works with `BitVector`
+
+## JavaScript notes
+
+* Major simplifications to the API. There is only a single Vector class now. 
See the (also much improved) docs for details.
+* Dictionary vectors created with `vectorFromArray` are automatically cached 
for better performance.
+* Better tree shaking support. Some bundles can now be only a few kb.
+
+## Python notes
+
+* Official support for Python 3.6 has been dropped.
+* `random` and `indices_nonzero` compute functions are now supported in Python
+* `pyarrow.orc.read_table` is now provided to easily read the content of ORC 
files to a Table.
+* `pyarrow.orc.ORCFile` now has a lot more properties exposed.
+* `pyarrow.orc.ORCWriter` and `pyarrow.orc.write_table` now have the writer 
options available.
+* `pyarrow.orc` now has much better API documentation.
+* Support for compute functions arguments and options has been improved in 
general: arguments are not position only, while options can be provided as 
keyword args or not, and error reporting for wrong arguments has been improved.
+* `Table` now has a `group_by` method that allows performing aggregations on 
table data. The compute functions documentation has also been improved to 
better distinguish between standard compute functions and `HASH_AGGREGATE` 
compute functions that can only be used for aggregations.
+* Python documentation now provides interlinking for references to parameter 
types and return values, making it far easier to navigate the documentation.
+
+## R notes
+
+This release adds additional improvements to the dplyr interface, to CSV 
support and to the C-Data interface to exchange data with other languages. For 
more details, see the [complete R 
changelog](https://arrow.apache.org/docs/r/news/).

Review comment:
       ```suggestion
   This release adds additional improvements to the dplyr interface, to CSV 
support and to the C-Data interface to exchange data with other languages. For 
more details, see the [complete R changelog][4].
   ```




