nealrichardson commented on a change in pull request #127:
URL: https://github.com/apache/arrow-site/pull/127#discussion_r678584346



##########
File path: _posts/2021-07-20-5.0.0-release.md
##########
@@ -0,0 +1,259 @@
+---
+layout: post
+title: "Apache Arrow 5.0.0 Release"
+date: "2021-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!--
+
+To use this template:
+
+* Update all "XX" values with the appropriate numbers (you can get the 
resolved issues and contributors count from `_release/5.0.0.md`)
+* Fill in the various sections below. Note that the audience is the broader 
user community, not Arrow developers, so please write clearly using terms they 
will understand and care about. Delete any sections that don't have any content 
(as in, there are no changes to announce)
+* Delete this introductory comment
+
+ -->
+
+
+The Apache Arrow team is pleased to announce the 5.0.0 release. This covers
+3 months of development work and includes **XX commits** from
+[**XX distinct contributors**][1] in 2 repositories. See the Install Page to
+learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the complete changelogs for the [`apache/arrow`][2] and
+[`apache/arrow-rs`][3] repositories.
+
+## Community
+
+Since the 4.0.0 release, Daniël Heres, Kazuaki Ishizaki, Dominik Moritz, and 
Weston Pace
+have been invited as committers to Arrow,
+and Benjamin Kietzman and David Li have joined the Project Management Committee
+(PMC). Thank you for all of your contributions!
+
+## Columnar Format Notes
+
+Official IANA Media types (MIME types) have been registered for Apache
+Arrow IPC protocol data, both [stream]({{ site.baseurl }}/docs/format/Columnar.html#ipc-streaming-format)
+and [file]({{ site.baseurl }}/docs/format/Columnar.html#ipc-file-format) variants:
+
+* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
+* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.file
+
+We recommend `.arrow` as the IPC file format file extension and `.arrows` for
+the IPC streaming format file extension.
+
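As a minimal sketch of how an application might act on these registrations, the Python snippet below teaches the standard-library `mimetypes` module the two media types under the recommended extensions. The helper name `arrow_media_type` and the sample file names are made up for illustration; only the media type strings and the extensions come from the text above.

```python
import mimetypes

# Media types and extensions as listed in the release notes.
ARROW_MEDIA_TYPES = {
    ".arrow": "application/vnd.apache.arrow.file",
    ".arrows": "application/vnd.apache.arrow.stream",
}

# Register the mappings with the process-wide mimetypes table.
for extension, media_type in ARROW_MEDIA_TYPES.items():
    mimetypes.add_type(media_type, extension)


def arrow_media_type(filename):
    """Hypothetical helper: return the guessed media type for a file name."""
    return mimetypes.guess_type(filename)[0]


print(arrow_media_type("table.arrow"))
print(arrow_media_type("batches.arrows"))
```

A web server could use the same mapping to set the `Content-Type` header when serving Arrow data.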
+## Arrow Flight RPC notes
+
+The Go implementation now supports custom metadata and middleware, and has
+been added to integration testing.
+
+In Python, some operations can now be interrupted via Control-C.
+
+## C++ notes
+
+`MakeArrayFromScalar` now works for fixed-size binary types (ARROW-13321).
+
+### Compute layer
+
+The following [compute functions]({{site.baseurl}}/docs/cpp/compute.html)
+were added:
+
+* aggregations: `index`
+
+* scalar arithmetic and math functions: `abs`, `abs_checked`, `acos`,
+  `acos_checked`, `asin`, `asin_checked`, `atan`, `atan2`, `ceil`, `cos`,
+  `cos_checked`, `floor`, `ln`, `ln_checked`, `log10`, `log10_checked`,
+  `log1p`, `log1p_checked`, `log2`, `log2_checked`, `negate`, `negate_checked`,
+  `sign`, `sin`, `sin_checked`, `tan`, `tan_checked`, `trunc`
+
+* scalar bitwise functions: `bit_wise_and`, `bit_wise_not`, `bit_wise_or`,
+  `bit_wise_xor`, `shift_left`, `shift_left_checked`, `shift_right`,
+  `shift_right_checked`
+
+* scalar string functions: `ascii_center`, `ascii_lpad`, `ascii_reverse`,
+  `ascii_rpad`, `binary_join`, `binary_join_element_wise`,
+  `binary_replace_slice`, `count_substring`, `count_substring_regex`,
+  `ends_with`, `find_substring`, `find_substring_regex`, `match_like`,
+  `split_pattern_regex`, `starts_with`, `utf8_center`, `utf8_lpad`,
+  `utf8_replace_slice`, `utf8_rpad`, `utf8_reverse`, `utf8_slice_codepoints`
+
+* scalar temporal functions: `day`, `day_of_week`, `day_of_year`,
+  `iso_calendar`, `iso_week`, `iso_year`, `hour`, `microsecond`, `millisecond`,
+  `minute`, `month`, `nanosecond`, `quarter`, `second`, `subsecond`, `year`
+
+* other scalar functions: `case_when`, `coalesce`, `if_else`, `is_finite`,
+  `is_inf`, `is_nan`, `max_element_wise`, `min_element_wise`, `make_struct`
+
+* vector functions: `replace_with_mask`
+
+Duplicates are now allowed in `SetLookupOptions::value_set` (ARROW-12554).
+
+Decimal types are now supported by some basic arithmetic functions 
(ARROW-12074).
+
+The `take` function now supports dense unions (ARROW-13005).
+
+It is now possible to cast between dictionary types with different index
+types (ARROW-11673).
+
+Sorting is now implemented for boolean input (ARROW-12016).
+
+### CSV
+
+The streaming CSV reader can now take some advantage of multiple threads 
(ARROW-11889).
+
+The CSV reader tries to make its errors more informative by adding the
+row number when it is known, i.e. when parallel reading is disabled 
(ARROW-12675).
+
+A new option `ReadOptions::skip_rows_after_names` allows skipping a number
+of rows _after_ reading the column names (as opposed to
+`ReadOptions::skip_rows`).
+
+Quoted strings can now be treated as always non-null (ARROW-10115).
+
+### Dataset layer
+
+### IO and Filesystem layer
+
+The I/O thread pool size can now be adjusted at runtime (ARROW-12760).
+The default size remains 8 threads.
+
+Streams can now have auxiliary metadata, depending on the backend.  This
+has been implemented for the S3 filesystem, where a few metadata keys
+such as `Content-Type` and `ACL` are supported (ARROW-11161, ARROW-12719).
+
+The HadoopFileSystem implementation now implements the FileSystem abstraction
+more faithfully (ARROW-12790).
+
+### Parquet
+
+The new LZ4_RAW compression scheme was implemented (PARQUET-1998).
+Unlike the legacy LZ4 compression scheme, it is defined unambiguously
+and should provide better portability once other Parquet implementations
+catch up.
+
+## Go notes
+
+* Flight Client and Server now support Custom Metadata through the functions `flight.NewClientWithMiddleware` and `flight.NewServerWithMiddleware`. The functions
+`flight.NewFlightClient`, `flight.NewFlightServer`, and `flight.CreateServerBearerTokenAuthInterceptors` have been deprecated in favor of the new middleware. [#10633](https://github.com/apache/arrow/pull/10633)
+* Fixed build failure for Parquet implementation on s390x 
[#10475](https://github.com/apache/arrow/pull/10475)
+* Flight Client `AuthHandler` no longer overwrites outgoing metadata; it now appends new metadata to any existing metadata [#10297](https://github.com/apache/arrow/pull/10297)
+* Flight AppMetadata field is now exposed both for Reading and Writing via 
`flight.Reader#LatestAppMetadata()` and `flight.Writer#WriteWithAppMetadata` 
functions [#10142](https://github.com/apache/arrow/pull/10142)
+* Extension Datatype is now implemented for Arrow Arrays 
[#10203](https://github.com/apache/arrow/pull/10203)
+* Map Datatype is now implemented for Arrow Arrays 
[#10106](https://github.com/apache/arrow/pull/10106)
+* Decimal integration tests with Go as either producer or consumer have been fixed [#10116](https://github.com/apache/arrow/pull/10116)
+* Schema Package added for Golang Parquet Implementation [#10071](https://github.com/apache/arrow/pull/10071)
+* First part of Encoding package for Golang Parquet Implementation added [#10379](https://github.com/apache/arrow/pull/10379)
+
+## Java notes
+
+Highlighted improvements and fixes:
+
+* Improved support for extension types using a complex storage type, e.g. 
struct, map or union. These can now extend
+the `ExtensionTypeVector` base class.
+* Union vectors now extend `AbstractContainerVector` to be consistent with 
other vectors.
+* Guava dependency updated to 30.1.1
+* Memory leak fixed if an exception occurs when reading IPC messages from a 
channel. [#10423](https://github.com/apache/arrow/pull/10423)
+* Flight error metadata is now propagated to the client. 
[#10370](https://github.com/apache/arrow/pull/10370)
+* JDBC adapter now preserves nullability. 
[#10285](https://github.com/apache/arrow/pull/10285)
+
+API compatibility changes:
+
+* Complex vectors now return covariant types from `getObject(int)`. 
[#9964](https://github.com/apache/arrow/pull/9964)
+
+## JavaScript notes
+
+* Tables do not extend DataFrames anymore. This enables smaller bundles. 
[#10277](https://github.com/apache/arrow/pull/10277)
+* Arrow uses closure compiler for all UMD bundles, making them smaller. 
[#10281](https://github.com/apache/arrow/pull/10281)
+* The npm package now comes with declaration maps for better navigation from 
types to source code. [#10673](https://github.com/apache/arrow/pull/10673)
+* Updated dependencies and improvements to the code.
+
+## Python notes
+
+* Datasets can now scan files asynchronously when the `use_async=True` option 
is provided to `Dataset.scanner`, `Dataset.to_table`, or `Dataset.to_batches` 
methods. This should provide better performance in environments where I/O can 
be slow, such as with remote sources.
+* Arrow now provides builtin support for writing CSV files through 
`pyarrow.csv.write_csv`
+* Wheels for Apple M1 Macs are now provided.
+* Many new `pyarrow.compute` functions are available, and introspection of the 
functions was improved so that they look more like standard Python functions.
+* It is now possible to access ORC file metadata from Python `ORCFile` objects
+* Building a `StructArray` now accepts a `mask` like other arrays
+* Many updates and fixes for the documentation
+
+## R notes
+
+In this release, we've more than doubled the number of functions you can call 
on Arrow Datasets inside `dplyr::filter()` and `mutate()`, including many more 
string, datetime, and math functions. You can also write Datasets to CSV files, 
in addition to Parquet and Feather. We've also deepened support for the Arrow C 
interface, which is used in the Python interface and allows integration with 
other projects, such as DuckDB.
+
+For more on what’s in the 5.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+Apache Arrow Flight support has been started, but only `ListFlights` is supported for now. More features will be implemented in the next major release.
+
+### Ruby
+
+You need the gobject-introspection gem 3.4.5 or later to implement an Apache Arrow Flight server. If you only use the Apache Arrow Flight client, gobject-introspection gem 3.4.5 or later isn't required.
+
+Here are highlighted improvements:
+
+* Compute functions accept raw Ruby objects such as `true`, `Integer`, `Array` 
and `String`:
+
+  ```ruby
+  add_function = Arrow::Function.find("add")
+  # Not shortcut version
+  augend = Arrow::Int8Array.new([1, 2, 3])
+  addend = Arrow::Int8Scalar.new(5)
+  args = [
+    Arrow::ArrayDatum.new(augend),
+    Arrow::ScalarDatum.new(addend),
+  ]
+  add_function.execute(args).value.to_a # => [6, 7, 8]
+  # Shortcut version
+  add_function.execute([[1, 2, 3], 5]).value.to_a # => [6, 7, 8]
+  ```
+* `Arrow::PrimitiveArray` and `Arrow::Buffer` can be used as a MemoryView, which was added in Ruby 3.0.
+
+There are some backward incompatible changes:
+
+* `Arrow::CountOptions` and `Arrow::CountMode` are removed. Use 
`Arrow::ScalarAggregateOptions` instead.
+
+### C GLib
+
+There are some backward incompatible changes:
+
+* `GArrowCountOptions` and `GArrowCountMode` are removed. Use 
`GArrowScalarAggregateOptions` instead.
+* `garrow_array_equal_range()` requires `GArrowEqualOptions`.
+* The prefix in arrow-dataset-glib has changed from `gad_`/`GAD_` to `gadataset_`/`GADATASET_`.
+* `GADScanOptions`, `GADScanTask` and `GADInMemoryScanTask` are removed. Use `gadataset_begin_scan()` or `gadataset_to_table()` instead.
+* `GArrowCompareOptions`, `GArrowCompareOperator` and 
`garrow_*_array_compare()` are removed. Use `equal`, `not_equal`, `less_than`, 
`less_than_equal`, `greater_than` and `greater_than_equal` compute functions 
directly instead.
+
+## Rust notes
+
+The Rust projects have moved to separate repositories outside the
+main Arrow monorepo. For notes on the 5.0.0 release of the Rust
+implementation, see the [Arrow Rust changelog][3] and the
+"Apache Arrow Rust 5.0.0 Release" blog post.

Review comment:
       TODO: link to that, when published





