This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/master by this push:
new f45dd48 ARROW-8548: [Website] 0.17 release post
f45dd48 is described below
commit f45dd485ccb491ea2681d369a0258f44cc560ac6
Author: Neal Richardson <[email protected]>
AuthorDate: Wed Apr 22 18:53:35 2020 -0500
ARROW-8548: [Website] 0.17 release post
Closes #55 from nealrichardson/0.17-blog and squashes the following commits:
97d2060c <Wes McKinney> Minor tweaks to 0.17.0 blog post
9a80f57f <Neal Richardson> Resolve remaining todos
382cba1b <Krisztián Szűcs> nightly wheels and conda packages
6d1ad38d <François Saint-Jacques> Update _posts/2020-04-21-0.17.0-release.md
92d426b7 <Neal Richardson> Add draft 0.17 blog post
Lead-authored-by: Neal Richardson <[email protected]>
Co-authored-by: Krisztián Szűcs <[email protected]>
Co-authored-by: Wes McKinney <[email protected]>
Co-authored-by: François Saint-Jacques <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>
---
_posts/2020-04-21-0.17.0-release.md | 254 ++++++++++++++++++++++++++++++++++++
1 file changed, 254 insertions(+)
diff --git a/_posts/2020-04-21-0.17.0-release.md
b/_posts/2020-04-21-0.17.0-release.md
new file mode 100644
index 0000000..811320b
--- /dev/null
+++ b/_posts/2020-04-21-0.17.0-release.md
@@ -0,0 +1,254 @@
+---
+layout: post
+title: "Apache Arrow 0.17.0 Release"
+date: "2020-04-21 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 0.17.0 release. This covers
+over 2 months of development work and includes [**569 resolved issues**][1]
+from [**79 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 0.16.0 release, two committers have joined the Project Management
+Committee (PMC):
+
+* [Neal Richardson][4]
+* [François Saint-Jacques][5]
+
+Thank you for all your contributions!
+
+## Columnar Format Notes
+
+A [C-level Data Interface][6] was designed to ease data sharing inside a single
+process. It allows different runtimes or libraries to share Arrow data using a
+well-known binary layout and metadata representation, without any copies. Third
+party libraries can use the C interface to import and export the Arrow columnar
+format in-process without requiring on any new code dependencies.
+
+The C++ library now includes an implementation of the C Data Interface, and
+Python and R have bindings to that implementation.
+
+## Arrow Flight RPC notes
+
+* Adopted new DoExchange bi-directional data RPC
+* ListFlights supports being passed a Criteria argument in
+ Java/C++/Python. This allows applications to search for flights satisfying a
+ given query.
+* Custom metadata can be attached to errors that the server sends to the
+ client, which can be used to encode richer application-specific information.
+* A number of minor bugs were fixed, including proper handling of empty null
+ arrays in Java and round-tripping of certain Arrow status codes in
+ C++/Python.
+
+## C++ notes
+
+### Feather V2
+
+The "Feather V2" format based on the Arrow IPC file format was developed.
+Feather V2 features full support for all Arrow data types, and resolves the 2GB
+per-column limitation for large amounts of string data that the [original
+Feather implementation][13] had. Feather V2 also introduces experimental IPC
+message compression using LZ4 frame format or ZSTD. This will be formalized
+later in the Arrow format.
+
+### C++ Datasets
+
+* Improve speed on high latency file system by relaxing discovery validation
+* Better performance with Arrow IPC files using column projection
+* Add the ability to list files in FileSystemDataset
+* Add support for Parquet file reader options
+* Support dictionary columns in partition expression
+* Fix various crashes and other issues
+
+### C++ Parquet notes
+
+* Complete support for writing nested types to Parquet format was
+ completed. The legacy code can be accessed through parquet write option C++
+ and an environment variable in Python. Read support will come in a future
+ release.
+* The BYTE_STREAM_SPLIT encoding was implemented for floating-point types. It
+ helps improve the efficiency of memory compression for high-entropy data.
+* Expose Parquet schema field_id as Arrow field metadata
+* Support for DataPageV2 data page format
+
+### C++ build notes
+
+* We continued to make the core C++ library build simpler and faster. Among the
+ improvements are the removal of the dependency on Thrift IDL compiler at
+ build time; while Parquet still requires the Thrift runtime C++ library, its
+ dependencies are much lighter. We also further reduced the number of build
+ configurations that require Boost, and when Boost is needed to be built, we
+ only download the components we need, reducing the size of the Boost bundle
+ by 90%.
+* Improved support for building on ARM platforms
+* Upgraded LLVM version from 7 to 8
+* Simplified SIMD build configuration with ARROW_SIMD_LEVEL option allowing no
+ SIMD, SSE4.2, AVX2, or AVX512 to be selected.
+* Fixed a number of bugs affecting compilation on aarch64 platforms
+
+### Other C++ notes
+
+* Many crashes on invalid input detected by [OSS-Fuzz][7] in the IPC reader and
+ in Parquet-Arrow reading were fixed. See our recent [blog post][8] for more
+ details.
+* A “Device” abstraction was added to simplify buffer management and movement
+ across heterogeneous hardware configurations, e.g. CPUs and GPUs.
+* A streaming CSV reader was implemented, yielding individual RecordBatches and
+ helping limit overall memory occupation.
+* Array casting from Decimal128 to integer types and to Decimal128 with
+ different scale/precision was added.
+* Sparse CSF tensors are now supported.
+* When creating an Array, the null bitmap is not kept if the null count is
known to be zero
+* Compressor support for the LZ4 frame format (LZ4_FRAME) was added
+* An event-driven interface for reading IPC streams was added.
+* Further core APIs that required passing an explicit out-parameter were
+ migrated to `Result<T>`.
+* New analytics kernels for match, sort indices / argsort, top-k
+
+## Java notes
+
+* Netty dependencies were removed for BufferAllocator and ReferenceManager
+ classes. In the future, we plan to move netty related classes to a separate
+ module.
+* New features were provided to support efficiently appending vector/vector
+ schema root values in batch.
+* Comparing a range of values in dense union vectors has been supported.
+* The quick sort algorithm was improved to avoid degenerating to the worst
case.
+
+## Python notes
+
+### Datasets
+
+* Updated `pyarrow.dataset` module following the changes in the C++ Datasets
+ project. This release also adds [richer documentation][10] on the datasets
+ module.
+* Support for the improved dataset functionality in
+`pyarrow.parquet.read_table/ParquetDataset`. To enable, pass
+`use_legacy_dataset=False`. Among other things, this allows to specify filters
+for all columns and not only the partition keys (using row group statistics)
+and enables different partitioning schemes. See the "note" in the
+[`ParquetDataset` documentation][11].
+
+### Packaging
+
+* Wheels for Python 3.8 are now available
+* Support for Python 2.7 has been dropped as Python 2.x reached end-of-life in
+ January 2020.
+* Nightly wheels and conda packages are now available for testing or other
+ development purposes. See the [installation guide][16]
+
+### Other improvements
+
+* Conversion to numpy/pandas for FixedSizeList, LargeString, LargeBinary
+* Sparse CSC matrices and Sparse CSF tensors support was added. (ARROW-7419,
+ ARROW-7427)
+
+## R notes
+
+Highlights include support for the Feather V2 format and the C Data Interface,
+both described above. Along with low-level bindings for the C interface, this
+release adds tooling to work with Arrow data in Python using `reticulate`. See
+[`vignette("python", package = "arrow")`][14] for a guide to getting started.
+
+Installation on Linux now builds C++ the library from source by default. For a
+faster, richer build, set the environment variable `NOT_CRAN=true`. See
+[`vignette("install", package = "arrow")`][15] for details and more options.
+
+For more on what’s in the 0.17 R package, see the [R changelog][9].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+* Support Ruby 2.3 again
+
+### C GLib
+
+* Add GArrowRecordBatchIterator
+* Add support for GArrowFilterOptions
+* Add support for Peek() to GIOInputStream
+* Add some metadata bindings to GArrowSchema
+* Add LocalFileSystem support
+* Add support for writer properties of Parquet
+* Add support for MapArray
+* Add support for BooleanNode
+
+## Rust notes
+
+* DictionayArray support.
+* Various improvements to code safety.
+* Filter kernel now supports temporal types.
+
+### Rust Parquet notes
+
+* Array reader now supports temporal types.
+* Parquet writer now supports custom meta-data key/value pairs.
+
+### Rust DataFusion notes
+
+* Logical plans can now reference columns by name (as well as by index) using
+ the new `UnresolvedColumn` expression. There is a new optimizer rule to
+ resolve these into column indices.
+* Scalar UDFs can now be registered with the execution context and used from
+ logical query plans as well as from SQL. A number of math scalar functions
+ have been implemented using this feature (sqrt, cos, sin, tan, asin, acos,
+ atan, floor, ceil, round, trunc, abs, signum, exp, log, log2, log10).
+* Various SQL improvements, including support for `SELECT *` and `SELECT
+ COUNT(*)`, and improvements to parsing of aggregate queries.
+* Flight examples are provided, with a client that sends a SQL statement to a
+ Flight server and receives the results.
+* The interactive SQL command-line tool now has improved documentation and
+ better formatting of query results.
+
+## Project Operations
+
+We’ve continued our migration of general automation toward GitHub Actions. The
+majority of our commit-by-commit continuous integration (CI) is now running on
+GitHub Actions. We are working on different solutions for using dedicated
+hardware as part of our CI. The [Buildkite][12] self-hosted CI/CD platform is
+now supported on Apache repositories and GitHub Actions also supports
+self-hosted workers.
+
+[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.17.0
+[2]: https://arrow.apache.org/release/0.17.0.html#contributors
+[3]: https://arrow.apache.org/release/0.17.0.html
+[4]: https://github.com/nealrichardson
+[5]: https://github.com/fsaintjacques
+[6]: https://arrow.apache.org/docs/format/CDataInterface.html
+[7]: https://google.github.io/oss-fuzz/
+[8]: https://arrow.apache.org/blog/2020/03/31/fuzzing-arrow-ipc/
+[9]: https://arrow.apache.org/docs/r/news/
+[10]: https://arrow.apache.org/docs/python/dataset.html
+[11]:
https://arrow.apache.org/docs/python/parquet.html#reading-from-partitioned-datasets
+[12]: https://buildkite.com/
+[13]: https://github.com/wesm/feather
+[14]: https://arrow.apache.org/docs/r/articles/python.html
+[15]: https://arrow.apache.org/docs/r/articles/install.html
+[16]:
https://arrow.apache.org/docs/python/install.html#installing-nightly-packages