This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/master by this push: new 5a88239 ARROW-6616: [Website] Release announcement blog post for 0.15 5a88239 is described below commit 5a88239c01690888da0c677fcca659ae9f71391d Author: Neal Richardson <neal.p.richard...@gmail.com> AuthorDate: Sun Oct 6 16:10:31 2019 -0500 ARROW-6616: [Website] Release announcement blog post for 0.15 Needs filling in. Any committers should feel free to edit and push to my fork with changes. Closes #27 from nealrichardson/blog-post-0.15 and squashes the following commits: 5dedee7e <Wes McKinney> Finish getting 0.15.0 blog post ready for publication dcd0d55c <Yosuke Shiro> Add Ruby and C GLib notes 8bf9d571 <Kenta Murata> Add C++ Tensor notes 761ad039 <Neal Richardson> Add R notes and note about Parquet C++ improvements e4f238df <Andy Grove> Add Rust notes 8ca655e9 <Antoine Pitrou> Add Flight and Python notes 7799c2ee <Antoine Pitrou> Add Flight notes 857d8a8c <Antoine Pitrou> Add some comments 1a558446 <Neal Richardson> Add java notes from @tianchen92 d519f0de <Neal Richardson> Add start of 0.15 post (and fix link in 0.14 post) ec40d08a <Neal Richardson> Update committers list Lead-authored-by: Neal Richardson <neal.p.richard...@gmail.com> Co-authored-by: Antoine Pitrou <anto...@python.org> Co-authored-by: Wes McKinney <wesm+...@apache.org> Co-authored-by: Yosuke Shiro <yosuke.shiro...@gmail.com> Co-authored-by: Kenta Murata <m...@mrkn.jp> Co-authored-by: Andy Grove <andygrov...@gmail.com> Signed-off-by: Wes McKinney <wesm+...@apache.org> --- _posts/2019-07-08-0.14.0-release.md | 2 +- _posts/2019-09-03-faster-strings-cpp-parquet.md | 2 +- _posts/2019-10-07-0.15.0-release.md | 280 ++++++++++++++++++++++++ committers.html | 8 +- 4 files changed, 289 insertions(+), 3 deletions(-) diff --git a/_posts/2019-07-08-0.14.0-release.md b/_posts/2019-07-08-0.14.0-release.md index e5a0f26..2bda136 100644 --- a/_posts/2019-07-08-0.14.0-release.md +++ b/_posts/2019-07-08-0.14.0-release.md @@ -286,7 +286,7 @@ community there: * [Sparse representation and compression in Arrow][12] * [Flight extensions: middleware API and generalized Put operations][14] -[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.13.0 +[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.14.0 [2]: https://arrow.apache.org/release/0.14.0.html#contributors [3]: https://arrow.apache.org/release/0.14.0.html [4]: https://github.com/apache/arrow/tree/master/js/src/builder diff --git a/_posts/2019-09-03-faster-strings-cpp-parquet.md b/_posts/2019-09-03-faster-strings-cpp-parquet.md index 433dc95..fc3f511 100644 --- a/_posts/2019-09-03-faster-strings-cpp-parquet.md +++ b/_posts/2019-09-03-faster-strings-cpp-parquet.md @@ -7,7 +7,7 @@ use) for Arrow columnar binary and string data, with new native support for Arrow's dictionary types. This should have a big impact on users of the C++, MATLAB, Python, R, and Ruby interfaces to Parquet files." date: "2019-09-05 00:00:00 -0600" -author: Wes McKinney +author: wesm categories: [application] --- <!-- diff --git a/_posts/2019-10-07-0.15.0-release.md b/_posts/2019-10-07-0.15.0-release.md new file mode 100644 index 0000000..d88aaf1 --- /dev/null +++ b/_posts/2019-10-07-0.15.0-release.md @@ -0,0 +1,280 @@ +--- +layout: post +title: "Apache Arrow 0.15.0 Release" +date: "2019-10-06 00:00:00 -0600" +author: pmc +categories: [release] +--- +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +The Apache Arrow team is pleased to announce the 0.15.0 release. This covers +about 3 months of development work and includes [**687 resolved issues**][1] +from [**80 distinct contributors**][2]. See the Install Page to learn how to +get the libraries for your platform. The [complete changelog][3] is also +available. + +About a third of issues closed (240) were classified as bug fixes, so this +release brings many stability, memory use, and performance improvements over +0.14.x. We will discuss some of the language-specific improvements and new +features below. + +## New committers + +Since the 0.14.0 release, we've added four new committers: + +* [Ben Kietzman][4] +* [David Li][5] +* [Kenta Murata][6] +* [Neal Richardson][7] + +In addition, [Sebastien Binet][8] and [Micah Kornfield][9] have joined the PMC. + +Thank you for all your contributions! + +## Columnar Format Notes + +The format gets new datatypes : LargeList(ARROW-4810), LargeBinary and +LargeString (ARROW-750). LargeList is similar to List but with 64-bit +offsets instead of 32-bit. The same relationship holds for LargeBinary +and LargeString with respect to Binary and String. + +Since 0.14.0 we have modified the IPC "encapsulated message" format to insert 4 +bytes of additional data in the message preamble to ensure that the Flatbuffers +metadata starts on an aligned offset. By default, IPC streams generated by +0.15.0 and later will not be readable by library versions 0.14.1 and +prior. Implementations have offered options to write messages using the now +"legacy" message format. + +Since the last major release, we have also made a significant overhaul of the +[columnar format documentation][16] to be clearer and easier to follow for +implementation creators. + +## Upcoming Columnar Format Stability and Library / Format Version Split + +The Arrow community has decided to make a 1.0.0 release of the project marking +formal stability of the columnar format and binary protocol, including explicit +forward and backward compatibility guarantees. You can read about these +guarantees in the [new documentation page][17] about versioning. + +Starting with 1.0.0, we will give the columnar format and libraries separate +version numbers. This will allow the library versions to evolve without +creating confusion or uncertainty about whether the Arrow columnar format +remains stable or not. + +## Arrow Flight notes + +A GetFlightSchema method is added to the Flight RPC protocol (ARROW-6094). +As the name suggests, it returns the schema for a given Flight descriptor +on the server. This is useful for cases where the Flight locations are +not immediately available, depending on the server implementation. + +Flight implementations for C++ and Java now implement half-closed +semantics for DoPut (ARROW-6063). The client can close the writing +end of the stream to signal that it has finished sending the Flight +data, but still receive the batch-specific response and its associated +metadata. + +## C++ notes + +C++ now supports the LargeList, LargeBinary and LargeString datatypes. + +The Status class gains the ability to carry additional subsystem-specific +data with it, under the form of an opaque StatusDetail interface (ARROW-4036). +This allows, for example, to store not only an exception message coming from +Python but the actual Python exception object, such as to raise it again if +the Status is propagated back to Python. It can also enable the consumer +of Status to inspect the subsystem-specific error, such as a finer-grained +Flight error code. + +DataType and Schema equality are significantly faster (ARROW-6038). + +The Column class is completely removed, as it did not have a strong enough +motivation for existing between ChunkedArray and RecordBatch / Table +(ARROW-5893). + +### C++: Parquet + +The 0.15 release includes many improvements to the Apache Parquet C++ internals, +resulting in greatly improved read and write performance. We described the work +and published some benchmarks in a [recent blog post][15]. + +### C++: CSV reader + +The CSV reader is now more flexible in terms of how column names are chosen +(ARROW-6231) and column selection (ARROW-5977). + +### C++: Memory Allocation Layer + +Arrow now has the option to allocate memory using the mimalloc memory +allocator. jemalloc is still preferred for best performance, but mimalloc +is a reasonable alternative to the system allocator on Windows where jemalloc +is not currently supported. + +Also, we now expose explicit global functions to get a MemoryPool for each +of the jemalloc allocator, mimalloc allocator and system allocator (ARROW-6292). + +The vendored jemalloc version is bumped from 4.5.x to 5.2.x (ARROW-6549). +Performance characteristics may differ on memory allocation-heavy workloads, +though we did not notice any significant regression on our suite of +micro-benchmarks (and a multi-threaded benchmark of reading a CSV file +showed a 25% speedup). + +### C++: Filesystem layer + +A FileSystem implementation to access Amazon S3-compatible filesystems is now +available. It depends on the AWS SDK for C++. + +### C++: I/O layer + +Significant improvements were made to the Arrow I/O stack. + +* ARROW-6180: Add RandomAccessFile::GetStream that returns an InputStream over + a fixed subset of the file. +* ARROW-6381: Improve performance of small writes with BufferOutputStream +* ARROW-2490: Streamline concurrency semantics of InputStream implementations, + and add debug checks for race conditions between non-thread-safe InputStream + operations. +* ARROW-6527: Add an OutputStream::Write overload that takes an owned Buffer + rather than a raw memory area. This allows OutputStream implementations + to safely implement delayed writing without having to copy the data. + +### C++: Tensors + +There are three improvements of Tensor and SparseTensor in this release. + +* Add Tensor::Value template function for element access +* Add EqualOptions support in Tensor::Equals function, that allows us to control the way to compare two float tensors +* Add smaller bit-width index supports in SparseTensor + +## C# Notes + +We have fixed some bugs causing incompatibilities between C# and other Arrow +implementations. + +## Java notes + +* Added an initial version of an Avro adapter +* To improve the JDBC adapter performance, refactored consume data logic and + implemented an iterator API to prevent loading all data into one vector +* Implemented subField encoding for complex type, now List and Struct vectors + subField encoding is available +* Implemented visitor API for vector/range/type/approx equals compare +* Performed a lot of optimization and refactoring for DictionaryEncoder, + supporting all data types and avoiding memory copy via hash table and visitor + API +* Introduced [Error Prone][10] into code base to catch more potential errors + earlier +* Fixed the bug where dictionary entries were required in IPC streams even when + empty; readers can now also read interleaved messages + +## Python notes + +The FileSystem API, implemented in C++, is now available in Python (ARROW-5494). + +The API for extension types has been straightened and definition of +custom extension types in Python is now more powerful (ARROW-5610). + +Sparse tensors are now available in Python (ARROW-4453). + +A potential crash when handling Python dates and datetimes was fixed +(ARROW-6597). + +Based on a mailing list discussion, we are looking for help with maintaining +our Python wheels. Community members have found that the wheels take up a great +deal of maintenance time, so if you or your organization depend on `pip install +pyarrow` working, we would appreciate your assistance. + +## Ruby and C GLib notes + +Ruby and C GLib continues to follow the features in the C++ project. +Ruby includes the following backward incompatible changes. + +* Remove Arrow::Struct and use Hash instead. +* Add Arrow::Time for Arrow::Time{32,64}DataType value. +* Arrow::Decimal128Array#get_value returns BigDecimal. + +Ruby improves the performance of Arrow#values. + +## Rust notes + +A number of core Arrow improvements were made to the Rust library. + +* Add explicit SIMD vectorization for the divide kernel +* Add a feature to disable SIMD +* Use "if cfg!" pattern +* Optimizations to BooleanBufferBuilder::append_slice +* Implemented Debug trait for List/Struct/BinaryArray + +Improvements related to Rust Parquet and DataFusion are detailed next. + +### Rust Parquet + +* Implement Arrow record reader +* Add converter that is used to convert record reader's content to arrow primitive array. + +### Rust DataFusion + +* Preview of new query execution engine using an extensible trait-based + physical execution plan that supports parallel execution using threads +* ExecutionContext now has a register_parquet convenience method for + registering Parquet data sources +* Fixed bug in type coercion optimizer rule +* TableProvider.scan() now returns a thread-safe BatchIterator +* Remove use of bare trait objects (switched to using dyn syntax) +* Adds casting from unsigned to signed integer data types + +## R notes + +A major development since the 0.14 release was the arrival of the `arrow` R +package on [CRAN][11]. We wrote about this in August on the [Arrow blog][12]. +In addition to the package availability on CRAN, we also published +[package documentation][13] on the Arrow website. + +The 0.15 R package includes many of the enhancements in the C++ library +release, such as the Parquet performance improvements and the FileSystem API. +In addition, there are a number of upgrades that make it easier to read and +write data, specify types and schema, and interact with Arrow tables and record +batches in R. + +For more on what's in the 0.15 R package, see the [changelog][14]. + +## Community Discussions Ongoing + +There are a number of active discussions ongoing on the developer +d...@arrow.apache.org mailing list. We look forward to hearing from the +community there. + +[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.15.0 +[2]: https://arrow.apache.org/release/0.15.0.html#contributors +[3]: https://arrow.apache.org/release/0.15.0.html +[4]: https://github.com/bkietz +[5]: https://github.com/lidavidm +[6]: https://github.com/mrkn +[7]: https://github.com/nealrichardson +[8]: https://github.com/sbinet +[9]: https://github.com/emkornfield +[10]: https://github.com/google/error-prone +[11]: https://cran.r-project.org/package=arrow +[12]: https://arrow.apache.org/blog/2019/08/08/r-package-on-cran/ +[13]: https://arrow.apache.org/docs/r +[14]: http://arrow.apache.org/docs/r/news/ +[15]: https://arrow.apache.org/blog/2019/09/05/faster-strings-cpp-parquet/ +[16]: https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst +[17]: https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst \ No newline at end of file diff --git a/committers.html b/committers.html index cd32117..cac3c54 100644 --- a/committers.html +++ b/committers.html @@ -250,10 +250,16 @@ description: "List of project-management committee (PMC) members and committers <tr> <td>Praveen Kumar</td> <td>Committer</td> -<td>TBD</td> +<td>praveenbingo</td> <td>Dremio</td> </tr> <tr> +<td>David Li</td> +<td>Committer</td> +<td>lidavidm</td> +<td>Two Sigma</td> +</tr> +<tr> <td>Ben Kietzman</td> <td>Committer</td> <td>bkietz</td>