This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 521d5b7458f GH-35106: [R] Docs for 11.0.0 changelog not updated (#344)
521d5b7458f is described below
commit 521d5b7458f0b21db2be88cfff4a0de7bba3841e
Author: Nic Crane <[email protected]>
AuthorDate: Fri Apr 14 09:54:21 2023 +0100
GH-35106: [R] Docs for 11.0.0 changelog not updated (#344)
The Changelog for the R docs wasn't updated in the last release; I think
this is due to the timing of updates, with things being merged after the
release scripts were run.
Fix apache/arrow#35106
---
docs/r/news/index.html | 271 +++++++++++++++++++++++++++++++++----------------
1 file changed, 182 insertions(+), 89 deletions(-)
diff --git a/docs/r/news/index.html b/docs/r/news/index.html
index 6a04ef5ddaa..035164761e3 100644
--- a/docs/r/news/index.html
+++ b/docs/r/news/index.html
@@ -25,7 +25,7 @@
<a class="navbar-brand me-2" href="../index.html">Arrow R Package</a>
<span class="version">
- <small class="nav-text text-muted me-auto" data-bs-toggle="tooltip"
data-bs-placement="bottom" title="">11.0.0.2</small>
+ <small class="nav-text text-muted me-auto" data-bs-toggle="tooltip"
data-bs-placement="bottom" title="">11.0.0.3</small>
</span>
@@ -69,7 +69,13 @@
</ul><form class="form-inline my-2 my-lg-0" role="search">
<input type="search" class="form-control me-sm-2" aria-label="Toggle
navigation" name="search-input" data-search-index="../search.json"
id="search-input" placeholder="Search for" autocomplete="off"></form>
- <ul class="navbar-nav"></ul></div>
+ <ul class="navbar-nav"><li class="nav-item">
+ <a class="external-link nav-link" href="https://github.com/apache/arrow/"
aria-label="github">
+ <span class="fab fa fab fa-github fa-lg"></span>
+
+ </a>
+</li>
+ </ul></div>
</div>
@@ -77,16 +83,90 @@
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<img src="" class="logo" alt=""><h1>Changelog</h1>
- <small>Source: <a
href="https://github.com/apache/arrow/blob/master/r/NEWS.md"
class="external-link"><code>NEWS.md</code></a></small>
+ <small>Source: <a
href="https://github.com/apache/arrow/blob/main/r/NEWS.md"
class="external-link"><code>NEWS.md</code></a></small>
</div>
- <div class="section level2"><h2 class="pkg-version" data-toc-text="11.0.0"
id="arrow-1100">arrow 11.0.0<a class="anchor" aria-label="anchor"
href="#arrow-1100"></a></h2></div>
+ <div class="section level2">
+<h2 class="pkg-version" data-toc-text="11.0.0.3" id="arrow-11003">arrow
11.0.0.3<a class="anchor" aria-label="anchor" href="#arrow-11003"></a></h2><p
class="text-muted">CRAN release: 2023-03-08</p>
+<div class="section level3">
+<h3 id="minor-improvements-and-fixes-11-0-0-3">Minor improvements and fixes<a
class="anchor" aria-label="anchor"
href="#minor-improvements-and-fixes-11-0-0-3"></a></h3>
+<ul><li>
+<code><a
href="../reference/open_delim_dataset.html">open_csv_dataset()</a></code>
allows a schema to be specified. (<a
href="https://github.com/apache/arrow/issues/34217"
class="external-link">#34217</a>)</li>
+<li>To ensure compatibility with an upcoming dplyr release, we no longer call
<code>dplyr:::check_names()</code> (<a
href="https://github.com/apache/arrow/issues/34369"
class="external-link">#34369</a>)</li>
+</ul></div>
+</div>
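The schema support added to open_csv_dataset() in #34217 can be sketched as follows; the directory path and column names here are hypothetical:

```r
library(arrow)

# Supplying a schema both names the columns and fixes their types,
# so no type inference is performed on the CSV files.
sch <- schema(id = int64(), value = float64())
ds <- open_csv_dataset("csv_dir/", schema = sch)
```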
+ <div class="section level2">
+<h2 class="pkg-version" data-toc-text="11.0.0.2" id="arrow-11002">arrow
11.0.0.2<a class="anchor" aria-label="anchor" href="#arrow-11002"></a></h2><p
class="text-muted">CRAN release: 2023-02-12</p>
+<div class="section level3">
+<h3 id="breaking-changes-11-0-0-2">Breaking changes<a class="anchor"
aria-label="anchor" href="#breaking-changes-11-0-0-2"></a></h3>
+<ul><li>
+<code><a href="../reference/map_batches.html">map_batches()</a></code> is lazy
by default; it now returns a <code>RecordBatchReader</code> instead of a list
of <code>RecordBatch</code> objects unless <code>lazy = FALSE</code>. (<a
href="https://github.com/apache/arrow/issues/14521"
class="external-link">#14521</a>)</li>
+</ul></div>
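A minimal sketch of the new lazy-by-default map_batches() behaviour described above (the identity mapping function is just for illustration):

```r
library(arrow)

reader <- as_record_batch_reader(arrow_table(x = 1:10))

# map_batches() now returns a RecordBatchReader by default;
# pass lazy = FALSE to recover the old eager list-of-batches behaviour
out <- map_batches(reader, function(batch) batch)

# consume the reader to materialise the result
result <- as.data.frame(out)
```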
+<div class="section level3">
+<h3 id="new-features-11-0-0-2">New features<a class="anchor"
aria-label="anchor" href="#new-features-11-0-0-2"></a></h3>
+<div class="section level4">
+<h4 id="docs-11-0-0-2">Docs<a class="anchor" aria-label="anchor"
href="#docs-11-0-0-2"></a></h4>
+<ul><li>A substantial reorganisation of, rewrite of, and additions to many of
the vignettes and README. (<a href="https://github.com/djnavarro"
class="external-link">@djnavarro</a>, <a
href="https://github.com/apache/arrow/issues/14514"
class="external-link">#14514</a>)</li>
+</ul></div>
+<div class="section level4">
+<h4 id="readingwriting-data-11-0-0-2">Reading/writing data<a class="anchor"
aria-label="anchor" href="#readingwriting-data-11-0-0-2"></a></h4>
+<ul><li>New functions <code><a
href="../reference/open_delim_dataset.html">open_csv_dataset()</a></code>,
<code><a
href="../reference/open_delim_dataset.html">open_tsv_dataset()</a></code>, and
<code><a
href="../reference/open_delim_dataset.html">open_delim_dataset()</a></code> all
wrap <code><a href="../reference/open_dataset.html">open_dataset()</a></code> -
they don’t provide new functionality, but allow for readr-style options to be
supplied, making it simpler to switch between indivi [...]
+<li>User-defined null values can be set when writing CSVs both as datasets and
as individual files. (<a href="https://github.com/wjones127"
class="external-link">@wjones127</a>, <a
href="https://github.com/apache/arrow/issues/14679"
class="external-link">#14679</a>)</li>
+<li>The new <code>col_names</code> parameter allows specification of column
names when opening a CSV dataset. (<a href="https://github.com/wjones127"
class="external-link">@wjones127</a>, <a
href="https://github.com/apache/arrow/issues/14705"
class="external-link">#14705</a>)</li>
+<li>The <code>parse_options</code>, <code>read_options</code>, and
<code>convert_options</code> parameters for reading individual files
(<code>read_*_arrow()</code> functions) and datasets (<code><a
href="../reference/open_dataset.html">open_dataset()</a></code> and the new
<code>open_*_dataset()</code> functions) can be passed in as lists. (<a
href="https://github.com/apache/arrow/issues/15270"
class="external-link">#15270</a>)</li>
+<li>File paths containing accents can be read by <code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>. (<a
href="https://github.com/apache/arrow/issues/14930"
class="external-link">#14930</a>)</li>
+</ul></div>
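The options-as-lists change (#15270) might look like this in practice; the file path and option values are hypothetical:

```r
library(arrow)

# parse_options and convert_options may now be plain named lists rather
# than csv_parse_options() / csv_convert_options() objects
df <- read_csv_arrow(
  "data.csv",
  parse_options   = list(delimiter = ";"),
  convert_options = list(null_values = c("", "NA"))
)
```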
+<div class="section level4">
+<h4 id="dplyr-compatibility-11-0-0-2">dplyr compatibility<a class="anchor"
aria-label="anchor" href="#dplyr-compatibility-11-0-0-2"></a></h4>
+<ul><li>New dplyr (1.1.0) function <code>join_by()</code> has been implemented
for dplyr joins on Arrow objects (equality conditions only). (<a
href="https://github.com/apache/arrow/issues/33664"
class="external-link">#33664</a>)</li>
+<li>Output is accurate when multiple <code><a
href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code>/<code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarise()</a></code> calls are used. (<a
href="https://github.com/apache/arrow/issues/14905"
class="external-link">#14905</a>)</li>
+<li>
+<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> works with division when
divisor is a variable. (<a href="https://github.com/apache/arrow/issues/14933"
class="external-link">#14933</a>)</li>
+<li>
+<code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html"
class="external-link">dplyr::right_join()</a></code> correctly coalesces keys.
(<a href="https://github.com/apache/arrow/issues/15077"
class="external-link">#15077</a>)</li>
+<li>Multiple changes to ensure compatibility with dplyr 1.1.0. (<a
href="https://github.com/lionel-" class="external-link">@lionel-</a>, <a
href="https://github.com/apache/arrow/issues/14948"
class="external-link">#14948</a>)</li>
+</ul></div>
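A small sketch of join_by() on Arrow objects (equality condition only, per the note above), assuming dplyr >= 1.1.0:

```r
library(arrow)
library(dplyr)

t1 <- arrow_table(id = 1:3, x = c("a", "b", "c"))
t2 <- arrow_table(id = 2:4, y = c(10, 20, 30))

# join_by(id) expresses the equality join t1.id == t2.id
joined <- t1 |>
  left_join(t2, by = join_by(id)) |>
  collect()
```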
+<div class="section level4">
+<h4 id="function-bindings-11-0-0-2">Function bindings<a class="anchor"
aria-label="anchor" href="#function-bindings-11-0-0-2"></a></h4>
+<ul><li>The following functions can be used in queries on Arrow objects:
+<ul><li>
+<code><a href="https://lubridate.tidyverse.org/reference/with_tz.html"
class="external-link">lubridate::with_tz()</a></code> and <code><a
href="https://lubridate.tidyverse.org/reference/force_tz.html"
class="external-link">lubridate::force_tz()</a></code> (<a
href="https://github.com/eitsupi" class="external-link">@eitsupi</a>, <a
href="https://github.com/apache/arrow/issues/14093"
class="external-link">#14093</a>)</li>
+<li>
+<code><a href="https://stringr.tidyverse.org/reference/str_remove.html"
class="external-link">stringr::str_remove()</a></code> and <code><a
href="https://stringr.tidyverse.org/reference/str_remove.html"
class="external-link">stringr::str_remove_all()</a></code> (<a
href="https://github.com/apache/arrow/issues/14644"
class="external-link">#14644</a>)</li>
+</ul></li>
+</ul></div>
+<div class="section level4">
+<h4 id="arrow-object-creation-11-0-0-2">Arrow object creation<a class="anchor"
aria-label="anchor" href="#arrow-object-creation-11-0-0-2"></a></h4>
+<ul><li>Arrow Scalars can be created from <code>POSIXlt</code> objects. (<a
href="https://github.com/apache/arrow/issues/15277"
class="external-link">#15277</a>)</li>
+<li>
+<code>Array$create()</code> can create Decimal arrays. (<a
href="https://github.com/apache/arrow/issues/15211"
class="external-link">#15211</a>)</li>
+<li>
+<code>StructArray$create()</code> can be used to create StructArray objects.
(<a href="https://github.com/apache/arrow/issues/14922"
class="external-link">#14922</a>)</li>
+<li>Creating an Array from an object with more than 2^31 elements now yields
the correct length. (<a href="https://github.com/apache/arrow/issues/14929"
class="external-link">#14929</a>)</li>
+</ul></div>
+<div class="section level4">
+<h4 id="installation-11-0-0-2">Installation<a class="anchor"
aria-label="anchor" href="#installation-11-0-0-2"></a></h4>
+<ul><li>Improved offline installation using pre-downloaded binaries. (<a
href="https://github.com/pgramme" class="external-link">@pgramme</a>, <a
href="https://github.com/apache/arrow/issues/14086"
class="external-link">#14086</a>)</li>
+<li>The package can automatically link to system installations of the AWS SDK
for C++. (<a href="https://github.com/kou" class="external-link">@kou</a>, <a
href="https://github.com/apache/arrow/issues/14235"
class="external-link">#14235</a>)</li>
+</ul></div>
+</div>
+<div class="section level3">
+<h3 id="minor-improvements-and-fixes-11-0-0-2">Minor improvements and fixes<a
class="anchor" aria-label="anchor"
href="#minor-improvements-and-fixes-11-0-0-2"></a></h3>
+<ul><li>Calling <code><a
href="https://lubridate.tidyverse.org/reference/as_date.html"
class="external-link">lubridate::as_datetime()</a></code> on Arrow objects
handles sub-second precision. (<a href="https://github.com/eitsupi"
class="external-link">@eitsupi</a>, <a
href="https://github.com/apache/arrow/issues/13890"
class="external-link">#13890</a>)</li>
+<li>
+<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> can be called after <code><a
href="../reference/as_record_batch_reader.html">as_record_batch_reader()</a></code>.
(<a href="https://github.com/apache/arrow/issues/14518"
class="external-link">#14518</a>)</li>
+<li>
+<code><a href="https://rdrr.io/r/base/as.Date.html"
class="external-link">as.Date()</a></code> can go from
<code>timestamp[us]</code> to <code>timestamp[s]</code>. (<a
href="https://github.com/apache/arrow/issues/14935"
class="external-link">#14935</a>)</li>
+<li>curl timeout policy can be configured for S3. (<a
href="https://github.com/apache/arrow/issues/15166"
class="external-link">#15166</a>)</li>
+<li>rlang dependency must be at least version 1.0.0 because of
<code>check_dots_empty()</code>. (<a href="https://github.com/daattali"
class="external-link">@daattali</a>, <a
href="https://github.com/apache/arrow/issues/14744"
class="external-link">#14744</a>)</li>
+</ul></div>
+</div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="10.0.1" id="arrow-1001">arrow 10.0.1<a
class="anchor" aria-label="anchor" href="#arrow-1001"></a></h2><p
class="text-muted">CRAN release: 2022-12-06</p>
<p>Minor improvements and fixes:</p>
-<ul><li>Fixes for failing test after lubridate 1.9 release (<a
href="https://issues.apache.org/jira/browse/<a%20href='https://issues.apache.org/jira/browse/ARROW-18285'>ARROW-18285</a>"
class="external-link"></a><a
href="https://issues.apache.org/jira/browse/ARROW-18285"
class="external-link">ARROW-18285</a>)</li>
-<li>Update to ensure compatibility with changes in dev purrr (<a
href="https://issues.apache.org/jira/browse/<a%20href='https://issues.apache.org/jira/browse/ARROW-18305'>ARROW-18305</a>"
class="external-link"></a><a
href="https://issues.apache.org/jira/browse/ARROW-18305"
class="external-link">ARROW-18305</a>)</li>
-<li>Fix to correctly handle <code>.data</code> pronoun in <code><a
href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code> (<a
href="https://issues.apache.org/jira/browse/<a%20href='https://issues.apache.org/jira/browse/ARROW-18131'>ARROW-18131</a>"
class="external-link"></a><a
href="https://issues.apache.org/jira/browse/ARROW-18131"
class="external-link">ARROW-18131</a>)</li>
+<ul><li>Fixes for failing test after lubridate 1.9 release (<a
href="https://github.com/apache/arrow/issues/14615"
class="external-link">#14615</a>)</li>
+<li>Update to ensure compatibility with changes in dev purrr (<a
href="https://github.com/apache/arrow/issues/14581"
class="external-link">#14581</a>)</li>
+<li>Fix to correctly handle <code>.data</code> pronoun in <code><a
href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code> (<a
href="https://github.com/apache/arrow/issues/14484"
class="external-link">#14484</a>)</li>
</ul></div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="10.0.0" id="arrow-1000">arrow 10.0.0<a
class="anchor" aria-label="anchor" href="#arrow-1000"></a></h2><p
class="text-muted">CRAN release: 2022-10-26</p>
@@ -94,7 +174,7 @@
<h3 id="arrow-dplyr-queries-10-0-0">Arrow dplyr queries<a class="anchor"
aria-label="anchor" href="#arrow-dplyr-queries-10-0-0"></a></h3>
<p>Several new functions can be used in queries:</p>
<ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">dplyr::across()</a></code> can be used to apply the same
computation across multiple columns, and the <code>where()</code> selection
helper is supported in <code><a
href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">across()</a></code>;</li>
+<code><a href="https://dplyr.tidyverse.org/reference/across.html"
class="external-link">dplyr::across()</a></code> can be used to apply the same
computation across multiple columns, and the <code>where()</code> selection
helper is supported in <code>across()</code>;</li>
<li>
<code><a href="../reference/add_filename.html">add_filename()</a></code> can
be used to get the filename a row came from (only available when querying
<code><a href="../reference/Dataset.html">?Dataset</a></code>);</li>
<li>Added five functions in the <code>slice_*</code> family: <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_min()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_max()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html"
class="external-link">dplyr::slice_head()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/slice.html" class="exte [...]
@@ -128,63 +208,68 @@
<h2 class="pkg-version" data-toc-text="9.0.0" id="arrow-900">arrow 9.0.0<a
class="anchor" aria-label="anchor" href="#arrow-900"></a></h2><p
class="text-muted">CRAN release: 2022-08-10</p>
<div class="section level3">
<h3 id="arrow-dplyr-queries-9-0-0">Arrow dplyr queries<a class="anchor"
aria-label="anchor" href="#arrow-dplyr-queries-9-0-0"></a></h3>
-<ul><li>New dplyr verbs:<ul><li>
-<code><a href="https://generics.r-lib.org/reference/setops.html"
class="external-link">dplyr::union</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/setops.html"
class="external-link">dplyr::union_all</a></code> (<a
href="https://issues.apache.org/jira/browse/ARROW-15622"
class="external-link">ARROW-15622</a>)</li>
+<ul><li>New dplyr verbs:
+<ul><li>
+<code><a href="https://generics.r-lib.org/reference/setops.html"
class="external-link">dplyr::union</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/setops.html"
class="external-link">dplyr::union_all</a></code> (<a
href="https://github.com/apache/arrow/issues/13090"
class="external-link">#13090</a>)</li>
<li>
-<code><a href="https://pillar.r-lib.org/reference/glimpse.html"
class="external-link">dplyr::glimpse</a></code> (<a
href="https://issues.apache.org/jira/browse/ARROW-16776"
class="external-link">ARROW-16776</a>)</li>
+<code><a href="https://pillar.r-lib.org/reference/glimpse.html"
class="external-link">dplyr::glimpse</a></code> (<a
href="https://github.com/apache/arrow/issues/13563"
class="external-link">#13563</a>)</li>
<li>
-<code><a href="../reference/show_exec_plan.html">show_exec_plan()</a></code>
can be added to the end of a dplyr pipeline to show the underlying plan,
similar to <code><a href="https://dplyr.tidyverse.org/reference/explain.html"
class="external-link">dplyr::show_query()</a></code>. <code><a
href="https://dplyr.tidyverse.org/reference/explain.html"
class="external-link">dplyr::show_query()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/explain.html" class="external-lin
[...]
+<code><a href="../reference/show_exec_plan.html">show_exec_plan()</a></code>
can be added to the end of a dplyr pipeline to show the underlying plan,
similar to <code><a href="https://dplyr.tidyverse.org/reference/explain.html"
class="external-link">dplyr::show_query()</a></code>. <code><a
href="https://dplyr.tidyverse.org/reference/explain.html"
class="external-link">dplyr::show_query()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/explain.html" class="external-lin
[...]
</ul></li>
-<li>User-defined functions are supported in queries. Use <code><a
href="../reference/register_scalar_function.html">register_scalar_function()</a></code>
to create them. (<a href="https://issues.apache.org/jira/browse/ARROW-16444"
class="external-link">ARROW-16444</a>)</li>
+<li>User-defined functions are supported in queries. Use <code><a
href="../reference/register_scalar_function.html">register_scalar_function()</a></code>
to create them. (<a href="https://github.com/apache/arrow/issues/13397"
class="external-link">#13397</a>)</li>
<li>
-<code><a href="../reference/map_batches.html">map_batches()</a></code> returns
a <code>RecordBatchReader</code> and requires that the function it maps returns
something coercible to a <code>RecordBatch</code> through the <code><a
href="../reference/as_record_batch.html">as_record_batch()</a></code> S3
function. It can also run in streaming fashion if passed <code>.lazy =
TRUE</code>. (<a href="https://issues.apache.org/jira/browse/ARROW-15271"
class="external-link">ARROW-15271</a>, <a hr [...]
-<li>Functions can be called with package namespace prefixes (e.g.
<code>stringr::</code>, <code>lubridate::</code>) within queries. For example,
<code><a href="https://stringr.tidyverse.org/reference/str_length.html"
class="external-link">stringr::str_length</a></code> will now dispatch to the
same kernel as <code>str_length</code>. (<a
href="https://issues.apache.org/jira/browse/ARROW-14575"
class="external-link">ARROW-14575</a>)</li>
-<li>Support for new functions:<ul><li>
-<code><a href="https://lubridate.tidyverse.org/reference/parse_date_time.html"
class="external-link">lubridate::parse_date_time()</a></code> datetime parser:
(<a href="https://issues.apache.org/jira/browse/ARROW-14848"
class="external-link">ARROW-14848</a>, <a
href="https://issues.apache.org/jira/browse/ARROW-16407"
class="external-link">ARROW-16407</a>)<ul><li>
+<code><a href="../reference/map_batches.html">map_batches()</a></code> returns
a <code>RecordBatchReader</code> and requires that the function it maps returns
something coercible to a <code>RecordBatch</code> through the <code><a
href="../reference/as_record_batch.html">as_record_batch()</a></code> S3
function. It can also run in streaming fashion if passed <code>.lazy =
TRUE</code>. (<a href="https://github.com/apache/arrow/issues/13170"
class="external-link">#13170</a>, <a href="https: [...]
+<li>Functions can be called with package namespace prefixes (e.g.
<code>stringr::</code>, <code>lubridate::</code>) within queries. For example,
<code><a href="https://stringr.tidyverse.org/reference/str_length.html"
class="external-link">stringr::str_length</a></code> will now dispatch to the
same kernel as <code>str_length</code>. (<a
href="https://github.com/apache/arrow/issues/13160"
class="external-link">#13160</a>)</li>
+<li>Support for new functions:
+<ul><li>
+<code><a href="https://lubridate.tidyverse.org/reference/parse_date_time.html"
class="external-link">lubridate::parse_date_time()</a></code> datetime parser:
(<a href="https://github.com/apache/arrow/issues/12589"
class="external-link">#12589</a>, <a
href="https://github.com/apache/arrow/issues/13196"
class="external-link">#13196</a>, <a
href="https://github.com/apache/arrow/issues/13506"
class="external-link">#13506</a>)
+<ul><li>
<code>orders</code> with year, month, day, hours, minutes, and seconds
components are supported.</li>
<li>the <code>orders</code> argument in the Arrow binding works as follows:
<code>orders</code> are transformed into <code>formats</code> which
subsequently get applied in turn. There is no <code>select_formats</code>
parameter and no inference takes place (unlike in <code><a
href="https://lubridate.tidyverse.org/reference/parse_date_time.html"
class="external-link">lubridate::parse_date_time()</a></code>).</li>
</ul></li>
<li>
-<code>lubridate</code> date and datetime parsers such as <code><a
href="https://lubridate.tidyverse.org/reference/ymd.html"
class="external-link">lubridate::ymd()</a></code>, <code><a
href="https://lubridate.tidyverse.org/reference/ymd.html"
class="external-link">lubridate::yq()</a></code>, and <code><a
href="https://lubridate.tidyverse.org/reference/ymd_hms.html"
class="external-link">lubridate::ymd_hms()</a></code> (<a
href="https://issues.apache.org/jira/browse/ARROW-16394" class="ext [...]
+<code>lubridate</code> date and datetime parsers such as <code><a
href="https://lubridate.tidyverse.org/reference/ymd.html"
class="external-link">lubridate::ymd()</a></code>, <code><a
href="https://lubridate.tidyverse.org/reference/ymd.html"
class="external-link">lubridate::yq()</a></code>, and <code><a
href="https://lubridate.tidyverse.org/reference/ymd_hms.html"
class="external-link">lubridate::ymd_hms()</a></code> (<a
href="https://github.com/apache/arrow/issues/13118" class="external [...]
<li>
-<code><a href="https://lubridate.tidyverse.org/reference/parse_date_time.html"
class="external-link">lubridate::fast_strptime()</a></code> (<a
href="https://issues.apache.org/jira/browse/ARROW-16439"
class="external-link">ARROW-16439</a>)</li>
+<code><a href="https://lubridate.tidyverse.org/reference/parse_date_time.html"
class="external-link">lubridate::fast_strptime()</a></code> (<a
href="https://github.com/apache/arrow/issues/13174"
class="external-link">#13174</a>)</li>
<li>
-<code><a href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::floor_date()</a></code>, <code><a
href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::ceiling_date()</a></code>, and <code><a
href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::round_date()</a></code> (<a
href="https://issues.apache.org/jira/browse/ARROW-14821"
class="external-link">ARROW-14 [...]
+<code><a href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::floor_date()</a></code>, <code><a
href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::ceiling_date()</a></code>, and <code><a
href="https://lubridate.tidyverse.org/reference/round_date.html"
class="external-link">lubridate::round_date()</a></code> (<a
href="https://github.com/apache/arrow/issues/12154"
class="external-link">#12154</a>)</li>
<li>
-<code><a href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code> supports the <code>tz</code>
argument to pass timezones. (<a
href="https://issues.apache.org/jira/browse/ARROW-16415"
class="external-link">ARROW-16415</a>)</li>
+<code><a href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code> supports the <code>tz</code>
argument to pass timezones. (<a
href="https://github.com/apache/arrow/issues/13190"
class="external-link">#13190</a>)</li>
<li>
<code><a href="https://lubridate.tidyverse.org/reference/day.html"
class="external-link">lubridate::qday()</a></code> (day of quarter)</li>
<li>
-<code><a href="https://rdrr.io/r/base/Log.html"
class="external-link">exp()</a></code> and <code><a
href="https://rdrr.io/r/base/MathFun.html"
class="external-link">sqrt()</a></code>. (<a
href="https://issues.apache.org/jira/browse/ARROW-16871"
class="external-link">ARROW-16871</a>)</li>
+<code><a href="https://rdrr.io/r/base/Log.html"
class="external-link">exp()</a></code> and <code><a
href="https://rdrr.io/r/base/MathFun.html"
class="external-link">sqrt()</a></code>. (<a
href="https://github.com/apache/arrow/issues/13517"
class="external-link">#13517</a>)</li>
</ul></li>
-<li>Bugfixes:<ul><li>Count distinct now gives correct result across multiple
row groups. (<a href="https://issues.apache.org/jira/browse/ARROW-16807"
class="external-link">ARROW-16807</a>)</li>
-<li>Aggregations over partition columns return correct results. (<a
href="https://issues.apache.org/jira/browse/ARROW-16700"
class="external-link">ARROW-16700</a>)</li>
+<li>Bugfixes:
+<ul><li>Count distinct now gives correct result across multiple row groups.
(<a href="https://github.com/apache/arrow/issues/13583"
class="external-link">#13583</a>)</li>
+<li>Aggregations over partition columns return correct results. (<a
href="https://github.com/apache/arrow/issues/13518"
class="external-link">#13518</a>)</li>
</ul></li>
</ul></div>
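The user-defined-function support mentioned above can be sketched as follows; the function name and types are arbitrary choices for illustration:

```r
library(arrow)
library(dplyr)

# Register a scalar UDF usable inside Arrow dplyr queries.
# auto_convert = TRUE lets the function receive ordinary R vectors.
register_scalar_function(
  "times_two",
  function(context, x) x * 2L,
  in_type = int32(),
  out_type = int32(),
  auto_convert = TRUE
)

arrow_table(x = 1:3) |>
  mutate(y = times_two(x)) |>
  collect()
```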
<div class="section level3">
<h3 id="reading-and-writing-9-0-0">Reading and writing<a class="anchor"
aria-label="anchor" href="#reading-and-writing-9-0-0"></a></h3>
<ul><li>New functions <code><a
href="../reference/read_feather.html">read_ipc_file()</a></code> and <code><a
href="../reference/write_feather.html">write_ipc_file()</a></code> are added.
These functions are almost the same as <code><a
href="../reference/read_feather.html">read_feather()</a></code> and <code><a
href="../reference/write_feather.html">write_feather()</a></code>, but differ
in that they only target IPC files (Feather V2 files), not Feather V1
files.</li>
<li>
-<code>read_arrow()</code> and <code>write_arrow()</code>, deprecated since
1.0.0 (July 2020), have been removed. Instead of these, use the <code><a
href="../reference/read_feather.html">read_ipc_file()</a></code> and <code><a
href="../reference/write_feather.html">write_ipc_file()</a></code> for IPC
files, or, <code><a
href="../reference/read_ipc_stream.html">read_ipc_stream()</a></code> and
<code><a
href="../reference/write_ipc_stream.html">write_ipc_stream()</a></code> for IPC
streams. [...]
+<code>read_arrow()</code> and <code>write_arrow()</code>, deprecated since
1.0.0 (July 2020), have been removed. Instead, use <code><a
href="../reference/read_feather.html">read_ipc_file()</a></code> and <code><a
href="../reference/write_feather.html">write_ipc_file()</a></code> for IPC
files, or <code><a
href="../reference/read_ipc_stream.html">read_ipc_stream()</a></code> and
<code><a
href="../reference/write_ipc_stream.html">write_ipc_stream()</a></code> for IPC
streams. [...]
<li>
-<code><a href="../reference/write_parquet.html">write_parquet()</a></code> now
defaults to writing Parquet format version 2.4 (was 1.0). Previously deprecated
arguments <code>properties</code> and <code>arrow_properties</code> have been
removed; if you need to deal with these lower-level properties objects
directly, use <code>ParquetFileWriter</code>, which <code><a
href="../reference/write_parquet.html">write_parquet()</a></code> wraps. (<a
href="https://issues.apache.org/jira/browse/AR [...]
-<li>UnionDatasets can unify schemas of multiple InMemoryDatasets with varying
schemas. (<a href="https://issues.apache.org/jira/browse/ARROW-16085"
class="external-link">ARROW-16085</a>)</li>
+<code><a href="../reference/write_parquet.html">write_parquet()</a></code> now
defaults to writing Parquet format version 2.4 (was 1.0). Previously deprecated
arguments <code>properties</code> and <code>arrow_properties</code> have been
removed; if you need to deal with these lower-level properties objects
directly, use <code>ParquetFileWriter</code>, which <code><a
href="../reference/write_parquet.html">write_parquet()</a></code> wraps. (<a
href="https://github.com/apache/arrow/issues/1 [...]
+<li>UnionDatasets can unify schemas of multiple InMemoryDatasets with varying
schemas. (<a href="https://github.com/apache/arrow/issues/13088"
class="external-link">#13088</a>)</li>
<li>
-<code><a href="../reference/write_dataset.html">write_dataset()</a></code>
preserves all schema metadata again. In 8.0.0, it would drop most metadata,
breaking packages such as sfarrow. (<a
href="https://issues.apache.org/jira/browse/ARROW-16511"
class="external-link">ARROW-16511</a>)</li>
-<li>Reading and writing functions (such as <code><a
href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>) will
automatically (de-)compress data if the file path contains a compression
extension (e.g. <code>"data.csv.gz"</code>). This works locally as well as on
remote filesystems like S3 and GCS. (<a
href="https://issues.apache.org/jira/browse/ARROW-16144"
class="external-link">ARROW-16144</a>)</li>
+<code><a href="../reference/write_dataset.html">write_dataset()</a></code>
preserves all schema metadata again. In 8.0.0, it would drop most metadata,
breaking packages such as sfarrow. (<a
href="https://github.com/apache/arrow/issues/13105"
class="external-link">#13105</a>)</li>
+<li>Reading and writing functions (such as <code><a
href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>) will
automatically (de-)compress data if the file path contains a compression
extension (e.g. <code>"data.csv.gz"</code>). This works locally as well as on
remote filesystems like S3 and GCS. (<a
href="https://github.com/apache/arrow/issues/13183"
class="external-link">#13183</a>)</li>
<li>
-<code>FileSystemFactoryOptions</code> can be provided to <code><a
href="../reference/open_dataset.html">open_dataset()</a></code>, allowing you
to pass options such as which file prefixes to ignore. (<a
href="https://issues.apache.org/jira/browse/ARROW-15280"
class="external-link">ARROW-15280</a>)</li>
-<li>By default, <code>S3FileSystem</code> will not create or delete buckets.
To enable that, pass the configuration option
<code>allow_bucket_creation</code> or <code>allow_bucket_deletion</code>. (<a
href="https://issues.apache.org/jira/browse/ARROW-15906"
class="external-link">ARROW-15906</a>)</li>
+<code>FileSystemFactoryOptions</code> can be provided to <code><a
href="../reference/open_dataset.html">open_dataset()</a></code>, allowing you
to pass options such as which file prefixes to ignore. (<a
href="https://github.com/apache/arrow/issues/13171"
class="external-link">#13171</a>)</li>
+<li>By default, <code>S3FileSystem</code> will not create or delete buckets.
To enable that, pass the configuration option
<code>allow_bucket_creation</code> or <code>allow_bucket_deletion</code>. (<a
href="https://github.com/apache/arrow/issues/13206"
class="external-link">#13206</a>)</li>
<li>
-<code>GcsFileSystem</code> and <code><a
href="../reference/gs_bucket.html">gs_bucket()</a></code> allow connecting to
Google Cloud Storage. (<a
href="https://issues.apache.org/jira/browse/ARROW-13404"
class="external-link">ARROW-13404</a>, <a
href="https://issues.apache.org/jira/browse/ARROW-16887"
class="external-link">ARROW-16887</a>)</li>
+<code>GcsFileSystem</code> and <code><a
href="../reference/gs_bucket.html">gs_bucket()</a></code> allow connecting to
Google Cloud Storage. (<a href="https://github.com/apache/arrow/issues/10999"
class="external-link">#10999</a>, <a
href="https://github.com/apache/arrow/issues/13601"
class="external-link">#13601</a>)</li>
</ul></div>
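The compression-by-extension and GCS features above can be combined in a short sketch. This is illustrative only: the file paths and the bucket name are hypothetical, and the GCS call needs network access plus a build of the arrow package with GCS support.

```r
library(arrow)

# Writing to a path with a compression extension compresses automatically --
# no explicit codec argument is needed; reading decompresses transparently.
write_csv_arrow(mtcars, "mtcars.csv.gz")
df <- read_csv_arrow("mtcars.csv.gz")

# Connect to a (hypothetical) public Google Cloud Storage bucket anonymously
# and open a dataset stored under a prefix inside it.
bucket <- gs_bucket("example-public-bucket", anonymous = TRUE)
ds <- open_dataset(bucket$path("some-dataset"))
```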
<div class="section level3">
<h3 id="arrays-and-tables-9-0-0">Arrays and tables<a class="anchor"
aria-label="anchor" href="#arrays-and-tables-9-0-0"></a></h3>
-<ul><li>Table and RecordBatch <code>$num_rows()</code> method returns a double
(previously integer), avoiding integer overflow on larger tables. (<a
href="https://issues.apache.org/jira/browse/ARROW-14989"
class="external-link">ARROW-14989</a>, <a
href="https://issues.apache.org/jira/browse/ARROW-16977"
class="external-link">ARROW-16977</a>)</li></ul></div>
+<ul><li>The Table and RecordBatch <code>$num_rows()</code> method returns a
double (previously an integer), avoiding integer overflow on larger tables. (<a
href="https://github.com/apache/arrow/issues/13482"
class="external-link">#13482</a>, <a
href="https://github.com/apache/arrow/issues/13514"
class="external-link">#13514</a>)</li>
+</ul></div>
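A minimal illustration of the return-type change, assuming the arrow package is attached:

```r
library(arrow)

tab <- arrow_table(x = 1:5)

# $num_rows is now a double rather than an integer, so row counts beyond
# .Machine$integer.max no longer overflow.
is.double(tab$num_rows)  # TRUE
```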
<div class="section level3">
<h3 id="packaging-9-0-0">Packaging<a class="anchor" aria-label="anchor"
href="#packaging-9-0-0"></a></h3>
<ul><li>The <code>arrow.dev_repo</code> for nightly builds of the R package
and prebuilt libarrow binaries is now <a
href="https://nightlies.apache.org/arrow/r/" class="external-link
uri">https://nightlies.apache.org/arrow/r/</a>.</li>
-<li>Brotli and BZ2 are shipped with MacOS binaries. BZ2 is shipped with
Windows binaries. (<a href="https://issues.apache.org/jira/browse/ARROW-16828"
class="external-link">ARROW-16828</a>)</li>
+<li>Brotli and BZ2 are shipped with macOS binaries. BZ2 is shipped with
Windows binaries. (<a href="https://github.com/apache/arrow/issues/13484"
class="external-link">#13484</a>)</li>
</ul></div>
</div>
<div class="section level2">
@@ -192,10 +277,12 @@
<div class="section level3">
<h3 id="enhancements-to-dplyr-and-datasets-8-0-0">Enhancements to dplyr and
datasets<a class="anchor" aria-label="anchor"
href="#enhancements-to-dplyr-and-datasets-8-0-0"></a></h3>
<ul><li>
-<code><a
href="../reference/open_dataset.html">open_dataset()</a></code>:<ul><li>correctly
supports the <code>skip</code> argument for skipping header rows in CSV
datasets.</li>
+<code><a href="../reference/open_dataset.html">open_dataset()</a></code>:
+<ul><li>correctly supports the <code>skip</code> argument for skipping header
rows in CSV datasets.</li>
<li>can take a list of datasets with differing schemas and attempt to unify
the schemas to produce a <code>UnionDataset</code>.</li>
</ul></li>
-<li>Arrow <a href="https://dplyr.tidyverse.org"
class="external-link">dplyr</a> queries:<ul><li>are supported on
<code>RecordBatchReader</code>. This allows, for example, results from DuckDB
to be streamed back into Arrow rather than materialized before continuing the
pipeline.</li>
+<li>Arrow <a href="https://dplyr.tidyverse.org"
class="external-link">dplyr</a> queries:
+<ul><li>are supported on <code>RecordBatchReader</code>. This allows, for
example, results from DuckDB to be streamed back into Arrow rather than
materialized before continuing the pipeline.</li>
<li>no longer need to materialize the entire result table before writing to a
dataset if the query contains aggregations or joins.</li>
<li>supports <code><a href="https://dplyr.tidyverse.org/reference/rename.html"
class="external-link">dplyr::rename_with()</a></code>.</li>
<li>
@@ -216,7 +303,9 @@
<h3 id="enhancements-to-date-and-time-support-8-0-0">Enhancements to date and
time support<a class="anchor" aria-label="anchor"
href="#enhancements-to-date-and-time-support-8-0-0"></a></h3>
<ul><li>
<code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>’s
readr-style type <code>T</code> is mapped to <code>timestamp(unit =
"ns")</code> instead of <code>timestamp(unit = "s")</code>.</li>
-<li>For Arrow dplyr queries, added additional <a
href="https://lubridate.tidyverse.org" class="external-link">lubridate</a>
features and fixes:<ul><li>New component extraction functions:<ul><li>
+<li>For Arrow dplyr queries, added additional <a
href="https://lubridate.tidyverse.org" class="external-link">lubridate</a>
features and fixes:
+<ul><li>New component extraction functions:
+<ul><li>
<code><a href="https://lubridate.tidyverse.org/reference/tz.html"
class="external-link">lubridate::tz()</a></code> (timezone),</li>
<li>
<code><a href="https://lubridate.tidyverse.org/reference/quarter.html"
class="external-link">lubridate::semester()</a></code>,</li>
@@ -243,7 +332,8 @@
<code><a href="https://lubridate.tidyverse.org/reference/as_date.html"
class="external-link">lubridate::as_date()</a></code> and <code><a
href="https://lubridate.tidyverse.org/reference/as_date.html"
class="external-link">lubridate::as_datetime()</a></code>
</li>
</ul></li>
-<li>Also for Arrow dplyr queries, added support and fixes for base date and
time functions:<ul><li>
+<li>Also for Arrow dplyr queries, added support and fixes for base date and
time functions:
+<ul><li>
<code><a href="https://rdrr.io/r/base/difftime.html"
class="external-link">base::difftime</a></code> and <code><a
href="https://rdrr.io/r/base/difftime.html"
class="external-link">base::as.difftime()</a></code>
</li>
<li>
@@ -275,7 +365,8 @@
<li>Math group generics are implemented for ArrowDatum. This means you can use
base functions like <code><a href="https://rdrr.io/r/base/MathFun.html"
class="external-link">sqrt()</a></code>, <code><a
href="https://rdrr.io/r/base/Log.html" class="external-link">log()</a></code>,
and <code><a href="https://rdrr.io/r/base/Log.html"
class="external-link">exp()</a></code> with Arrow arrays and scalars.</li>
<li>
<code>read_*</code> and <code>write_*</code> functions support R Connection
objects for reading and writing files.</li>
-<li>Parquet improvements:<ul><li>Parquet writer supports Duration type
columns.</li>
+<li>Parquet improvements:
+<ul><li>Parquet writer supports Duration type columns.</li>
<li>The dataset Parquet reader consumes less memory.</li>
</ul></li>
<li>
@@ -298,9 +389,9 @@
<div class="section level3">
<h3 id="enhancements-to-dplyr-and-datasets-7-0-0">Enhancements to dplyr and
datasets<a class="anchor" aria-label="anchor"
href="#enhancements-to-dplyr-and-datasets-7-0-0"></a></h3>
<ul><li>Additional <a href="https://lubridate.tidyverse.org"
class="external-link">lubridate</a> features: <code>week()</code>, more of the
<code>is.*()</code> functions, and the label argument to <code>month()</code>
have been implemented.</li>
-<li>More complex expressions inside <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, such as <code>ifelse(n() > 1,
mean(y), mean(z))</code>, are supported.</li>
+<li>More complex expressions inside <code>summarize()</code>, such as
<code>ifelse(n() > 1, mean(y), mean(z))</code>, are now supported.</li>
<li>When adding columns in a dplyr pipeline, one can now use
<code>tibble</code> and <code>data.frame</code> to create columns of tibbles or
data.frames respectively (e.g. <code>... %>% mutate(df_col = tibble(a, b))
%>% ...</code>).</li>
-<li>Dictionary columns (R <code>factor</code> type) are supported inside of
<code><a href="https://dplyr.tidyverse.org/reference/coalesce.html"
class="external-link">coalesce()</a></code>.</li>
+<li>Dictionary columns (R <code>factor</code> type) are supported inside of
<code>coalesce()</code>.</li>
<li>
<code><a href="../reference/open_dataset.html">open_dataset()</a></code>
accepts the <code>partitioning</code> argument when reading Hive-style
partitioned files, even though it is not required.</li>
<li>The experimental <code><a
href="../reference/map_batches.html">map_batches()</a></code> function for
custom operations on dataset has been restored.</li>
@@ -315,7 +406,7 @@
<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> no longer hangs on large CSV
datasets.</li>
<li>There is an improved error message when there is a conflict between a
header in the file and schema/column names provided as arguments.</li>
<li>
-<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>
now follows the signature of <code>readr::write_csv()</code>.</li>
+<code><a href="../reference/write_csv_arrow.html">write_csv_arrow()</a></code>
now follows the signature of <code><a
href="https://readr.tidyverse.org/reference/write_delim.html"
class="external-link">readr::write_csv()</a></code>.</li>
</ul></div>
<div class="section level3">
<h3 id="other-improvements-and-fixes-7-0-0">Other improvements and fixes<a
class="anchor" aria-label="anchor"
href="#other-improvements-and-fixes-7-0-0"></a></h3>
@@ -352,7 +443,7 @@
<h2 class="pkg-version" data-toc-text="6.0.1" id="arrow-601">arrow 6.0.1<a
class="anchor" aria-label="anchor" href="#arrow-601"></a></h2><p
class="text-muted">CRAN release: 2021-11-20</p>
<ul><li>Joins now support inclusion of dictionary columns, and multiple
crashes have been fixed</li>
<li>Grouped aggregation no longer crashes when working on data that has been
filtered down to 0 rows</li>
-<li>Bindings added for <code><a
href="https://stringr.tidyverse.org/reference/str_count.html"
class="external-link">str_count()</a></code> in dplyr queries</li>
+<li>Bindings added for <code>str_count()</code> in dplyr queries</li>
<li>Worked around a critical bug in the AWS SDK for C++ that could affect S3
multipart upload</li>
<li>A UBSAN warning in the round kernel has been resolved</li>
<li>Fixes for build failures on Solaris and on old versions of macOS</li>
@@ -361,27 +452,27 @@
<h2 class="pkg-version" data-toc-text="6.0.0" id="arrow-600">arrow 6.0.0<a
class="anchor" aria-label="anchor" href="#arrow-600"></a></h2>
<p>There are now two ways to query Arrow data:</p>
<div class="section level3">
-<h3 id="1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1.
Expanded Arrow-native queries: aggregation and joins<a class="anchor"
aria-label="anchor"
href="#1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
-<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code>, both grouped and
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches.
Because data is scanned in chunks, you can aggregate over larger-than-memory
datasets backed by many files. Supported aggregation functions include <code><a
href="https://dplyr.tidyverse.org/reference/context.html"
class="external-link">n()</a></code>, <code><a href="https [...]
-<p>Along with <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, you can also call <code><a
href="https://dplyr.tidyverse.org/reference/count.html"
class="external-link">count()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/count.html"
class="external-link">tally()</a></code>, and <code><a
href="https://dplyr.tidyverse.org/reference/distinct.html"
class="external-link">distinct()</a></code>, which effectiv [...]
-<p>This enhancement does change the behavior of <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> in some cases: see “Breaking
changes” below for details.</p>
-<p>In addition to <code><a
href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, mutating and filtering equality
joins (<code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html"
class="external-link">inner_join()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate-joins.html"
class="external-link">left_join()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate-joins.html" class="exte [...]
+<h3 id="id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0">1.
Expanded Arrow-native queries: aggregation and joins<a class="anchor"
aria-label="anchor"
href="#id_1-expanded-arrow-native-queries-aggregation-and-joins-6-0-0"></a></h3>
+<p><code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code>, both grouped and
ungrouped, is now implemented for Arrow Datasets, Tables, and RecordBatches.
Because data is scanned in chunks, you can aggregate over larger-than-memory
datasets backed by many files. Supported aggregation functions include
<code>n()</code>, <code>n_distinct()</code>, <code>min()</code>, <code><a
href="https://rdrr.io/r/base/Extremes.html" class=" [...]
+<p>Along with <code>summarize()</code>, you can also call
<code>count()</code>, <code>tally()</code>, and <code>distinct()</code>, which
effectively wrap <code>summarize()</code>.</p>
+<p>This enhancement does change the behavior of <code>summarize()</code> and
<code>collect()</code> in some cases: see “Breaking changes” below for
details.</p>
+<p>In addition to <code>summarize()</code>, mutating and filtering equality
joins (<code>inner_join()</code>, <code>left_join()</code>,
<code>right_join()</code>, <code>full_join()</code>, <code>semi_join()</code>,
and <code>anti_join()</code>) are also supported natively in Arrow.</p>
<p>Grouped aggregation and (especially) joins should be considered somewhat
experimental in this release. We expect them to work, but they may not be well
optimized for all workloads. To help us focus our efforts on improving them in
the next release, please let us know if you encounter unexpected behavior or
poor performance.</p>
-<p>New non-aggregating compute functions include string functions like
<code><a href="https://stringr.tidyverse.org/reference/case.html"
class="external-link">str_to_title()</a></code> and <code><a
href="https://rdrr.io/r/base/strptime.html"
class="external-link">strftime()</a></code> as well as compute functions for
extracting date parts (e.g. <code>year()</code>, <code>month()</code>) from
dates. This is not a complete list of additional compute functions; for an
exhaustive list of ava [...]
+<p>New non-aggregating compute functions include string functions like
<code>str_to_title()</code> and <code><a
href="https://rdrr.io/r/base/strptime.html"
class="external-link">strftime()</a></code> as well as compute functions for
extracting date parts (e.g. <code>year()</code>, <code>month()</code>) from
dates. This is not a complete list of additional compute functions; for an
exhaustive list of available compute functions see <code><a
href="../reference/list_compute_functions.html"> [...]
<p>We’ve also worked to fill in support for all data types, such as
<code>Decimal</code>, for functions added in previous releases. All type
limitations mentioned in previous release notes should no longer apply, and
if you find a function that is not implemented for a certain data type, please
<a href="https://issues.apache.org/jira/projects/ARROW/issues"
class="external-link">report an issue</a>.</p>
</div>
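The grouped aggregation and native joins described above can be sketched as follows. The data is made up for illustration; it assumes arrow and dplyr are attached.

```r
library(arrow)
library(dplyr)

tab <- Table$create(g = c("a", "a", "b"), x = c(1, 2, 3))
lookup <- Table$create(g = c("a", "b"), label = c("first", "second"))

# Join and grouped aggregation both run in Arrow's engine; collect()
# materializes the result as a tibble at the end of the pipeline.
tab %>%
  left_join(lookup, by = "g") %>%
  group_by(label) %>%
  summarize(n = n(), total = sum(x)) %>%
  collect()
```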
<div class="section level3">
-<h3 id="2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor"
aria-label="anchor" href="#2-duckdb-integration-6-0-0"></a></h3>
+<h3 id="id_2-duckdb-integration-6-0-0">2. DuckDB integration<a class="anchor"
aria-label="anchor" href="#id_2-duckdb-integration-6-0-0"></a></h3>
<p>If you have the <a href="https://CRAN.R-project.org/package=duckdb"
class="external-link">duckdb package</a> installed, you can hand off an Arrow
Dataset or query object to <a href="https://duckdb.org/"
class="external-link">DuckDB</a> for further querying using the <code><a
href="../reference/to_duckdb.html">to_duckdb()</a></code> function. This allows
you to use duckdb’s <code>dbplyr</code> methods, as well as its SQL interface,
to aggregate data. Filtering and column projection don [...]
<p>You can also take a duckdb <code>tbl</code> and call <code><a
href="../reference/to_arrow.html">to_arrow()</a></code> to stream data to
Arrow’s query engine. This means that in a single dplyr pipeline, you could
start with an Arrow Dataset, evaluate some steps in DuckDB, then evaluate the
rest in Arrow.</p>
</div>
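The DuckDB round trip described above can be sketched like this. The dataset path and column names are hypothetical, and the duckdb package must be installed.

```r
library(arrow)
library(dplyr)

ds <- open_dataset("nyc-taxi/")   # hypothetical dataset directory

ds %>%
  filter(total_amount > 0) %>%    # filtering/projection stay in Arrow
  to_duckdb() %>%                 # hand off to DuckDB's dbplyr backend
  group_by(payment_type) %>%
  summarise(mean_fare = mean(fare_amount)) %>%
  to_arrow() %>%                  # stream results back into Arrow
  collect()
```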
<div class="section level3">
<h3 id="breaking-changes-6-0-0">Breaking changes<a class="anchor"
aria-label="anchor" href="#breaking-changes-6-0-0"></a></h3>
-<ul><li>Row order of data from a Dataset query is no longer deterministic. If
you need a stable sort order, you should explicitly <code><a
href="https://dplyr.tidyverse.org/reference/arrange.html"
class="external-link">arrange()</a></code> the query result. For calls to
<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">summarize()</a></code>, you can set
<code>options(arrow.summarise.sort = TRUE)</code> to match the current
<code>dplyr</code> beha [...]
+<ul><li>Row order of data from a Dataset query is no longer deterministic. If
you need a stable sort order, you should explicitly <code>arrange()</code> the
query result. For calls to <code>summarize()</code>, you can set
<code>options(arrow.summarise.sort = TRUE)</code> to match the current
<code>dplyr</code> behavior of sorting on the grouping columns.</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table
or RecordBatch no longer eagerly evaluates. Call <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">compute()</a></code> or <code><a
href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> to evaluate the query.</li>
+<code><a href="https://dplyr.tidyverse.org/reference/summarise.html"
class="external-link">dplyr::summarize()</a></code> on an in-memory Arrow Table
or RecordBatch no longer eagerly evaluates. Call <code>compute()</code> or
<code>collect()</code> to evaluate the query.</li>
<li>
-<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> and <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code> also no longer eagerly evaluate, both
for in-memory data and for Datasets. Also, because row order is no longer
deterministic, they will effectively give you a random slice of data from
somewhere in the dataset unless you <code><a
href="https://dplyr.tidyverse.org/reference/arrange.html" class="external-lin
[...]
+<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> and <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code> also no longer eagerly evaluate, both
for in-memory data and for Datasets. Also, because row order is no longer
deterministic, they will effectively give you a random slice of data from
somewhere in the dataset unless you <code>arrange()</code> to specify
sorting.</li>
<li>Simple Feature (SF) columns no longer save all of their metadata when
converting to Arrow tables (and thus when saving to Parquet or Feather). This
also includes any dataframe column that has attributes on each element (in
other words: row-level metadata). Our previous approach to saving this metadata
is both (computationally) inefficient and unreliable with Arrow queries +
datasets. This will most impact saving SF columns. For saving these columns we
recommend either converting the [...]
<li>Datasets are officially no longer supported on 32-bit Windows on R <
4.0 (Rtools 3.5). 32-bit Windows users should upgrade to a newer version of R
in order to use datasets.</li>
</ul></div>
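Because of the lazy-evaluation and row-order changes above, a query like the following sketch (made-up data) now needs an explicit <code>collect()</code> or <code>compute()</code>, and an <code>arrange()</code> if a stable order matters:

```r
library(arrow)
library(dplyr)

tab <- Table$create(g = c("b", "a", "a"), x = c(1, 2, 3))

query <- tab %>%
  group_by(g) %>%
  summarize(total = sum(x))  # no longer evaluated eagerly

query %>%
  arrange(g) %>%  # row order is not deterministic without this
  collect()
```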
@@ -389,7 +480,7 @@
<h3 id="installation-on-linux-6-0-0">Installation on Linux<a class="anchor"
aria-label="anchor" href="#installation-on-linux-6-0-0"></a></h3>
<ul><li>Package installation now fails if the Arrow C++ library does not
compile. In previous versions, if the C++ library failed to compile, you would
get a successful R package installation that wouldn't do anything useful.</li>
<li>You can disable all optional C++ components when building from source by
setting the environment variable <code>LIBARROW_MINIMAL=true</code>. This build
includes the core Arrow/Feather components but excludes Parquet, Datasets,
compression libraries, and other optional features.</li>
-<li>Source packages now bundle the Arrow C++ source code, so it does not have
to be downloaded in order to build the package. Because the source is included,
it is now possible to build the package on an offline/airgapped system. By
default, the offline build will be minimal because it cannot download
third-party C++ dependencies required to support all features. To allow a fully
featured offline build, the included <code><a
href="../reference/create_package_with_all_dependencies.html">c [...]
+<li>Source packages now bundle the Arrow C++ source code, so it does not have
to be downloaded in order to build the package. Because the source is included,
it is now possible to build the package on an offline/airgapped system. By
default, the offline build will be minimal because it cannot download
third-party C++ dependencies required to support all features. To allow a fully
featured offline build, the included <code><a
href="../reference/create_package_with_all_dependencies.html">c [...]
<li>Source builds can make use of system dependencies (such as
<code>libz</code>) by setting <code>ARROW_DEPENDENCY_SOURCE=AUTO</code>. This
is not the default in this release (<code>BUNDLED</code>, i.e. download and
build all dependencies) but may become the default in the future.</li>
<li>The JSON library components (<code><a
href="../reference/read_json_arrow.html">read_json_arrow()</a></code>) are now
optional and still on by default; set <code>ARROW_JSON=OFF</code> before
building to disable them.</li>
</ul></div>
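The build-time switches described above are plain environment variables, so a source install can be configured from R before calling <code>install.packages()</code>. This is a sketch using only the variable names from the text; exact effects depend on your platform.

```r
# Minimal build: core Arrow/Feather only, no Parquet/Datasets/compression.
Sys.setenv(LIBARROW_MINIMAL = "true")

# Or: let the build use system dependencies (e.g. libz) where available.
Sys.setenv(ARROW_DEPENDENCY_SOURCE = "AUTO")

# Disable the optional JSON reader components.
Sys.setenv(ARROW_JSON = "OFF")

install.packages("arrow")
```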
@@ -403,7 +494,7 @@
<li>
<code><a href="../reference/write_parquet.html">write_parquet()</a></code> no
longer errors when used with a grouped data.frame</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/case_when.html"
class="external-link">case_when()</a></code> now errors cleanly if an
expression is not supported in Arrow</li>
+<code>case_when()</code> now errors cleanly if an expression is not supported
in Arrow</li>
<li>
<code><a href="../reference/open_dataset.html">open_dataset()</a></code> now
works on CSVs without header rows</li>
<li>Fixed a minor issue where the short readr-style types <code>T</code> and
<code>t</code> were reversed in <code><a
href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
@@ -431,19 +522,19 @@
<div class="section level3">
<h3 id="more-dplyr-5-0-0">More dplyr<a class="anchor" aria-label="anchor"
href="#more-dplyr-5-0-0"></a></h3>
<ul><li>
-<p>There are now more than 250 compute functions available for use in <code><a
href="https://dplyr.tidyverse.org/reference/filter.html"
class="external-link">dplyr::filter()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>, etc. Additions in this release
include:</p>
-<ul><li>String operations: <code><a
href="https://rdrr.io/r/base/strsplit.html"
class="external-link">strsplit()</a></code> and <code><a
href="https://stringr.tidyverse.org/reference/str_split.html"
class="external-link">str_split()</a></code>; <code><a
href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code>; <code><a
href="https://rdrr.io/r/base/paste.html"
class="external-link">paste()</a></code>, <code><a
href="https://rdrr.io/r/base/paste.html" class=" [...]
+<p>There are now more than 250 compute functions available for use in <code><a
href="https://dplyr.tidyverse.org/reference/filter.html"
class="external-link">dplyr::filter()</a></code>, <code>mutate()</code>, etc.
Additions in this release include:</p>
+<ul><li>String operations: <code><a
href="https://rdrr.io/r/base/strsplit.html"
class="external-link">strsplit()</a></code> and <code>str_split()</code>;
<code><a href="https://rdrr.io/r/base/strptime.html"
class="external-link">strptime()</a></code>; <code><a
href="https://rdrr.io/r/base/paste.html"
class="external-link">paste()</a></code>, <code><a
href="https://rdrr.io/r/base/paste.html"
class="external-link">paste0()</a></code>, and <code>str_c()</code>; <code><a
href="https://rdrr.i [...]
</li>
<li>Date/time operations: <code>lubridate</code> methods such as
<code>year()</code>, <code>month()</code>, <code>wday()</code>, and so on</li>
<li>Math: logarithms (<code><a href="https://rdrr.io/r/base/Log.html"
class="external-link">log()</a></code> et al.); trigonometry (<code><a
href="https://rdrr.io/r/base/Trig.html" class="external-link">sin()</a></code>,
<code><a href="https://rdrr.io/r/base/Trig.html"
class="external-link">cos()</a></code>, et al.); <code><a
href="https://rdrr.io/r/base/MathFun.html"
class="external-link">abs()</a></code>; <code><a
href="https://rdrr.io/r/base/sign.html" class="external-link">sign()</a> [...]
</li>
-<li>Conditional functions, with some limitations on input type in this
release: <code><a href="https://rdrr.io/r/base/ifelse.html"
class="external-link">ifelse()</a></code> and <code><a
href="https://dplyr.tidyverse.org/reference/if_else.html"
class="external-link">if_else()</a></code> for all but <code>Decimal</code>
types; <code><a href="https://dplyr.tidyverse.org/reference/case_when.html"
class="external-link">case_when()</a></code> for logical, numeric, and temporal
types only; <cod [...]
+<li>Conditional functions, with some limitations on input type in this
release: <code><a href="https://rdrr.io/r/base/ifelse.html"
class="external-link">ifelse()</a></code> and <code>if_else()</code> for all
but <code>Decimal</code> types; <code>case_when()</code> for logical, numeric,
and temporal types only; <code>coalesce()</code> for all but lists/structs.
Note also that in this release, factors/dictionaries are converted to strings
in these functions.</li>
<li>
-<code>is.*</code> functions are supported and can be used inside <code><a
href="https://dplyr.tidyverse.org/reference/relocate.html"
class="external-link">relocate()</a></code>
+<code>is.*</code> functions are supported and can be used inside
<code>relocate()</code>
</li>
</ul></li>
-<li>The print method for <code>arrow_dplyr_query</code> now includes the
expression and the resulting type of columns derived by <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>.</li>
-<li><p><code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">transmute()</a></code> now errors if passed arguments
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for
consistency with the behavior of <code>dplyr</code> on
<code>data.frame</code>s.</p></li>
+<li><p>The print method for <code>arrow_dplyr_query</code> now includes the
expression and the resulting type of columns derived by
<code>mutate()</code>.</p></li>
+<li><p><code>transmute()</code> now errors if passed arguments
<code>.keep</code>, <code>.before</code>, or <code>.after</code>, for
consistency with the behavior of <code>dplyr</code> on
<code>data.frame</code>s.</p></li>
</ul></div>
<div class="section level3">
<h3 id="csv-writing-5-0-0">CSV writing<a class="anchor" aria-label="anchor"
href="#csv-writing-5-0-0"></a></h3>
@@ -454,8 +545,8 @@
</ul></div>
<div class="section level3">
<h3 id="c-interface-5-0-0">C interface<a class="anchor" aria-label="anchor"
href="#c-interface-5-0-0"></a></h3>
-<ul><li>Added bindings for the remainder of C data interface: Type, Field, and
RecordBatchReader (from the experimental C stream interface). These also have
<code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">reticulate::py_to_r()</a></code> and <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">r_to_py()</a></code> methods. Along with the addition of
the <code>Scanner$ToRecordBat [...]
-<li>C interface methods are exposed on Arrow objects (e.g.
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>),
similar to how they are in <code>pyarrow</code>. This facilitates their use in
other packages. See the <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">py_to_r()</a></code> and <code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">r_to_py()</a></c [...]
+<ul><li>Added bindings for the remainder of C data interface: Type, Field, and
RecordBatchReader (from the experimental C stream interface). These also have
<code><a
href="https://rstudio.github.io/reticulate/reference/r-py-conversion.html"
class="external-link">reticulate::py_to_r()</a></code> and
<code>r_to_py()</code> methods. Along with the addition of the
<code>Scanner$ToRecordBatchReader()</code> method, you can now build up a
Dataset query in R and pass the resulting stream of bat [...]
+<li>C interface methods are exposed on Arrow objects (e.g.
<code>Array$export_to_c()</code>, <code>RecordBatch$import_from_c()</code>),
similar to how they are in <code>pyarrow</code>. This facilitates their use in
other packages. See the <code>py_to_r()</code> and <code>r_to_py()</code>
methods for usage examples.</li>
</ul></div>
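The Dataset-to-stream hand-off mentioned above can be sketched as follows (the dataset path is hypothetical):

```r
library(arrow)

ds <- open_dataset("some/dataset")       # hypothetical path
scanner <- Scanner$create(ds)
reader <- scanner$ToRecordBatchReader()  # stream of record batches

# Consume batches one at a time without materializing the whole table.
batch <- reader$read_next_batch()
```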
<div class="section level3">
<h3 id="other-enhancements-5-0-0">Other enhancements<a class="anchor"
aria-label="anchor" href="#other-enhancements-5-0-0"></a></h3>
@@ -478,7 +569,8 @@
</div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="4.0.1" id="arrow-401">arrow 4.0.1<a
class="anchor" aria-label="anchor" href="#arrow-401"></a></h2><p
class="text-muted">CRAN release: 2021-05-28</p>
-<ul><li>Resolved a few bugs in new string compute kernels (<a
href="https://issues.apache.org/jira/browse/ARROW-12774"
class="external-link">ARROW-12774</a>, <a
href="https://issues.apache.org/jira/browse/ARROW-12670"
class="external-link">ARROW-12670</a>)</li></ul></div>
+<ul><li>Resolved a few bugs in new string compute kernels (<a
href="https://github.com/apache/arrow/issues/10320"
class="external-link">#10320</a>, <a
href="https://github.com/apache/arrow/issues/10287"
class="external-link">#10287</a>)</li>
+</ul></div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="4.0.0.1" id="arrow-4001">arrow
4.0.0.1<a class="anchor" aria-label="anchor" href="#arrow-4001"></a></h2><p
class="text-muted">CRAN release: 2021-05-10</p>
<ul><li>mimalloc is the default memory allocator when
using a static source build of the package on Linux. This is because it has
better behavior under valgrind than jemalloc does. A full-featured build
(installed with <code>LIBARROW_MINIMAL=false</code>) includes both jemalloc and
mimalloc, and it still has jemalloc as the default, though this is configurable
at runtime with the <code>ARROW_DEFAULT_MEMORY_POOL</code> environment
variable.</li>
@@ -491,9 +583,9 @@
<h3 id="dplyr-methods-4-0-0">dplyr methods<a class="anchor"
aria-label="anchor" href="#dplyr-methods-4-0-0"></a></h3>
<p>Many more <code>dplyr</code> verbs are supported on Arrow objects:</p>
<ul><li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for
many applications. For queries on <code>Table</code> and
<code>RecordBatch</code> that are not yet supported in Arrow, the
implementation falls back to pulling data into an in-memory R
<code>data.frame</code> first, as in the previous release. For queries on
<code>Dataset</code> (which can be larger than memory), it raises an error if
the functi [...]
+<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">dplyr::mutate()</a></code> is now supported in Arrow for
many applications. For queries on <code>Table</code> and
<code>RecordBatch</code> that are not yet supported in Arrow, the
implementation falls back to pulling data into an in-memory R
<code>data.frame</code> first, as in the previous release. For queries on
<code>Dataset</code> (which can be larger than memory), it raises an error if
the functi [...]
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">dplyr::transmute()</a></code> (which calls <code><a
href="https://dplyr.tidyverse.org/reference/mutate.html"
class="external-link">mutate()</a></code>)</li>
+<code><a href="https://dplyr.tidyverse.org/reference/transmute.html"
class="external-link">dplyr::transmute()</a></code> (which calls
<code>mutate()</code>)</li>
<li>
<code><a href="https://dplyr.tidyverse.org/reference/group_by.html"
class="external-link">dplyr::group_by()</a></code> now preserves the
<code>.drop</code> argument and supports on-the-fly definition of columns</li>
<li>
@@ -503,8 +595,8 @@
<li>
<code><a href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">dplyr::compute()</a></code> to evaluate the lazy
expressions and return an Arrow Table. This is equivalent to
<code>dplyr::collect(as_data_frame = FALSE)</code>, which was added in
2.0.0.</li>
</ul><p>Over 100 functions can now be called on Arrow objects inside a
<code>dplyr</code> verb:</p>
-<ul><li>String functions <code><a href="https://rdrr.io/r/base/nchar.html"
class="external-link">nchar()</a></code>, <code><a
href="https://rdrr.io/r/base/chartr.html"
class="external-link">tolower()</a></code>, and <code><a
href="https://rdrr.io/r/base/chartr.html"
class="external-link">toupper()</a></code>, along with their
<code>stringr</code> spellings <code><a
href="https://stringr.tidyverse.org/reference/str_length.html"
class="external-link">str_length()</a></code>, <code><a href= [...]
-<li>Regular expression functions <code><a
href="https://rdrr.io/r/base/grep.html" class="external-link">sub()</a></code>,
<code><a href="https://rdrr.io/r/base/grep.html"
class="external-link">gsub()</a></code>, and <code><a
href="https://rdrr.io/r/base/grep.html"
class="external-link">grepl()</a></code>, along with <code><a
href="https://stringr.tidyverse.org/reference/str_replace.html"
class="external-link">str_replace()</a></code>, <code><a
href="https://stringr.tidyverse.org/referenc [...]
+<ul><li>String functions <code><a href="https://rdrr.io/r/base/nchar.html"
class="external-link">nchar()</a></code>, <code><a
href="https://rdrr.io/r/base/chartr.html"
class="external-link">tolower()</a></code>, and <code><a
href="https://rdrr.io/r/base/chartr.html"
class="external-link">toupper()</a></code>, along with their
<code>stringr</code> spellings <code>str_length()</code>,
<code>str_to_lower()</code>, and <code>str_to_upper()</code>, are supported in
Arrow <code>dplyr</code> ca [...]
+<li>Regular expression functions <code><a
href="https://rdrr.io/r/base/grep.html" class="external-link">sub()</a></code>,
<code><a href="https://rdrr.io/r/base/grep.html"
class="external-link">gsub()</a></code>, and <code><a
href="https://rdrr.io/r/base/grep.html"
class="external-link">grepl()</a></code>, along with
<code>str_replace()</code>, <code>str_replace_all()</code>, and
<code>str_detect()</code>, are supported.</li>
<li>
<code>cast(x, type)</code> and <code>dictionary_encode()</code> allow changing
the type of columns in Arrow objects; <code><a
href="https://rdrr.io/r/base/numeric.html"
class="external-link">as.numeric()</a></code>, <code><a
href="https://rdrr.io/r/base/character.html"
class="external-link">as.character()</a></code>, etc. are exposed as similar
type-altering conveniences</li>
<li>
@@ -534,7 +626,7 @@
</li>
<li>Similarly, <code>Schema</code> can now be edited by assigning in new
types. This enables using the CSV reader to detect the schema of a file, modify
the <code>Schema</code> object for any columns that you want to read in as a
different type, and then use that <code>Schema</code> to read the data.</li>
<li>Better validation when creating a <code>Table</code> with a schema, with
columns of different lengths, and with scalar value recycling</li>
-<li>Reading Parquet files in Japanese or other multi-byte locales on Windows
no longer hangs (workaround for a <a
href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723"
class="external-link">bug in libstdc++</a>; thanks @yutannihilation for the
persistence in discovering this!)</li>
+<li>Reading Parquet files in Japanese or other multi-byte locales on Windows
no longer hangs (workaround for a <a
href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723"
class="external-link">bug in libstdc++</a>; thanks <a
href="https://github.com/yutannihilation"
class="external-link">@yutannihilation</a> for the persistence in discovering
this!)</li>
<li>If you attempt to read string data that has embedded nul (<code>\0</code>)
characters, the error message now informs you that you can set
<code>options(arrow.skip_nul = TRUE)</code> to strip them out. It is not
recommended to set this option by default since this code path is significantly
slower, and most string data does not contain nuls.</li>
<li>
<code><a href="../reference/read_json_arrow.html">read_json_arrow()</a></code>
now accepts a schema: <code>read_json_arrow("file.json", schema = schema(col_a
= float64(), col_b = string()))</code>
@@ -545,7 +637,7 @@
<ul><li>The R package can now support working with an Arrow C++ library that
has additional features (such as dataset, parquet, string libraries) disabled,
and the bundled build script enables setting environment variables to disable
them. See <code><a href="../articles/install.html">vignette("install", package
= "arrow")</a></code> for details. This allows a faster, smaller package build
in cases where that is useful, and it enables a minimal, functioning R package
build on Solaris.</li>
<li>On macOS, it is now possible to use the same bundled C++ build that is
used by default on Linux, along with all of its customization parameters, by
setting the environment variable <code>FORCE_BUNDLED_BUILD=true</code>.</li>
<li>
-<code>arrow</code> now uses the <code>mimalloc</code> memory allocator by
default on macOS, if available (as it is in CRAN binaries), instead of
<code>jemalloc</code>. There are <a
href="https://issues.apache.org/jira/browse/<a%20href='https://issues.apache.org/jira/browse/ARROW-6994'>ARROW-6994</a>"
class="external-link">configuration issues</a> with <code>jemalloc</code> on
macOS, and <a href="https://ursalabs.org/blog/2021-r-benchmarks-part-1/"
class="external-link">benchm [...]
+<code>arrow</code> now uses the <code>mimalloc</code> memory allocator by
default on macOS, if available (as it is in CRAN binaries), instead of
<code>jemalloc</code>. There are <a
href="https://github.com/apache/arrow/issues/23308"
class="external-link">configuration issues</a> with <code>jemalloc</code> on
macOS, and <a href="https://ursalabs.org/blog/2021-r-benchmarks-part-1/"
class="external-link">benchmark analysis</a> shows that this has negative
effects on performance, especially [...]
<li>Setting the <code>ARROW_DEFAULT_MEMORY_POOL</code> environment variable to
switch memory allocators now works correctly when the Arrow C++ library has
been statically linked (as is usually the case when installing from CRAN).</li>
<li>The <code><a href="../reference/arrow_info.html">arrow_info()</a></code>
function now reports on the additional optional features, as well as the
detected SIMD level. If key features or compression libraries are not enabled
in the build, <code><a
href="../reference/arrow_info.html">arrow_info()</a></code> will refer to the
installation vignette for guidance on how to install a more complete build, if
desired.</li>
<li>If you attempt to read a file that was compressed with a codec that your
Arrow build does not contain support for, the error message now will tell you
how to reinstall Arrow with that feature enabled.</li>
@@ -579,7 +671,7 @@
<li>
<code><a href="../reference/arrow_info.html">arrow_info()</a></code> for an
overview of various run-time and build-time Arrow configurations, useful for
debugging</li>
<li>Set environment variable <code>ARROW_DEFAULT_MEMORY_POOL</code> before
loading the Arrow package to change memory allocators. Windows packages are
built with <code>mimalloc</code>; most others are built with both
<code>jemalloc</code> (used by default) and <code>mimalloc</code>. These
alternative memory allocators are generally much faster than the system memory
allocator, so they are used by default when available, but sometimes it is
useful to turn them off for debugging purposes. [...]
-<li>List columns that have attributes on each element are now also included
with the metadata that is saved when creating Arrow tables. This allows
<code>sf</code> tibbles to faithfully preserved and roundtripped (<a
href="https://issues.apache.org/jira/browse/ARROW-10386"
class="external-link">ARROW-10386</a>).</li>
+<li>List columns that have attributes on each element are now also included
with the metadata that is saved when creating Arrow tables. This allows
<code>sf</code> tibbles to be faithfully preserved and roundtripped (<a
href="https://github.com/apache/arrow/issues/8549"
class="external-link">#8549</a>).</li>
<li>R metadata that exceeds 100Kb is now compressed before being written to a
table; see <code><a href="../reference/Schema.html">schema()</a></code> for
more details.</li>
</ul></div>
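The allocator switch described above can be exercised as follows (a minimal sketch; the environment variable must be set before the package is loaded, and `default_memory_pool()` is the documented accessor for checking which allocator is active):

```r
# Choose the allocator before arrow is loaded; "jemalloc", "mimalloc",
# and "system" are recognized values.
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "mimalloc")
library(arrow)

# Report which allocator backend is in use.
default_memory_pool()$backend_name
```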
<div class="section level3">
@@ -590,8 +682,8 @@
<code><a href="../reference/write_parquet.html">write_parquet()</a></code> can
now write RecordBatches</li>
<li>Reading a Table from a RecordBatchStreamReader containing 0 batches no
longer crashes</li>
<li>
-<code>readr</code>’s <code>problems</code> attribute is removed when
converting to Arrow RecordBatch and table to prevent large amounts of metadata
from accumulating inadvertently (<a
href="https://issues.apache.org/jira/browse/ARROW-10624"
class="external-link">ARROW-10624</a>)</li>
-<li>Fixed reading of compressed Feather files written with Arrow 0.17 (<a
href="https://issues.apache.org/jira/browse/ARROW-10850"
class="external-link">ARROW-10850</a>)</li>
+<code>readr</code>’s <code>problems</code> attribute is removed when
converting to Arrow RecordBatch and table to prevent large amounts of metadata
from accumulating inadvertently (<a
href="https://github.com/apache/arrow/issues/9092"
class="external-link">#9092</a>)</li>
+<li>Fixed reading of compressed Feather files written with Arrow 0.17 (<a
href="https://github.com/apache/arrow/issues/9128"
class="external-link">#9128</a>)</li>
<li>
<code>SubTreeFileSystem</code> gains a useful print method and no longer
errors when printing</li>
</ul></div>
@@ -613,7 +705,7 @@
<code><a href="../reference/write_dataset.html">write_dataset()</a></code> to
Feather or Parquet files with partitioning. See the end of <code><a
href="../articles/dataset.html">vignette("dataset", package =
"arrow")</a></code> for discussion and examples.</li>
<li>Datasets now have <code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code>, <code><a
href="https://rdrr.io/r/utils/head.html"
class="external-link">tail()</a></code>, and take (<code>[</code>) methods.
<code><a href="https://rdrr.io/r/utils/head.html"
class="external-link">head()</a></code> is optimized but the others may not be
performant.</li>
<li>
-<code><a href="https://dplyr.tidyverse.org/reference/compute.html"
class="external-link">collect()</a></code> gains an <code>as_data_frame</code>
argument, default <code>TRUE</code> but when <code>FALSE</code> allows you to
evaluate the accumulated <code>select</code> and <code>filter</code> query but
keep the result in Arrow, not an R <code>data.frame</code>
+<code>collect()</code> gains an <code>as_data_frame</code> argument, default
<code>TRUE</code> but when <code>FALSE</code> allows you to evaluate the
accumulated <code>select</code> and <code>filter</code> query but keep the
result in Arrow, not an R <code>data.frame</code>
</li>
<li>
<code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
supports specifying column types, both with a <code>Schema</code> and with the
compact string representation for types used in the <code>readr</code> package.
It also has gained a <code>timestamp_parsers</code> argument that lets you
express a set of <code>strptime</code> parse strings that will be tried to
convert columns designated as <code>Timestamp</code> type.</li>
@@ -680,7 +772,7 @@
<code>character</code> vectors that exceed 2GB are converted to Arrow
<code>large_utf8</code> type</li>
<li>
<code>POSIXlt</code> objects can now be converted to Arrow
(<code>struct</code>)</li>
-<li>R <code><a href="https://rdrr.io/r/base/attributes.html"
class="external-link">attributes()</a></code> are preserved in Arrow metadata
when converting to Arrow RecordBatch and table and are restored when converting
from Arrow. This means that custom subclasses, such as
<code>haven::labelled</code>, are preserved in round trip through Arrow.</li>
+<li>R <code><a href="https://rdrr.io/r/base/attributes.html"
class="external-link">attributes()</a></code> are preserved in Arrow metadata
when converting to Arrow RecordBatch and table and are restored when converting
from Arrow. This means that custom subclasses, such as <code><a
href="https://haven.tidyverse.org/reference/labelled.html"
class="external-link">haven::labelled</a></code>, are preserved in round trip
through Arrow.</li>
<li>Schema metadata is now exposed as a named list, and it can be modified by
assignment like <code>batch$metadata$new_key <- "new value"</code>
</li>
<li>Arrow types <code>int64</code>, <code>uint32</code>, and
<code>uint64</code> now are converted to R <code>integer</code> if all values
fit in bounds</li>
@@ -746,7 +838,7 @@
<div class="section level3">
<h3 id="datasets-0-17-0">Datasets<a class="anchor" aria-label="anchor"
href="#datasets-0-17-0"></a></h3>
<ul><li>Dataset reading benefits from many speedups and fixes in the C++
library</li>
-<li>Datasets have a <code><a href="https://rdrr.io/r/base/dim.html"
class="external-link">dim()</a></code> method, which sums rows across all files
(<a href="https://issues.apache.org/jira/browse/ARROW-8118"
class="external-link">ARROW-8118</a>, @boshek)</li>
+<li>Datasets have a <code><a href="https://rdrr.io/r/base/dim.html"
class="external-link">dim()</a></code> method, which sums rows across all files
(<a href="https://github.com/apache/arrow/issues/6635"
class="external-link">#6635</a>, <a href="https://github.com/boshek"
class="external-link">@boshek</a>)</li>
<li>Combine multiple datasets into a single queryable
<code>UnionDataset</code> with the <code><a
href="https://rdrr.io/r/base/c.html" class="external-link">c()</a></code>
method</li>
<li>Dataset filtering now treats <code>NA</code> as <code>FALSE</code>,
consistent with <code><a
href="https://dplyr.tidyverse.org/reference/filter.html"
class="external-link">dplyr::filter()</a></code>
</li>
@@ -777,14 +869,14 @@
<code><a href="../reference/install_arrow.html">install_arrow()</a></code> now
installs the latest release of <code>arrow</code>, including Linux
dependencies, either for CRAN releases or for development builds (if
<code>nightly = TRUE</code>)</li>
<li>Package installation on Linux no longer downloads C++ dependencies unless
the <code>LIBARROW_DOWNLOAD</code> or <code>NOT_CRAN</code> environment
variable is set</li>
<li>
-<code><a href="../reference/write_feather.html">write_feather()</a></code>,
<code>write_arrow()</code> and <code><a
href="../reference/write_parquet.html">write_parquet()</a></code> now return
their input, similar to the <code>write_*</code> functions in the
<code>readr</code> package (<a
href="https://issues.apache.org/jira/browse/ARROW-7796"
class="external-link">ARROW-7796</a>, @boshek)</li>
-<li>Can now infer the type of an R <code>list</code> and create a ListArray
when all list elements are the same type (<a
href="https://issues.apache.org/jira/browse/ARROW-7662"
class="external-link">ARROW-7662</a>, @michaelchirico)</li>
+<code><a href="../reference/write_feather.html">write_feather()</a></code>,
<code>write_arrow()</code> and <code><a
href="../reference/write_parquet.html">write_parquet()</a></code> now return
their input, similar to the <code>write_*</code> functions in the
<code>readr</code> package (<a
href="https://github.com/apache/arrow/issues/6387"
class="external-link">#6387</a>, <a href="https://github.com/boshek"
class="external-link">@boshek</a>)</li>
+<li>Can now infer the type of an R <code>list</code> and create a ListArray
when all list elements are the same type (<a
href="https://github.com/apache/arrow/issues/6275"
class="external-link">#6275</a>, <a href="https://github.com/michaelchirico"
class="external-link">@michaelchirico</a>)</li>
</ul></div>
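Because the writers now return their input, they compose in a pipeline, as in `readr` (a sketch; the temp-file path is illustrative):

```r
library(arrow)
library(dplyr)

tf <- tempfile(fileext = ".parquet")

# write_parquet() returns its input, so the pipeline can continue
# operating on the same data after the file is written.
mtcars %>%
  write_parquet(tf) %>%
  summarise(mean_mpg = mean(mpg))
```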
<div class="section level2">
<h2 class="pkg-version" data-toc-text="0.16.0" id="arrow-0160">arrow 0.16.0<a
class="anchor" aria-label="anchor" href="#arrow-0160"></a></h2><p
class="text-muted">CRAN release: 2020-02-09</p>
<div class="section level3">
<h3 id="multi-file-datasets-0-16-0">Multi-file datasets<a class="anchor"
aria-label="anchor" href="#multi-file-datasets-0-16-0"></a></h3>
-<p>This release includes a <code>dplyr</code> interface to Arrow Datasets,
which let you work efficiently with large, multi-file datasets as a single
entity. Explore a directory of data files with <code><a
href="../reference/open_dataset.html">open_dataset()</a></code> and then use
<code>dplyr</code> methods to <code><a
href="https://dplyr.tidyverse.org/reference/select.html"
class="external-link">select()</a></code>, <code><a
href="https://dplyr.tidyverse.org/reference/filter.html" clas [...]
+<p>This release includes a <code>dplyr</code> interface to Arrow Datasets,
which let you work efficiently with large, multi-file datasets as a single
entity. Explore a directory of data files with <code><a
href="../reference/open_dataset.html">open_dataset()</a></code> and then use
<code>dplyr</code> methods to <code>select()</code>, <code><a
href="https://rdrr.io/r/stats/filter.html"
class="external-link">filter()</a></code>, etc. Work will be done where
possible in Arrow memory. When n [...]
<p>See <code><a href="../articles/dataset.html">vignette("dataset", package =
"arrow")</a></code> for details.</p>
</div>
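A minimal sketch of the workflow this section describes (the directory path is hypothetical; `select()`, `filter()`, and `collect()` follow the `dplyr` interface introduced here):

```r
library(arrow)
library(dplyr)

# Treat a directory of data files as a single dataset (path is hypothetical).
ds <- open_dataset("path/to/dataset")

# select()/filter() build up a query; work happens in Arrow memory where
# possible, and collect() pulls the result into an R data.frame.
ds %>%
  select(x, y) %>%
  filter(y > 0) %>%
  collect()
```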
<div class="section level3">
@@ -805,34 +897,35 @@
<code><a href="../reference/write_parquet.html">write_parquet()</a></code> now
supports compression</li>
<li>
<code><a
href="../reference/codec_is_available.html">codec_is_available()</a></code>
returns <code>TRUE</code> or <code>FALSE</code> whether the Arrow C++ library
was built with support for a given compression library (e.g. gzip, lz4,
snappy)</li>
-<li>Windows builds now include support for zstd and lz4 compression (<a
href="https://issues.apache.org/jira/browse/ARROW-6960"
class="external-link">ARROW-6960</a>, @gnguy)</li>
+<li>Windows builds now include support for zstd and lz4 compression (<a
href="https://github.com/apache/arrow/issues/5814"
class="external-link">#5814</a>, <a href="https://github.com/gnguy"
class="external-link">@gnguy</a>)</li>
</ul></div>
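The compression support above can be checked and used like so (a sketch; `codec_is_available()` and the `compression` argument are as documented in this release, and the file name is illustrative):

```r
library(arrow)

# Returns TRUE or FALSE depending on how the bundled C++ library was built.
codec_is_available("zstd")

# Write a Parquet file with an explicit compression codec.
write_parquet(mtcars, "mtcars.parquet", compression = "snappy")
```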
<div class="section level3">
<h3 id="other-fixes-and-improvements-0-16-0">Other fixes and improvements<a
class="anchor" aria-label="anchor"
href="#other-fixes-and-improvements-0-16-0"></a></h3>
<ul><li>Arrow null type is now supported</li>
-<li>Factor types are now preserved in round trip through Parquet format (<a
href="https://issues.apache.org/jira/browse/ARROW-7045"
class="external-link">ARROW-7045</a>, @yutannihilation)</li>
+<li>Factor types are now preserved in round trip through Parquet format (<a
href="https://github.com/apache/arrow/issues/6135"
class="external-link">#6135</a>, <a href="https://github.com/yutannihilation"
class="external-link">@yutannihilation</a>)</li>
<li>Reading an Arrow dictionary type coerces dictionary values to
<code>character</code> (as R <code>factor</code> levels are required to be)
instead of raising an error</li>
-<li>Many improvements to Parquet function documentation (@karldw,
@khughitt)</li>
+<li>Many improvements to Parquet function documentation (<a
href="https://github.com/karldw" class="external-link">@karldw</a>, <a
href="https://github.com/khughitt" class="external-link">@khughitt</a>)</li>
</ul></div>
</div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="0.15.1" id="arrow-0151">arrow 0.15.1<a
class="anchor" aria-label="anchor" href="#arrow-0151"></a></h2><p
class="text-muted">CRAN release: 2019-11-04</p>
-<ul><li>This patch release includes bugfixes in the C++ library around
dictionary types and Parquet reading.</li></ul></div>
+<ul><li>This patch release includes bugfixes in the C++ library around
dictionary types and Parquet reading.</li>
+</ul></div>
<div class="section level2">
<h2 class="pkg-version" data-toc-text="0.15.0" id="arrow-0150">arrow 0.15.0<a
class="anchor" aria-label="anchor" href="#arrow-0150"></a></h2><p
class="text-muted">CRAN release: 2019-10-07</p>
<div class="section level3">
<h3 id="breaking-changes-0-15-0">Breaking changes<a class="anchor"
aria-label="anchor" href="#breaking-changes-0-15-0"></a></h3>
<ul><li>The R6 classes that wrap the C++ classes are now documented and
exported and have been renamed to be more R-friendly. Users of the high-level R
interface in this package are not affected. Those who want to interact with the
Arrow C++ API more directly should work with these objects and methods. As part
of this change, many functions that instantiated these R6 objects have been
removed in favor of <code>Class$create()</code> methods. Notably, <code><a
href="../reference/array.html [...]
<li>Due to a subtle change in the Arrow message format, data written by the
0.15 version libraries may not be readable by older versions. If you need to
send data to a process that uses an older version of Arrow (for example, an
Apache Spark server that hasn’t yet updated to Arrow 0.15), you can set the
environment variable <code>ARROW_PRE_0_15_IPC_FORMAT=1</code>.</li>
-<li>The <code>as_tibble</code> argument in the <code>read_*()</code> functions
has been renamed to <code>as_data_frame</code> (<a
href="https://issues.apache.org/jira/browse/ARROW-6337"
class="external-link">ARROW-6337</a>, @jameslamb)</li>
+<li>The <code>as_tibble</code> argument in the <code>read_*()</code> functions
has been renamed to <code>as_data_frame</code> (<a
href="https://github.com/apache/arrow/issues/5399"
class="external-link">#5399</a>, <a href="https://github.com/jameslamb"
class="external-link">@jameslamb</a>)</li>
<li>The <code>arrow::Column</code> class has been removed, as it was removed
from the C++ library</li>
</ul></div>
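The renamed argument works as sketched below (the file name is hypothetical; with `as_data_frame = FALSE` the reader returns an Arrow `Table` rather than a `data.frame`):

```r
library(arrow)

# Formerly read_parquet("file.parquet", as_tibble = FALSE);
# the argument is now called as_data_frame.
tab <- read_parquet("file.parquet", as_data_frame = FALSE)
```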
<div class="section level3">
<h3 id="new-features-0-15-0">New features<a class="anchor" aria-label="anchor"
href="#new-features-0-15-0"></a></h3>
<ul><li>
<code>Table</code> and <code>RecordBatch</code> objects have S3 methods that
enable you to work with them more like <code>data.frame</code>s. Extract
columns, subset, and so on. See <code><a
href="../reference/Table.html">?Table</a></code> and <code><a
href="../reference/RecordBatch.html">?RecordBatch</a></code> for examples.</li>
-<li>Initial implementation of bindings for the C++ File System API. (<a
href="https://issues.apache.org/jira/browse/ARROW-6348"
class="external-link">ARROW-6348</a>)</li>
-<li>Compressed streams are now supported on Windows (<a
href="https://issues.apache.org/jira/browse/ARROW-6360"
class="external-link">ARROW-6360</a>), and you can also specify a compression
level (<a href="https://issues.apache.org/jira/browse/ARROW-6533"
class="external-link">ARROW-6533</a>)</li>
+<li>Initial implementation of bindings for the C++ File System API. (<a
href="https://github.com/apache/arrow/issues/5223"
class="external-link">#5223</a>)</li>
+<li>Compressed streams are now supported on Windows (<a
href="https://github.com/apache/arrow/issues/5329"
class="external-link">#5329</a>), and you can also specify a compression level
(<a href="https://github.com/apache/arrow/issues/5450"
class="external-link">#5450</a>)</li>
</ul></div>
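The `data.frame`-like S3 methods mentioned above look roughly like this (a sketch using the `Class$create()` pattern introduced in this release):

```r
library(arrow)

tab <- Table$create(mtcars)

# Extract a column, subset rows and columns, and convert back to a data.frame.
tab$mpg
tab[1:5, c("mpg", "cyl")]
as.data.frame(tab)
```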
<div class="section level3">
<h3 id="other-upgrades-0-15-0">Other upgrades<a class="anchor"
aria-label="anchor" href="#other-upgrades-0-15-0"></a></h3>
@@ -841,9 +934,9 @@
<code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>
supports more parsing options, including <code>col_names</code>,
<code>na</code>, <code>quoted_na</code>, and <code>skip</code>
</li>
<li>
-<code><a href="../reference/read_parquet.html">read_parquet()</a></code> and
<code><a href="../reference/read_feather.html">read_feather()</a></code> can
ingest data from a <code>raw</code> vector (<a
href="https://issues.apache.org/jira/browse/ARROW-6278"
class="external-link">ARROW-6278</a>)</li>
-<li>File readers now properly handle paths that need expanding, such as
<code>~/file.parquet</code> (<a
href="https://issues.apache.org/jira/browse/ARROW-6323"
class="external-link">ARROW-6323</a>)</li>
-<li>Improved support for creating types in a schema: the types’ printed names
(e.g. “double”) are guaranteed to be valid to use in instantiating a schema
(e.g. <code><a href="https://rdrr.io/r/base/double.html"
class="external-link">double()</a></code>), and time types can be created with
human-friendly resolution strings (“ms”, “s”, etc.). (<a
href="https://issues.apache.org/jira/browse/ARROW-6338"
class="external-link">ARROW-6338</a>, <a
href="https://issues.apache.org/jira/browse/ARRO [...]
+<code><a href="../reference/read_parquet.html">read_parquet()</a></code> and
<code><a href="../reference/read_feather.html">read_feather()</a></code> can
ingest data from a <code>raw</code> vector (<a
href="https://github.com/apache/arrow/issues/5141"
class="external-link">#5141</a>)</li>
+<li>File readers now properly handle paths that need expanding, such as
<code>~/file.parquet</code> (<a
href="https://github.com/apache/arrow/issues/5169"
class="external-link">#5169</a>)</li>
+<li>Improved support for creating types in a schema: the types’ printed names
(e.g. “double”) are guaranteed to be valid to use in instantiating a schema
(e.g. <code><a href="https://rdrr.io/r/base/double.html"
class="external-link">double()</a></code>), and time types can be created with
human-friendly resolution strings (“ms”, “s”, etc.). (<a
href="https://github.com/apache/arrow/issues/5198"
class="external-link">#5198</a>, <a
href="https://github.com/apache/arrow/issues/5201" class=" [...]
</ul></div>
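A short illustration of the schema conveniences listed above (the column names are made up; the type constructors and resolution strings are as described in this release):

```r
library(arrow)

# Printed type names such as "double" are valid constructors, and time
# types accept human-friendly resolution strings like "ms" or "s".
sch <- schema(
  value = double(),
  when  = timestamp("ms")
)
sch
```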
</div>
<div class="section level2">