This is an automated email from the ASF dual-hosted git repository.
kszucs pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 39adf19 ARROW-15327: [R] Update news for 7.0.0
39adf19 is described below
commit 39adf19f31a529eaec35704685532feee1d8c7a4
Author: Jonathan Keane <[email protected]>
AuthorDate: Wed Jan 19 10:31:27 2022 +0100
ARROW-15327: [R] Update news for 7.0.0
Closes #12159 from jonkeane/7.0.0-news
Lead-authored-by: Jonathan Keane <[email protected]>
Co-authored-by: Neal Richardson <[email protected]>
Signed-off-by: Krisztián Szűcs <[email protected]>
---
r/NEWS.md | 54 +++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 45 insertions(+), 9 deletions(-)
diff --git a/r/NEWS.md b/r/NEWS.md
index 89e990c..9d75196 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -19,15 +19,51 @@
# arrow 6.0.1.9000
-* Added `decimal256()`. Updated `decimal()`, which now calls `decimal256()` or `decimal128()` based on the value of the `precision` argument.
-* updated `write_csv_arrow()` to follow the signature of `readr::write_csv()`. The following arguments are supported:
-  * `file` identical to `sink`
-  * `col_names` identical to `include_header`
-  * other arguments are currently unsupported, but the function errors with a meaningful message.
-* Added `decimal128()` (~~identical to `decimal()`~~) as the name is more explicit and updated docs to encourage its use.
-* Source builds now by default use `pkg-config` to search for system dependencies (such as `libz`) and link to them
-if present. To retain the previous behaviour of downloading and building all dependencies, set `ARROW_DEPENDENCY_SOURCE=BUNDLED`.
-* Opening datasets now use async scanner by default which resolves a deadlock issues related to reading in large multi-CSV datasets
+## Enhancements to dplyr and datasets
+
+* Additional `lubridate` features: `week()`, more of the `is.*()` functions, and the `label` argument to `month()` have been implemented.
+* More complex expressions inside `summarize()`, such as `ifelse(n() > 1, mean(y), mean(z))`, are supported.
+* When adding columns in a dplyr pipeline, one can now use `tibble` and `data.frame` to create columns of tibbles or data.frames respectively (e.g. `... %>% mutate(df_col = tibble(a, b)) %>% ...`).
+* Dictionary columns (R `factor` type) are supported inside of `coalesce()`.
+* `open_dataset()` accepts the `partitioning` argument when reading Hive-style partitioned files, even though it is not required.
+* The experimental `map_batches()` function for custom operations on datasets has been restored.
+
+## CSV
+
+* Delimited files (including CSVs) with encodings other than UTF-8 can now be read (using the `encoding` argument when reading).
+* `open_dataset()` correctly ignores byte-order marks (`BOM`s) in CSVs, as already was true for reading single files.
+* Reading a dataset internally uses an asynchronous scanner by default, which resolves a potential deadlock when reading in large CSV datasets.
+* `head()` no longer hangs on large CSV datasets.
+* There is an improved error message when there is a conflict between a header in the file and schema/column names provided as arguments.
+* `write_csv_arrow()` now follows the signature of `readr::write_csv()`.
+
+## Other improvements and fixes
+
+* Many of the vignettes have been reorganized, restructured and expanded to improve their usefulness and clarity.
+* Code to generate schemas (and individual data type specifications) is accessible with the `$code()` method on a `schema` or `type`. This allows you to easily get the code needed to create a schema from an object that already has one.
+* Arrow's `Duration` type has been mapped to R's `difftime` class.
+* The `decimal256()` type is supported. The `decimal()` function has been revised to call either `decimal256()` or `decimal128()` based on the value of the `precision` argument.
+* `write_parquet()` uses a reasonable guess at `chunk_size` instead of always writing a single chunk. This improves the speed of reading and writing large Parquet files.
+* `write_parquet()` no longer drops attributes for grouped data.frames.
+* Chunked arrays are now supported using ALTREP.
+* ALTREP vectors backed by Arrow arrays are no longer unexpectedly mutated by sorting or negation.
+* S3 file systems can be created with `proxy_options`.
+* A segfault when creating S3 file systems has been fixed.
+* Integer division in Arrow more closely matches R's behavior.
+
+## Installation
+
+* Source builds now by default use `pkg-config` to search for system dependencies (such as `libz`) and link to them if present. This new default will make building Arrow from source quicker on systems that have these dependencies installed already. To retain the previous behavior of downloading and building all dependencies, set `ARROW_DEPENDENCY_SOURCE=BUNDLED`.
+* Snappy and lz4 compression libraries are enabled by default in Linux builds. This means that the default build of Arrow, without setting any environment variables, will be able to read and write snappy-encoded Parquet files.
+* Windows binary packages include brotli compression support.
+* Building Arrow on Windows can find a locally built libarrow library.
+* The package compiles and installs on Raspberry Pi OS.
+
+## Under-the-hood changes
+
+* The pointers used to pass data between R and Python have been made more reliable. Backwards compatibility with older versions of pyarrow has been maintained.
+* The internal method of registering new bindings for use in dplyr queries has changed. See the new vignette about writing bindings for more information about how that works.
+* R 3.3 is no longer supported. `glue`, which `arrow` depends on transitively, has dropped support for it.
# arrow 6.0.1