This is an automated email from the ASF dual-hosted git repository.
npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new a6cbffe ARROW-10257: [R] Prepare news/docs for 2.0 release
a6cbffe is described below
commit a6cbffef478ca81e853d11ea3989f9b870c18e99
Author: Neal Richardson <[email protected]>
AuthorDate: Fri Oct 9 19:29:30 2020 -0700
ARROW-10257: [R] Prepare news/docs for 2.0 release
Closes #8421 from nealrichardson/r-docs-2.0
Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
---
r/NEWS.md | 26 +++++++++++++++++++++++---
r/R/memory-pool.R | 5 ++++-
r/R/parquet.R | 6 ++++++
r/_pkgdown.yml | 14 +++++++++-----
r/man/MemoryPool.Rd | 7 +++++--
r/man/default_memory_pool.Rd | 1 +
r/man/write_parquet.Rd | 6 ++++++
7 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/r/NEWS.md b/r/NEWS.md
index 9e655ef..91dfe10 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -24,11 +24,21 @@
* `write_dataset()` to Feather or Parquet files with partitioning. See the end
of `vignette("dataset", package = "arrow")` for discussion and examples.
* Datasets now have `head()`, `tail()`, and take (`[`) methods. `head()` is
optimized but the others may not be performant.
* `collect()` gains an `as_data_frame` argument, default `TRUE` but when
`FALSE` allows you to evaluate the accumulated `select` and `filter` query but
keep the result in Arrow, not an R `data.frame`
+* `read_csv_arrow()` supports specifying column types, both with a `Schema`
+and with the compact string representation for types used in the `readr`
+package. It also gained a `timestamp_parsers` argument that lets you supply
+a set of `strptime` parse strings that will be tried when converting columns
+designated as `Timestamp` type.
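A minimal sketch of the new `read_csv_arrow()` options; the file name, column names, and types here are hypothetical, illustrative values:

```r
library(arrow)

# col_types accepts either a Schema or a readr-style compact string;
# timestamp_parsers supplies strptime formats to try for Timestamp columns.
# "events.csv" and its columns are hypothetical.
df <- read_csv_arrow(
  "events.csv",
  col_types = schema(ts = timestamp("s"), value = float64()),
  timestamp_parsers = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")
)
```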
## AWS S3 support
-* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R
->= 4.0) packages. To enable it on Linux, you need additional system
-dependencies `libcurl` and `openssl`, as well as a sufficiently modern
-compiler. See `vignette("install", package = "arrow")` for details.
-* File readers and writers (`read_parquet()`, `write_feather()`, et al.), as
-well as `open_dataset()` and `write_dataset()`, allow you to access resources
-on S3 (or on file systems that emulate S3) either by providing an `s3://` URI
-or by passing an additional `filesystem` argument. See `vignette("fs", package
-= "arrow")` for examples.
+* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R
+>= 4.0) packages. To enable it on Linux, you need the additional system
+dependencies `libcurl` and `openssl`, as well as a sufficiently modern
+compiler. See `vignette("install", package = "arrow")` for details.
+* File readers and writers (`read_parquet()`, `write_feather()`, et al.), as
+well as `open_dataset()` and `write_dataset()`, allow you to access resources
+on S3 (or on file systems that emulate S3) either by providing an `s3://` URI
+or by providing a `FileSystem$path()`. See `vignette("fs", package = "arrow")`
+for examples.
+* `copy_files()` allows you to recursively copy directories of files from one
+file system to another, such as from S3 to your local machine.
+
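A sketch of the S3 access patterns described above; the bucket and object paths are hypothetical, and real use requires AWS credentials:

```r
library(arrow)

# Bucket and object paths are hypothetical
bucket <- s3_bucket("my-bucket")

df <- read_parquet("s3://my-bucket/data/file.parquet")  # via an s3:// URI
df2 <- read_parquet(bucket$path("data/file.parquet"))   # via a FileSystem path

# Recursively copy a directory of files from S3 to the local machine
copy_files(bucket, "local-copy")
```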
+## Flight RPC
+
+[Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
+is a general-purpose client-server framework for high performance
+transport of large datasets over network interfaces.
+The `arrow` R package now provides methods for connecting to Flight RPC servers
+to send and receive data. See `vignette("flight", package = "arrow")` for an
+overview.
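A sketch of the new Flight methods (`flight_connect()`, `push_data()`, `flight_get()`); the port and path are hypothetical, and a Flight RPC server must already be running:

```r
library(arrow)

# Assumes a Flight server is running locally on this (hypothetical) port
client <- flight_connect(port = 8089)
push_data(client, iris, path = "demo/iris")  # send a data.frame to the server
tab <- flight_get(client, "demo/iris")       # retrieve it as an Arrow Table
```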
## Computation
@@ -37,9 +47,19 @@
* `dplyr` filter expressions on Arrow Tables and RecordBatches are now
evaluated in the C++ library, rather than by pulling data into R and
evaluating. This yields significant performance improvements.
* `dim()` (`nrow`) for dplyr queries on Table/RecordBatch is now supported
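The `dplyr` and `dim()` behavior described above, along with the `collect(as_data_frame = FALSE)` option from the earlier section, can be sketched with toy data:

```r
library(arrow)
library(dplyr)

tab <- Table$create(x = 1:100, y = rnorm(100))
q <- tab %>% filter(x > 50)  # the filter is evaluated by the C++ library
dim(q)                       # nrow/ncol of the query, without pulling data into R
collect(q)                           # materialize as an R data.frame
collect(q, as_data_frame = FALSE)    # or keep the result as an Arrow Table
```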
-## Other improvements
+## Packaging and installation
* `arrow` now depends on [`cpp11`](https://cpp11.r-lib.org/), which brings
more robust UTF-8 handling and faster compilation
+* The Linux build script now succeeds on older versions of R
+* MacOS binary packages now ship with zstandard compression enabled
+
+## Bug fixes and other enhancements
+
+* Automatic conversion of Arrow `Int64` type when all values fit within an R
+32-bit integer now correctly inspects all chunks in a ChunkedArray, and this
+conversion can be disabled (so that `Int64` always yields a `bit64::integer64`
+vector) by setting `options(arrow.int64_downcast = FALSE)`.
+* In addition to the data.frame column metadata preserved in round trip, added
+in 1.0.0, attributes of the data.frame itself are now also preserved in Arrow
+schema metadata.
+* File writers now respect the system umask setting
+* `ParquetFileReader` has additional methods for accessing individual columns
+or row groups from the file
+* Various segfaults fixed: invalid input in `ParquetFileWriter`; invalid
+`ArrowObject` pointer from a saved R object; converting deeply nested structs
+from Arrow to R
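Two of the items above can be sketched in code. The parquet file name is hypothetical, and the `ReadColumn()`/`ReadRowGroup()` method names and their 0-based indexing are assumptions based on the underlying C++ API:

```r
library(arrow)

# Int64 downcast: values fitting in 32 bits convert to an R integer vector
# unless this option is set
a <- Array$create(bit64::as.integer64(1:3))
options(arrow.int64_downcast = FALSE)
as.vector(a)  # remains a bit64::integer64 vector

# Column/row-group access on a Parquet file ("data.parquet" is hypothetical)
reader <- ParquetFileReader$create("data.parquet")
col <- reader$ReadColumn(0)   # first column as a ChunkedArray
rg <- reader$ReadRowGroup(0)  # first row group as a Table
```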
# arrow 1.0.1
diff --git a/r/R/memory-pool.R b/r/R/memory-pool.R
index d830e3b..dfd3a48 100644
--- a/r/R/memory-pool.R
+++ b/r/R/memory-pool.R
@@ -25,10 +25,12 @@
#'
#' @section Methods:
#'
-#' TODO
+#' - `bytes_allocated()`
+#' - `max_memory()`
#'
#' @rdname MemoryPool
#' @name MemoryPool
+#' @keywords internal
MemoryPool <- R6Class("MemoryPool",
inherit = ArrowObject,
public = list(
@@ -44,6 +46,7 @@ MemoryPool <- R6Class("MemoryPool",
#'
#' @return the default [arrow::MemoryPool][MemoryPool]
#' @export
+#' @keywords internal
default_memory_pool <- function() {
shared_ptr(MemoryPool, MemoryPool__default())
}
diff --git a/r/R/parquet.R b/r/R/parquet.R
index acf7c2c..1a805c8 100644
--- a/r/R/parquet.R
+++ b/r/R/parquet.R
@@ -69,6 +69,12 @@ read_parquet <- function(file,
#' [Parquet](https://parquet.apache.org/) is a columnar storage file format.
#' This function enables you to write Parquet files from R.
#'
+#' Due to features of the format, Parquet files cannot be appended to.
+#' If you want to use the Parquet format but also want the ability to extend
+#' your dataset, you can write to additional Parquet files and then treat
+#' the whole directory of files as a [Dataset] you can query.
+#' See `vignette("dataset", package = "arrow")` for examples of this.
+#'
#' @param x `data.frame`, [RecordBatch], or [Table]
#' @param sink A string file path, URI, or [OutputStream], or path in a file
#' system (`SubTreeFileSystem`)
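The append workaround described in the new `write_parquet()` docs can be sketched as follows; the directory and file names are hypothetical:

```r
library(arrow)
library(dplyr)

# Parquet files cannot be appended to, so write new batches as additional
# files in one directory and query the whole directory as a Dataset
dir.create("cars", showWarnings = FALSE)
write_parquet(mtcars[1:16, ], "cars/part-0.parquet")
write_parquet(mtcars[17:32, ], "cars/part-1.parquet")

ds <- open_dataset("cars")
ds %>% filter(cyl == 6) %>% collect()
```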
diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml
index 4e74cab..d946f31 100644
--- a/r/_pkgdown.yml
+++ b/r/_pkgdown.yml
@@ -61,6 +61,7 @@ reference:
- title: Multi-file datasets
contents:
- open_dataset
+ - write_dataset
- dataset_factory
- hive_partition
- Dataset
@@ -68,6 +69,7 @@ reference:
- Expression
- Scanner
- FileFormat
+ - FileWriteOptions
- map_batches
- title: Reading and writing files
contents:
@@ -122,6 +124,13 @@ reference:
- flight_connect
- push_data
- flight_get
+- title: File systems
+ contents:
+ - s3_bucket
+ - FileSystem
+ - FileInfo
+ - FileSelector
+ - copy_files
- title: Input/Output
contents:
- InputStream
@@ -133,11 +142,6 @@ reference:
- compression
- Codec
- codec_is_available
- - MemoryPool
- - default_memory_pool
- - FileSystem
- - FileInfo
- - FileSelector
- title: Configuration
contents:
- cpu_count
diff --git a/r/man/MemoryPool.Rd b/r/man/MemoryPool.Rd
index 8bffc76..9b16c45 100644
--- a/r/man/MemoryPool.Rd
+++ b/r/man/MemoryPool.Rd
@@ -9,7 +9,10 @@ class arrow::MemoryPool
}
\section{Methods}{
-
-TODO
+\itemize{
+\item \code{bytes_allocated()}
+\item \code{max_memory()}
+}
}
+\keyword{internal}
diff --git a/r/man/default_memory_pool.Rd b/r/man/default_memory_pool.Rd
index 859b406..51dde97 100644
--- a/r/man/default_memory_pool.Rd
+++ b/r/man/default_memory_pool.Rd
@@ -12,3 +12,4 @@ the default \link[=MemoryPool]{arrow::MemoryPool}
\description{
default \link[=MemoryPool]{arrow::MemoryPool}
}
+\keyword{internal}
diff --git a/r/man/write_parquet.Rd b/r/man/write_parquet.Rd
index f0adf94..f639db9 100644
--- a/r/man/write_parquet.Rd
+++ b/r/man/write_parquet.Rd
@@ -58,6 +58,12 @@ the input \code{x} invisibly.
This function enables you to write Parquet files from R.
}
\details{
+Due to features of the format, Parquet files cannot be appended to.
+If you want to use the Parquet format but also want the ability to extend
+your dataset, you can write to additional Parquet files and then treat
+the whole directory of files as a \link{Dataset} you can query.
+See \code{vignette("dataset", package = "arrow")} for examples of this.
+
The parameters \code{compression}, \code{compression_level},
\code{use_dictionary} and
\code{write_statistics} support various patterns:
\itemize{