This is an automated email from the ASF dual-hosted git repository.
npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new a6cbffe ARROW-10257: [R] Prepare news/docs for 2.0 release
a6cbffe is described below
commit a6cbffef478ca81e853d11ea3989f9b870c18e99
Author: Neal Richardson <[email protected]>
AuthorDate: Fri Oct 9 19:29:30 2020 -0700
ARROW-10257: [R] Prepare news/docs for 2.0 release
Closes #8421 from nealrichardson/r-docs-2.0
Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
---
r/NEWS.md | 26 +++++++++++++++++++++++---
r/R/memory-pool.R | 5 ++++-
r/R/parquet.R | 6 ++++++
r/_pkgdown.yml | 14 +++++++++-----
r/man/MemoryPool.Rd | 7 +++++--
r/man/default_memory_pool.Rd | 1 +
r/man/write_parquet.Rd | 6 ++++++
7 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/r/NEWS.md b/r/NEWS.md
index 9e655ef..91dfe10 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -24,11 +24,21 @@
* `write_dataset()` to Feather or Parquet files with partitioning. See the end
of `vignette("dataset", package = "arrow")` for discussion and examples.
* Datasets now have `head()`, `tail()`, and take (`[`) methods. `head()` is
optimized but the others may not be performant.
* `collect()` gains an `as_data_frame` argument, default `TRUE` but when
`FALSE` allows you to evaluate the accumulated `select` and `filter` query but
keep the result in Arrow, not an R `data.frame`
+* `read_csv_arrow()` supports specifying column types, both with a `Schema`
+and with the compact string representation for types used in the `readr`
+package. It also gained a `timestamp_parsers` argument that lets you supply
+a set of `strptime` parse strings that will be tried when converting columns
+designated as `Timestamp` type.
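A minimal sketch of the new `read_csv_arrow()` options; the file name, column names, and types here are hypothetical, illustrative values:

```r
library(arrow)

# col_types accepts either a Schema or a readr-style compact string;
# timestamp_parsers supplies strptime formats to try for Timestamp columns.
# "events.csv" and its columns are hypothetical.
df <- read_csv_arrow(
  "events.csv",
  col_types = schema(ts = timestamp("s"), value = float64()),
  timestamp_parsers = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")
)
```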
## AWS S3 support
-* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R
->= 4.0) packages. To enable it on Linux, you need additional system
-dependencies `libcurl` and `openssl`, as well as a sufficiently modern
-compiler. See `vignette("install", package = "arrow")` for details.
-* File readers and writers (`read_parquet()`, `write_feather()`, et al.), as
-well as `open_dataset()` and `write_dataset()`, allow you to access resources
-on S3 (or on file systems that emulate S3) either by providing an `s3://` URI
-or by passing an additional `filesystem` argument. See `vignette("fs", package
-= "arrow")` for examples.
+* S3 support is now enabled in binary macOS and Windows (Rtools40 only, i.e. R
+>= 4.0) packages. To enable it on Linux, you need the additional system
+dependencies `libcurl` and `openssl`, as well as a sufficiently modern
+compiler. See `vignette("install", package = "arrow")` for details.
+* File readers and writers (`read_parquet()`, `write_feather()`, et al.), as
+well as `open_dataset()` and `write_dataset()`, allow you to access resources
+on S3 (or on file systems that emulate S3) either by providing an `s3://` URI
+or by providing a `FileSystem$path()`. See `vignette("fs", package = "arrow")`
+for examples.
+* `copy_files()` allows you to recursively copy directories of files from one
+file system to another, such as from S3 to your local machine.
+
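A sketch of the S3 access patterns described above; the bucket and object paths are hypothetical, and real use requires AWS credentials:

```r
library(arrow)

# Bucket and object paths are hypothetical
bucket <- s3_bucket("my-bucket")

df <- read_parquet("s3://my-bucket/data/file.parquet")  # via an s3:// URI
df2 <- read_parquet(bucket$path("data/file.parquet"))   # via a FileSystem path

# Recursively copy a directory of files from S3 to the local machine
copy_files(bucket, "local-copy")
```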
+## Flight RPC
+
+[Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
+is a general-purpose client-server framework for high performance
+transport of large datasets over network interfaces.
+The `arrow` R package now provides methods for connecting to Flight RPC servers
+to send and receive data. See `vignette("flight", package = "arrow")` for an
+overview.
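A sketch of the new Flight methods (`flight_connect()`, `push_data()`, `flight_get()`); the port and path are hypothetical, and a Flight RPC server must already be running:

```r
library(arrow)

# Assumes a Flight server is running locally on this (hypothetical) port
client <- flight_connect(port = 8089)
push_data(client, iris, path = "demo/iris")  # send a data.frame to the server
tab <- flight_get(client, "demo/iris")       # retrieve it as an Arrow Table
```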
## Computation
@@ -37,9 +47,19 @@
* `dplyr` filter expressions on Arrow Tables and RecordBatches are now
evaluated in the C++ library, rather than by pulling data into R and
evaluating. This yields significant performance improvements.
* `dim()` (`nrow`) for dplyr queries on Table/RecordBatch is now supported
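The `dplyr` and `dim()` behavior described above, along with the `collect(as_data_frame = FALSE)` option from the earlier section, can be sketched with toy data:

```r
library(arrow)
library(dplyr)

tab <- Table$create(x = 1:100, y = rnorm(100))
q <- tab %>% filter(x > 50)  # the filter is evaluated by the C++ library
dim(q)                       # nrow/ncol of the query, without pulling data into R
collect(q)                           # materialize as an R data.frame
collect(q, as_data_frame = FALSE)    # or keep the result as an Arrow Table
```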
-## Other improvements
+## Packaging and installation
* `arrow` now depends on [`cpp11`](https://cpp11.r-lib.org/), which brings
more robust UTF-8 handling and faster compilation
+* The Linux build script now succeeds on older versions of R
+* MacOS binary packages now ship with zstandard compression enabled
+
+## Bug fixes and other enhancements
+
+* Automatic conversion of Arrow `Int64` type when all values fit within an R
+32-bit integer now correctly inspects all chunks in a ChunkedArray, and this
+conversion can be disabled (so that `Int64` always yields a `bit64::integer64`
+vector) by setting `options(arrow.int64_downcast = FALSE)`.
+* In addition to the data.frame column metadata preserved in round trip, added
+in 1.0.0, attributes of the data.frame itself are now also preserved in Arrow
+schema metadata.
+* File writers now respect the system umask setting
+* `ParquetFileReader` has additional methods for accessing individual columns
+or row groups from the file
+* Various segfaults fixed: invalid input in `ParquetFileWriter`; invalid
+`ArrowObject` pointer from a saved R object; converting deeply nested structs
+from Arrow to R
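Two of the items above can be sketched in code. The parquet file name is hypothetical, and the `ReadColumn()`/`ReadRowGroup()` method names and their 0-based indexing are assumptions based on the underlying C++ API:

```r
library(arrow)

# Int64 downcast: values fitting in 32 bits convert to an R integer vector
# unless this option is set
a <- Array$create(bit64::as.integer64(1:3))
options(arrow.int64_downcast = FALSE)
as.vector(a)  # remains a bit64::integer64 vector

# Column/row-group access on a Parquet file ("data.parquet" is hypothetical)
reader <- ParquetFileReader$create("data.parquet")
col <- reader$ReadColumn(0)   # first column as a ChunkedArray
rg <- reader$ReadRowGroup(0)  # first row group as a Table
```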
# arrow 1.0.1
diff --git a/r/R/memory-pool.R b/r/R/memory-pool.R
index d830e3b..dfd3a48 100644
--- a/r/R/memory-pool.R
+++ b/r/R/memory-pool.R
@@ -25,10 +25,12 @@
#'
#' @section Methods:
#'
-#' TODO
+#' - `bytes_allocated()`
+#' - `max_memory()`
#'
#' @rdname MemoryPool
#' @name MemoryPool
+#' @keywords internal
MemoryPool <- R6Class("MemoryPool",
inherit = ArrowObject,
public = list(
@@ -44,6 +46,7 @@ MemoryPool <- R6Class("MemoryPool",
#'
#' @return the default [arrow::MemoryPool][MemoryPool]
#' @export
+#' @keywords internal
default_memory_pool <- function() {
shared_ptr(MemoryPool, MemoryPool__default())
}
diff --git a/r/R/parquet.R b/r/R/parquet.R
index acf7c2c..1a805c8 100644
--- a/r/R/parquet.R
+++ b/r/R/parquet.R
@@ -69,6 +69,12 @@ read_parquet <- function(file,
#' [Parquet](https://parquet.apache.org/) is a columnar storage file format.
#' This function enables you to write Parquet files from R.
#'
+#' Due to features of the format, Parquet files cannot be appended to.
+#' If you want to use the Parquet format but also want the ability to extend
+#' your dataset, you can write to additional Parquet files and then treat
+#' the whole directory of files as a [Dataset] you can query.
+#' See `vignette("dataset", package = "arrow")` for examples of this.
+#'
#' @param x `data.frame`, [RecordBatch], or [Table]
#' @param sink A string file path, URI, or [OutputStream], or path in a file
#' system (`SubTreeFileSystem`)
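The append workaround described in the new `write_parquet()` docs can be sketched as follows; the directory and file names are hypothetical:

```r
library(arrow)
library(dplyr)

# Parquet files cannot be appended to, so write new batches as additional
# files in one directory and query the whole directory as a Dataset
dir.create("cars", showWarnings = FALSE)
write_parquet(mtcars[1:16, ], "cars/part-0.parquet")
write_parquet(mtcars[17:32, ], "cars/part-1.parquet")

ds <- open_dataset("cars")
ds %>% filter(cyl == 6) %>% collect()
```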
diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml
index 4e74cab..d946f31 100644
--- a/r/_pkgdown.yml
+++ b/r/_pkgdown.yml
@@ -61,6 +61,7 @@ reference:
- title: Multi-file datasets
contents:
- open_dataset
+ - write_dataset
- dataset_factory
- hive_partition
- Dataset
@@ -68,6 +69,7 @@ reference:
- Expression
- Scanner
- FileFormat
+ - FileWriteOptions
- map_batches
- title: Reading and writing files
contents:
@@ -122,6 +124,13 @@ reference:
- flight_connect
- push_data
- flight_get
+- title: File systems
+ contents:
+ - s3_bucket
+ - FileSystem
+ - FileInfo
+ - FileSelector
+ - copy_files
- title: Input/Output
contents:
- InputStream
@@ -133,11 +142,6 @@ reference:
- compression
- Codec
- codec_is_available
- - MemoryPool
- - default_memory_pool
- - FileSystem
- - FileInfo
- - FileSelector
- title: Configuration
contents:
- cpu_count
diff --git a/r/man/MemoryPool.Rd b/r/man/MemoryPool.Rd
index 8bffc76..9b16c45 100644
--- a/r/man/MemoryPool.Rd
+++ b/r/man/MemoryPool.Rd
@@ -9,7 +9,10 @@ class arrow::MemoryPool
}
\section{Methods}{
-
-TODO
+\itemize{
+\item \code{bytes_allocated()}
+\item \code{max_memory()}
+}
}
+\keyword{internal}
diff --git a/r/man/default_memory_pool.Rd b/r/man/default_memory_pool.Rd
index 859b406..51dde97 100644
--- a/r/man/default_memory_pool.Rd
+++ b/r/man/default_memory_pool.Rd
@@ -12,3 +12,4 @@ the default \link[=MemoryPool]{arrow::MemoryPool}
\description{
default \link[=MemoryPool]{arrow::MemoryPool}
}
+\keyword{internal}
diff --git a/r/man/write_parquet.Rd b/r/man/write_parquet.Rd
index f0adf94..f639db9 100644
--- a/r/man/write_parquet.Rd
+++ b/r/man/write_parquet.Rd
@@ -58,6 +58,12 @@ the input \code{x} invisibly.
This function enables you to write Parquet files from R.
}
\details{
+Due to features of the format, Parquet files cannot be appended to.
+If you want to use the Parquet format but also want the ability to extend
+your dataset, you can write to additional Parquet files and then treat
+the whole directory of files as a \link{Dataset} you can query.
+See \code{vignette("dataset", package = "arrow")} for examples of this.
+
The parameters \code{compression}, \code{compression_level},
\code{use_dictionary} and
\code{write_statistics} support various patterns:
\itemize{