This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 24acebc  MINOR: [R][Doc] Update phrasing of docs for chunk_size argument to better reflect what it means
24acebc is described below

commit 24acebcb5325c2bd2210cbe80e179250b60df7b1
Author: Nic Crane <[email protected]>
AuthorDate: Mon Nov 15 08:05:43 2021 -0600

    MINOR: [R][Doc] Update phrasing of docs for chunk_size argument to better reflect what it means
    
    Closes #11681 from thisisnic/chunk_size
    
    Authored-by: Nic Crane <[email protected]>
    Signed-off-by: Jonathan Keane <[email protected]>
---
 r/R/parquet.R          | 6 +++++-
 r/man/write_parquet.Rd | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/r/R/parquet.R b/r/R/parquet.R
index 33cbb33..d262527 100644
--- a/r/R/parquet.R
+++ b/r/R/parquet.R
@@ -82,7 +82,11 @@ read_parquet <- function(file,
 #' @param x `data.frame`, [RecordBatch], or [Table]
 #' @param sink A string file path, URI, or [OutputStream], or path in a file
 #' system (`SubTreeFileSystem`)
-#' @param chunk_size chunk size in number of rows. If NULL, the total number of rows is used.
+#' @param chunk_size how many rows of data to write to disk at once. This
+#' directly corresponds to how many rows will be in each row group in parquet.
+#' If `NULL`, a best guess will be made for optimal size (based on the number of
+#'  columns and number of rows), though if the data has fewer than 250 million
+#'  cells (rows x cols), then the total number of rows is used.
 #' @param version parquet version, "1.0" or "2.0". Default "1.0". Numeric values
 #'   are coerced to character.
 #' @param compression compression algorithm. Default "snappy". See details.
diff --git a/r/man/write_parquet.Rd b/r/man/write_parquet.Rd
index d7147f7..efc6856 100644
--- a/r/man/write_parquet.Rd
+++ b/r/man/write_parquet.Rd
@@ -27,7 +27,11 @@ write_parquet(
 \item{sink}{A string file path, URI, or \link{OutputStream}, or path in a file
 system (\code{SubTreeFileSystem})}
 
-\item{chunk_size}{chunk size in number of rows. If NULL, the total number of rows is used.}
+\item{chunk_size}{how many rows of data to write to disk at once. This
+directly corresponds to how many rows will be in each row group in parquet.
+If \code{NULL}, a best guess will be made for optimal size (based on the number of
+columns and number of rows), though if the data has fewer than 250 million
+cells (rows x cols), then the total number of rows is used.}
 
 \item{version}{parquet version, "1.0" or "2.0". Default "1.0". Numeric values
 are coerced to character.}
