This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 9fa8645  ARROW-9316: [C++] Use "Dataset" instead of "Datasets"
9fa8645 is described below

commit 9fa8645f57b8861c43e459f74dec82b3a33aaf9b
Author: Sutou Kouhei <[email protected]>
AuthorDate: Fri Jul 3 07:54:36 2020 -0500

    ARROW-9316: [C++] Use "Dataset" instead of "Datasets"
    
    Because we use "dataset" as the ID of this module, as in
    libarrow_dataset.so and arrow/dataset/api.h.
    
    Closes #7629 from kou/cpp-dataset
    
    Authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Wes McKinney <[email protected]>
---
 c_glib/arrow-dataset-glib/arrow-dataset-glib.pc.in     |  4 ++--
 c_glib/arrow-dataset-glib/meson.build                  |  4 ++--
 c_glib/configure.ac                                    |  2 +-
 c_glib/doc/arrow-dataset-glib/meson.build              |  2 +-
 cpp/src/arrow/dataset/README.md                        |  2 +-
 cpp/src/arrow/dataset/arrow-dataset.pc.in              |  4 ++--
 .../apache-arrow/debian.ubuntu-xenial/control          | 12 ++++++------
 .../libarrow-dataset-glib-doc.doc-base                 |  4 ++--
 dev/tasks/linux-packages/apache-arrow/debian/control   | 12 ++++++------
 .../debian/libarrow-dataset-glib-doc.doc-base          |  4 ++--
 .../linux-packages/apache-arrow/yum/arrow.spec.in      | 18 +++++++++---------
 docs/source/python/api/dataset.rst                     |  4 ++--
 r/NEWS.md                                              |  8 ++++----
 r/R/dataset.R                                          |  4 ++--
 r/R/dplyr.R                                            |  4 ++--
 r/man/Dataset.Rd                                       |  2 +-
 r/man/open_dataset.Rd                                  |  2 +-
 r/tests/testthat/test-dataset.R                        |  8 ++++----
 r/vignettes/dataset.Rmd                                | 12 ++++++------
 19 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/c_glib/arrow-dataset-glib/arrow-dataset-glib.pc.in b/c_glib/arrow-dataset-glib/arrow-dataset-glib.pc.in
index a6aa822..ee7e139 100644
--- a/c_glib/arrow-dataset-glib/arrow-dataset-glib.pc.in
+++ b/c_glib/arrow-dataset-glib/arrow-dataset-glib.pc.in
@@ -20,8 +20,8 @@ exec_prefix=@exec_prefix@
 libdir=@libdir@
 includedir=@includedir@
 
-Name: Apache Arrow Datasets GLib
-Description: C API for Apache Arrow Datasets based on GLib
+Name: Apache Arrow Dataset GLib
+Description: C API for Apache Arrow Dataset based on GLib
 Version: @VERSION@
 Libs: -L${libdir} -larrow-dataset-glib
 Cflags: -I${includedir}
diff --git a/c_glib/arrow-dataset-glib/meson.build b/c_glib/arrow-dataset-glib/meson.build
index ae99013..b381710 100644
--- a/c_glib/arrow-dataset-glib/meson.build
+++ b/c_glib/arrow-dataset-glib/meson.build
@@ -51,8 +51,8 @@ arrow_dataset_glib = declare_dependency(link_with: libarrow_dataset_glib,
 
 pkgconfig.generate(libarrow_dataset_glib,
                    filebase: 'arrow-dataset-glib',
-                   name: 'Apache Arrow Datasets GLib',
-                   description: 'C API for Apache Arrow Datasets based on GLib',
+                   name: 'Apache Arrow Dataset GLib',
+                   description: 'C API for Apache Arrow Dataset based on GLib',
                    version: version,
                    requires: ['arrow-glib', 'arrow-dataset'])
 
diff --git a/c_glib/configure.ac b/c_glib/configure.ac
index c1b8824..27aed3c 100644
--- a/c_glib/configure.ac
+++ b/c_glib/configure.ac
@@ -289,7 +289,7 @@ AC_SUBST(PLASMA_ARROW_CUDA_PKG_CONFIG_PATH)
 
 AM_CONDITIONAL([HAVE_ARROW_DATASET], [test "$HAVE_ARROW_DATASET" = "yes"])
 if test "$HAVE_ARROW_DATASET" = "yes"; then
-  AC_DEFINE(HAVE_ARROW_DATASET, [1], [Define to 1 if Apache Arrow Datasets exists.])
+  AC_DEFINE(HAVE_ARROW_DATASET, [1], [Define to 1 if Apache Arrow Dataset exists.])
 fi
 
 AM_CONDITIONAL([HAVE_GANDIVA], [test "$HAVE_GANDIVA" = "yes"])
diff --git a/c_glib/doc/arrow-dataset-glib/meson.build b/c_glib/doc/arrow-dataset-glib/meson.build
index 79b4113..1cb2f9e 100644
--- a/c_glib/doc/arrow-dataset-glib/meson.build
+++ b/c_glib/doc/arrow-dataset-glib/meson.build
@@ -18,7 +18,7 @@
 # under the License.
 
 package_id = 'arrow-dataset-glib'
-package_name = 'Apache Arrow Datasets GLib'
+package_name = 'Apache Arrow Dataset GLib'
 entities_conf = configuration_data()
 entities_conf.set('PACKAGE', package_id)
 entities_conf.set('PACKAGE_BUGREPORT',
diff --git a/cpp/src/arrow/dataset/README.md b/cpp/src/arrow/dataset/README.md
index 5ee5a69..225f38a 100644
--- a/cpp/src/arrow/dataset/README.md
+++ b/cpp/src/arrow/dataset/README.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# Arrow C++ Datasets
+# Arrow C++ Dataset
 
 The `arrow::dataset` subcomponent provides an API to read and write
 semantic datasets stored in different locations and formats. It
diff --git a/cpp/src/arrow/dataset/arrow-dataset.pc.in b/cpp/src/arrow/dataset/arrow-dataset.pc.in
index c226bad..c03aad3 100644
--- a/cpp/src/arrow/dataset/arrow-dataset.pc.in
+++ b/cpp/src/arrow/dataset/arrow-dataset.pc.in
@@ -18,8 +18,8 @@
 libdir=@CMAKE_INSTALL_FULL_LIBDIR@
 includedir=@CMAKE_INSTALL_FULL_INCLUDEDIR@
 
-Name: Apache Arrow Datasets
-Description: Apache Arrow Datasets provides an API to read and write semantic datasets stored in different locations and formats.
+Name: Apache Arrow Dataset
+Description: Apache Arrow Dataset provides an API to read and write semantic datasets stored in different locations and formats.
 Version: @ARROW_VERSION@
 Requires: arrow parquet
 Libs: -L${libdir} -larrow_dataset
diff --git a/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/control b/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/control
index b78e089..896745a 100644
--- a/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/control
+++ b/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/control
@@ -70,7 +70,7 @@ Depends:
   libarrow100 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides C++ library files for Datasets module.
+ This package provides C++ library files for dataset module.
 
 Package: libarrow-python100
 Section: libs
@@ -120,7 +120,7 @@ Depends:
   libarrow-dataset100 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides C++ header files for Datasets module.
+ This package provides C++ header files for dataset module.
 
 Package: libarrow-python-dev
 Section: libdevel
@@ -323,7 +323,7 @@ Depends:
   libarrow-dataset100 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GLib based library files for Datasets module.
+ This package provides GLib based library files for dataset module.
 
 Package: gir1.2-arrow-dataset-1.0
 Section: introspection
@@ -334,7 +334,7 @@ Depends:
   ${misc:Depends}
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GObject Introspection typelib files for Datasets module.
+ This package provides GObject Introspection typelib files for dataset module.
 
 Package: libarrow-dataset-glib-dev
 Section: libdevel
@@ -348,7 +348,7 @@ Depends:
   gir1.2-arrow-dataset-1.0 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GLib based header files for Datasets module.
+ This package provides GLib based header files for dataset module.
 
 Package: libarrow-dataset-glib-doc
 Section: doc
@@ -359,7 +359,7 @@ Depends:
 Recommends: libarrow-glib-doc
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides documentations for Datasets module.
+ This package provides documentations for dataset module.
 
 Package: libgandiva-glib100
 Section: libs
diff --git a/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/libarrow-dataset-glib-doc.doc-base b/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/libarrow-dataset-glib-doc.doc-base
index 0003f57..a97707b 100644
--- a/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/libarrow-dataset-glib-doc.doc-base
+++ b/dev/tasks/linux-packages/apache-arrow/debian.ubuntu-xenial/libarrow-dataset-glib-doc.doc-base
@@ -1,7 +1,7 @@
 Document: arrow-dataset-glib
-Title: Apache Arrow Datasets GLib Reference Manual
+Title: Apache Arrow dataset GLib Reference Manual
 Author: The Apache Software Foundation
-Abstract: Apache Arrow Datasets GLib provides an API to read and write semantic datasets stored in different locations and formats that uses GLib.
+Abstract: Apache Arrow dataset GLib provides an API to read and write semantic datasets stored in different locations and formats that uses GLib.
 Section: Programming
 
 Format: HTML
diff --git a/dev/tasks/linux-packages/apache-arrow/debian/control b/dev/tasks/linux-packages/apache-arrow/debian/control
index 928412e..1a07661 100644
--- a/dev/tasks/linux-packages/apache-arrow/debian/control
+++ b/dev/tasks/linux-packages/apache-arrow/debian/control
@@ -70,7 +70,7 @@ Depends:
   libparquet100 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides C++ library files for Datasets module.
+ This package provides C++ library files for Dataset module.
 
 Package: libarrow-flight100
 Section: libs
@@ -148,7 +148,7 @@ Depends:
   libparquet-dev (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides C++ header files for Datasets module.
+ This package provides C++ header files for dataset module.
 
 Package: libarrow-flight-dev
 Section: libdevel
@@ -376,7 +376,7 @@ Depends:
   libarrow-dataset100 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GLib based library files for Datasets module.
+ This package provides GLib based library files for dataset module.
 
 Package: gir1.2-arrow-dataset-1.0
 Section: introspection
@@ -387,7 +387,7 @@ Depends:
   ${misc:Depends}
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GObject Introspection typelib files for Datasets module.
+ This package provides GObject Introspection typelib files for dataset module.
 
 Package: libarrow-dataset-glib-dev
 Section: libdevel
@@ -401,7 +401,7 @@ Depends:
   gir1.2-arrow-dataset-1.0 (= ${binary:Version})
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides GLib based header files for Datasets module.
+ This package provides GLib based header files for dataset module.
 
 Package: libarrow-dataset-glib-doc
 Section: doc
@@ -412,7 +412,7 @@ Depends:
 Recommends: libarrow-glib-doc
 Description: Apache Arrow is a data processing library for analysis
  .
- This package provides documentations for Datasets module.
+ This package provides documentations for dataset module.
 
 Package: libgandiva-glib100
 Section: libs
diff --git a/dev/tasks/linux-packages/apache-arrow/debian/libarrow-dataset-glib-doc.doc-base b/dev/tasks/linux-packages/apache-arrow/debian/libarrow-dataset-glib-doc.doc-base
index e18b8ba..5ec8156 100644
--- a/dev/tasks/linux-packages/apache-arrow/debian/libarrow-dataset-glib-doc.doc-base
+++ b/dev/tasks/linux-packages/apache-arrow/debian/libarrow-dataset-glib-doc.doc-base
@@ -1,7 +1,7 @@
 Document: arrow-dataset-glib
-Title: Apache Arrow Datasets GLib Reference Manual
+Title: Apache Arrow Dataset GLib Reference Manual
 Author: The Apache Software Foundation
-Abstract: Apache Arrow Datasets GLib provides an API to read and write semantic datasets stored in different locations and formats that uses GLib.
+Abstract: Apache Arrow Dataset GLib provides an API to read and write semantic datasets stored in different locations and formats that uses GLib.
 Section: Programming
 
 Format: HTML
diff --git a/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in b/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
index 7cb543d..f36fdfd 100644
--- a/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
+++ b/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
@@ -281,7 +281,7 @@ License:    Apache-2.0
 Requires:      %{name}-libs = %{version}-%{release}
 
 %description dataset-libs
-This package contains the libraries for Apache Arrow Datasets.
+This package contains the libraries for Apache Arrow dataset.
 
 %files dataset-libs
 %defattr(-,root,root,-)
@@ -289,12 +289,12 @@ This package contains the libraries for Apache Arrow Datasets.
 %{_libdir}/libarrow_dataset.so.*
 
 %package dataset-devel
-Summary:       Libraries and header files for Apache Arrow Datasets.
+Summary:       Libraries and header files for Apache Arrow dataset.
 License:       Apache-2.0
 Requires:      %{name}-dataset-libs = %{version}-%{release}
 
 %description dataset-devel
-Libraries and header files for Apache Arrow Datasets.
+Libraries and header files for Apache Arrow dataset.
 
 %files dataset-devel
 %defattr(-,root,root,-)
@@ -585,13 +585,13 @@ Documentation for Apache Arrow GLib.
 %{_datadir}/gtk-doc/html/arrow-glib/
 
 %package dataset-glib-libs
-Summary:       Runtime libraries for Apache Arrow Datasets GLib
+Summary:       Runtime libraries for Apache Arrow dataset GLib
 License:       Apache-2.0
 Requires:      %{name}-dataset-libs = %{version}-%{release}
 Requires:      %{name}-glib-libs = %{version}-%{release}
 
 %description dataset-glib-libs
-This package contains the libraries for Apache Arrow Datasets GLib.
+This package contains the libraries for Apache Arrow dataset GLib.
 
 %files dataset-glib-libs
 %defattr(-,root,root,-)
@@ -600,13 +600,13 @@ This package contains the libraries for Apache Arrow Datasets GLib.
 %{_datadir}/gir-1.0/ArrowDataset-1.0.gir
 
 %package dataset-glib-devel
-Summary:       Libraries and header files for Apache Arrow Datasets GLib
+Summary:       Libraries and header files for Apache Arrow dataset GLib
 License:       Apache-2.0
 Requires:      %{name}-dataset-devel = %{version}-%{release}
 Requires:      %{name}-glib-devel = %{version}-%{release}
 
 %description dataset-glib-devel
-Libraries and header files for Apache Arrow Datasets GLib.
+Libraries and header files for Apache Arrow dataset GLib.
 
 %files dataset-glib-devel
 %defattr(-,root,root,-)
@@ -618,11 +618,11 @@ Libraries and header files for Apache Arrow Datasets GLib.
 %{_libdir}/girepository-1.0/ArrowDataset-1.0.typelib
 
 %package dataset-glib-doc
-Summary:       Documentation for Apache Arrow Datasets GLib
+Summary:       Documentation for Apache Arrow dataset GLib
 License:       Apache-2.0
 
 %description dataset-glib-doc
-Documentation for Apache Arrow Datasets GLib.
+Documentation for Apache Arrow dataset GLib.
 
 %files dataset-glib-doc
 %defattr(-,root,root,-)
diff --git a/docs/source/python/api/dataset.rst b/docs/source/python/api/dataset.rst
index b0cfd75..c011917 100644
--- a/docs/source/python/api/dataset.rst
+++ b/docs/source/python/api/dataset.rst
@@ -19,8 +19,8 @@
 
 .. _api.dataset:
 
-Datasets
-========
+Dataset
+=======
 
 .. warning::
 
diff --git a/r/NEWS.md b/r/NEWS.md
index 1679e9a..421a34a 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -19,7 +19,7 @@
 
 # arrow 0.17.1.9000
 
-## Datasets
+## Dataset
 
 * CSV and other text-delimited datasets are now supported
 * Read datasets directly on S3 by passing a URL like `ds <- open_dataset("s3://...")`. Note that this currently requires a special C++ library build with additional dependencies; that is, this is not yet available in CRAN releases or in nightly packages.
@@ -77,10 +77,10 @@ us to use `reticulate` to share data between R and Python (`pyarrow`) efficientl
 
 See `vignette("python", package = "arrow")` for details.
 
-## Datasets
+## Dataset
 
 * Dataset reading benefits from many speedups and fixes in the C++ library
-* Datasets have a `dim()` method, which sums rows across all files (#6635, @boshek)
+* Dataset has a `dim()` method, which sums rows across all files (#6635, @boshek)
 * Combine multiple datasets into a single queryable `UnionDataset` with the `c()` method
 * Dataset filtering now treats `NA` as `FALSE`, consistent with `dplyr::filter()`
 * Dataset filtering is now correctly supported for all Arrow date/time/timestamp column types
@@ -111,7 +111,7 @@ similar to the `write_*` functions in the `readr` package (#6387, @boshek)
 
 ## Multi-file datasets
 
-This release includes a `dplyr` interface to Arrow Datasets,
+This release includes a `dplyr` interface to Arrow Dataset,
 which let you work efficiently with large, multi-file datasets as a single entity.
 Explore a directory of data files with `open_dataset()` and then use `dplyr` methods to `select()`, `filter()`, etc. Work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. `dplyr` methods are conditionally loaded if you have `dplyr` available; it is not a hard dependency.
 
diff --git a/r/R/dataset.R b/r/R/dataset.R
index eb229d7..b220698 100644
--- a/r/R/dataset.R
+++ b/r/R/dataset.R
@@ -17,7 +17,7 @@
 
 #' Open a multi-file dataset
 #'
-#' Arrow Datasets allow you to query against data that has been split across
+#' Arrow Dataset allow you to query against data that has been split across
 #' multiple files. This sharding of data may indicate partitioning, which
 #' can accelerate queries that only touch some partitions (files). Call
 #' `open_dataset()` to point to a directory of data files and return a
@@ -89,7 +89,7 @@ open_dataset <- function(sources,
 #' Multi-file datasets
 #'
 #' @description
-#' Arrow Datasets allow you to query against data that has been split across
+#' Arrow Dataset allow you to query against data that has been split across
 #' multiple files. This sharding of data may indicate partitioning, which
 #' can accelerate queries that only touch some partitions (files).
 #'
diff --git a/r/R/dplyr.R b/r/R/dplyr.R
index bf5d3c6..7b96471 100644
--- a/r/R/dplyr.R
+++ b/r/R/dplyr.R
@@ -165,7 +165,7 @@ filter.arrow_dplyr_query <- function(.data, ..., .preserve = FALSE) {
       # Abort. We don't want to auto-collect if this is a Dataset because that
       # could blow up, too big.
       stop(
-        "Filter expression not supported for Arrow Datasets: ", bads,
+        "Filter expression not supported for Arrow Dataset: ", bads,
         "\nCall collect() first to pull data into R.",
         call. = FALSE
       )
@@ -346,7 +346,7 @@ query_on_dataset <- function(x) inherits(x$.data, "Dataset")
 
 not_implemented_for_dataset <- function(method) {
   stop(
-    method, " is not currently implemented for Arrow Datasets. ",
+    method, " is not currently implemented for Arrow Dataset. ",
     "Call collect() first to pull data into R.",
     call. = FALSE
   )
diff --git a/r/man/Dataset.Rd b/r/man/Dataset.Rd
index 686611c..3d8a814 100644
--- a/r/man/Dataset.Rd
+++ b/r/man/Dataset.Rd
@@ -8,7 +8,7 @@
 \alias{FileSystemDatasetFactory}
 \title{Multi-file datasets}
 \description{
-Arrow Datasets allow you to query against data that has been split across
+Arrow Dataset allow you to query against data that has been split across
 multiple files. This sharding of data may indicate partitioning, which
 can accelerate queries that only touch some partitions (files).
 
diff --git a/r/man/open_dataset.Rd b/r/man/open_dataset.Rd
index 379ace4..16ea99a 100644
--- a/r/man/open_dataset.Rd
+++ b/r/man/open_dataset.Rd
@@ -58,7 +58,7 @@ A \link{Dataset} R6 object. Use \code{dplyr} methods on it to query the data,
 or call \code{\link[=Scanner]{$NewScan()}} to construct a query directly.
 }
 \description{
-Arrow Datasets allow you to query against data that has been split across
+Arrow Dataset allow you to query against data that has been split across
 multiple files. This sharding of data may indicate partitioning, which
 can accelerate queries that only touch some partitions (files). Call
 \code{open_dataset()} to point to a directory of data files and return a
diff --git a/r/tests/testthat/test-dataset.R b/r/tests/testthat/test-dataset.R
index a1a043b..a205511 100644
--- a/r/tests/testthat/test-dataset.R
+++ b/r/tests/testthat/test-dataset.R
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-context("Datasets")
+context("Dataset")
 
 library(dplyr)
 
@@ -467,18 +467,18 @@ test_that("dplyr method not implemented messages", {
   # This one is more nuanced
   expect_error(
     ds %>% filter(int > 6, dbl > max(dbl)),
-    "Filter expression not supported for Arrow Datasets: dbl > max(dbl)\nCall collect() first to pull data into R.",
+    "Filter expression not supported for Arrow Dataset: dbl > max(dbl)\nCall collect() first to pull data into R.",
     fixed = TRUE
   )
   # One explicit test of the full message
   expect_error(
     ds %>% summarize(mean(int)),
-    "summarize() is not currently implemented for Arrow Datasets. Call collect() first to pull data into R.",
+    "summarize() is not currently implemented for Arrow Dataset. Call collect() first to pull data into R.",
     fixed = TRUE
   )
   # Helper for everything else
   expect_not_implemented <- function(x) {
-    expect_error(x, "is not currently implemented for Arrow Datasets")
+    expect_error(x, "is not currently implemented for Arrow Dataset")
   }
   expect_not_implemented(ds %>% arrange(int))
   expect_not_implemented(ds %>% mutate(int = int + 2))
diff --git a/r/vignettes/dataset.Rmd b/r/vignettes/dataset.Rmd
index e2b0ea3..a0c433f 100644
--- a/r/vignettes/dataset.Rmd
+++ b/r/vignettes/dataset.Rmd
@@ -1,17 +1,17 @@
 ---
-title: "Working with Arrow Datasets and dplyr"
+title: "Working with Arrow Dataset and dplyr"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Working with Arrow Datasets and dplyr}
+  %\VignetteIndexEntry{Working with Arrow Dataset and dplyr}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
 Apache Arrow lets you work efficiently with large, multi-file datasets.
-The `arrow` R package provides a `dplyr` interface to Arrow Datasets,
+The `arrow` R package provides a `dplyr` interface to Arrow Dataset,
 as well as other tools for interactive exploration of Arrow data.
 
-This vignette introduces Datasets and shows how to use `dplyr` to analyze them.
+This vignette introduces Dataset and shows how to use `dplyr` to analyze them.
 It describes both what is possible to do with Arrow now
 and what is on the immediate development roadmap.
 
@@ -70,7 +70,7 @@ dir.exists("nyc-taxi")
 ## Getting started
 
 Because `dplyr` is not necessary for many Arrow workflows,
-it is an optional (`Suggests`) dependency. So, to work with Datasets,
+it is an optional (`Suggests`) dependency. So, to work with Dataset,
 we need to load both `arrow` and `dplyr`.
 
 ```{r}
@@ -272,7 +272,7 @@ in order to declare the types of the virtual columns that define the partitions.
 This would be useful, in our taxi dataset example, if you wanted to keep
 "month" as a string instead of an integer for some reason.
 
-Another feature of Datasets is that they can be composed of multiple data sources.
+Another feature of Dataset is that they can be composed of multiple data sources.
 That is, you may have a directory of partitioned Parquet files in one location,
 and in another directory, files that haven't been partitioned.
 In the future, when there is support for cloud storage and other file formats,
