[GitHub] [arrow] nealrichardson commented on a change in pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function

GitBox Thu, 29 Oct 2020 08:51:36 -0700


nealrichardson commented on a change in pull request #8256:
URL: https://github.com/apache/arrow/pull/8256#discussion_r514363942




##########
File path: r/src/dataset.cpp
##########
@@ -172,14 +220,13 @@ std::string dataset___FileFormat__type_name(
 }
 
 // [[arrow::export]]
-std::shared_ptr<ds::FileWriteOptions> dataset___FileFormat__DefaultWriteOptions(
-    const std::shared_ptr<ds::FileFormat>& fmt) {
-  return fmt->DefaultWriteOptions();
+R6 dataset___FileFormat__DefaultWriteOptions(const std::shared_ptr<ds::FileFormat>& fmt) {
+  return cpp11::r6(fmt->DefaultWriteOptions(), "FileWriteOptions");
 }
 
 // [[arrow::export]]
-std::shared_ptr<ds::ParquetFileFormat> dataset___ParquetFileFormat__Make(
-    bool use_buffered_stream, int64_t buffer_size, cpp11::strings dict_columns) {
+R6 dataset___ParquetFileFormat__MakeRead(bool use_buffered_stream, int64_t buffer_size,

Review comment:
       ```suggestion
   R6 dataset___ParquetFileFormat__Make(bool use_buffered_stream, int64_t buffer_size,
   ```

##########
File path: r/R/dataset-format.R
##########
@@ -98,9 +84,16 @@ as.character.FileFormat <- function(x, ...) {
 ParquetFileFormat <- R6Class("ParquetFileFormat", inherit = FileFormat)
 ParquetFileFormat$create <- function(use_buffered_stream = FALSE,
                                      buffer_size = 8196,
-                                     dict_columns = character(0)) {
-  shared_ptr(ParquetFileFormat, dataset___ParquetFileFormat__Make(
-    use_buffered_stream, buffer_size, dict_columns))
+                                     dict_columns = character(0),
+                                     writer_properties = NULL,
+                                     arrow_writer_properties = NULL) {
+  if (is.null(writer_properties) && is.null(arrow_writer_properties)) {
+    dataset___ParquetFileFormat__MakeRead(use_buffered_stream, buffer_size, dict_columns)
+  } else {
+    writer_properties = writer_properties %||% ParquetWriterProperties$create()
+    arrow_writer_properties = arrow_writer_properties %||% ParquetArrowWriterProperties$create()
+    dataset___ParquetFileFormat__MakeWrite(writer_properties, arrow_writer_properties)
+  }

Review comment:
       We pulled the writer properties out of ParquetFileFormat
   
   ```suggestion
                                        dict_columns = character(0)) {
    dataset___ParquetFileFormat__Make(use_buffered_stream, buffer_size, dict_columns)
   ```

##########
File path: r/R/dataset.R
##########
@@ -145,29 +145,24 @@ open_dataset <- function(sources,
 #' @seealso [open_dataset()] for a simple interface to creating a `Dataset`
 Dataset <- R6Class("Dataset", inherit = ArrowObject,
   public = list(
-    ..dispatch = function() {
-      type <- self$type
-      if (type == "union") {
-        shared_ptr(UnionDataset, self$pointer())
-      } else if (type == "filesystem") {
-        shared_ptr(FileSystemDataset, self$pointer())
-      } else {
-        self
-      }
-    },
     # @description
     # Start a new scan of the data
     # @return A [ScannerBuilder]
-    NewScan = function() unique_ptr(ScannerBuilder, dataset___Dataset__NewScan(self)),
-    ToString = function() self$schema$ToString()
+    NewScan = function() dataset___Dataset__NewScan(self),
+    ToString = function() self$schema$ToString(),
+    write = function(path, filesystem = NULL, schema = self$schema, format, partitioning, ...) {
+      path_and_fs <- get_path_and_filesystem(path, filesystem)
+      dataset___Dataset__Write(self, schema, format, path_and_fs$fs, path_and_fs$path, partitioning)
+      invisible(self)
+    }

Review comment:
       Another removal from the latest dataset writing patch
   
   ```suggestion
       ToString = function() self$schema$ToString()
   ```
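The removed `..dispatch` method picked an R6 class from `self$type` on the R side (`"union"` to `UnionDataset`, `"filesystem"` to `FileSystemDataset`, otherwise the object itself). With outputs boxed in C++, the same choice could be made before handing the pointer to `cpp11::r6()`. A minimal sketch, assuming hypothetical stand-in classes (not the real `arrow::dataset` types) and a hypothetical helper `r6_class_name`; it only mirrors the mapping in the removed R code and is not the PR's implementation:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the C++ dataset classes. These are NOT the
// real arrow::dataset types, only enough structure to show the dispatch.
struct Dataset {
  virtual ~Dataset() = default;
  // Runtime type tag, analogous to what `self$type` exposed in R.
  virtual std::string type_name() const = 0;
};

struct UnionDataset : Dataset {
  std::string type_name() const override { return "union"; }
};

struct FileSystemDataset : Dataset {
  std::string type_name() const override { return "filesystem"; }
};

struct InMemoryDataset : Dataset {
  std::string type_name() const override { return "in-memory"; }
};

// Hypothetical helper: choose the R6 class name a Dataset pointer should be
// boxed as, following the branches of the removed R `..dispatch` method.
std::string r6_class_name(const std::shared_ptr<Dataset>& ds) {
  const std::string type = ds->type_name();
  if (type == "union") return "UnionDataset";
  if (type == "filesystem") return "FileSystemDataset";
  // Any other type falls back to the base class, like `else self` in R.
  return "Dataset";
}
```

In a binding, the result would be passed as the class-name argument, e.g. `cpp11::r6(ds, r6_class_name(ds))`, matching the two-argument form used for `"FileWriteOptions"` above. Keeping the base-class fallback means an unrecognized dataset type still boxes as a plain `Dataset` instead of failing.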

##########
File path: r/R/type.R
##########
@@ -37,51 +37,18 @@ DataType <- R6Class("DataType",
     Equals = function(other, ...) {
       inherits(other, "DataType") && DataType__Equals(self, other)
     },
+    num_fields = function() {
+      DataType__num_fields(self)
+    },
     num_children = function() {
-      DataType__num_children(self)
+      DataType__num_fields(self)
     },
     children = function() {
-      map(DataType__children_pointer(self), shared_ptr, class = Field)
+      # TODO: this is deprecated
+      DataType__fields(self)
     },

Review comment:
       If these are deprecated, we should just delete them. I think these are sufficiently buried (we haven't promoted any workflows that get into the methods of DataType objects, and these aren't documented) that they're safe to remove.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.