nealrichardson commented on a change in pull request #8256:
URL: https://github.com/apache/arrow/pull/8256#discussion_r514363942
##########
File path: r/src/dataset.cpp
##########
@@ -172,14 +220,13 @@ std::string dataset___FileFormat__type_name(
}
// [[arrow::export]]
-std::shared_ptr<ds::FileWriteOptions>
dataset___FileFormat__DefaultWriteOptions(
- const std::shared_ptr<ds::FileFormat>& fmt) {
- return fmt->DefaultWriteOptions();
+R6 dataset___FileFormat__DefaultWriteOptions(const
std::shared_ptr<ds::FileFormat>& fmt) {
+ return cpp11::r6(fmt->DefaultWriteOptions(), "FileWriteOptions");
}
// [[arrow::export]]
-std::shared_ptr<ds::ParquetFileFormat> dataset___ParquetFileFormat__Make(
- bool use_buffered_stream, int64_t buffer_size, cpp11::strings
dict_columns) {
+R6 dataset___ParquetFileFormat__MakeRead(bool use_buffered_stream, int64_t
buffer_size,
Review comment:
```suggestion
R6 dataset___ParquetFileFormat__Make(bool use_buffered_stream, int64_t
buffer_size,
```
##########
File path: r/R/dataset-format.R
##########
@@ -98,9 +84,16 @@ as.character.FileFormat <- function(x, ...) {
ParquetFileFormat <- R6Class("ParquetFileFormat", inherit = FileFormat)
ParquetFileFormat$create <- function(use_buffered_stream = FALSE,
buffer_size = 8196,
- dict_columns = character(0)) {
- shared_ptr(ParquetFileFormat, dataset___ParquetFileFormat__Make(
- use_buffered_stream, buffer_size, dict_columns))
+ dict_columns = character(0),
+ writer_properties = NULL,
+ arrow_writer_properties = NULL) {
+ if (is.null(writer_properties) && is.null(arrow_writer_properties)) {
+ dataset___ParquetFileFormat__MakeRead(use_buffered_stream, buffer_size,
dict_columns)
+ } else {
+ writer_properties = writer_properties %||% ParquetWriterProperties$create()
+ arrow_writer_properties = arrow_writer_properties %||%
ParquetArrowWriterProperties$create()
+ dataset___ParquetFileFormat__MakeWrite(writer_properties,
arrow_writer_properties)
+ }
Review comment:
We pulled the writer properties out of ParquetFileFormat
```suggestion
dict_columns = character(0)) {
dataset___ParquetFileFormat__Make(use_buffered_stream, buffer_size,
dict_columns)
```
##########
File path: r/R/dataset.R
##########
@@ -145,29 +145,24 @@ open_dataset <- function(sources,
#' @seealso [open_dataset()] for a simple interface to creating a `Dataset`
Dataset <- R6Class("Dataset", inherit = ArrowObject,
public = list(
- ..dispatch = function() {
- type <- self$type
- if (type == "union") {
- shared_ptr(UnionDataset, self$pointer())
- } else if (type == "filesystem") {
- shared_ptr(FileSystemDataset, self$pointer())
- } else {
- self
- }
- },
# @description
# Start a new scan of the data
# @return A [ScannerBuilder]
- NewScan = function() unique_ptr(ScannerBuilder,
dataset___Dataset__NewScan(self)),
- ToString = function() self$schema$ToString()
+ NewScan = function() dataset___Dataset__NewScan(self),
+ ToString = function() self$schema$ToString(),
+ write = function(path, filesystem = NULL, schema = self$schema, format,
partitioning, ...) {
+ path_and_fs <- get_path_and_filesystem(path, filesystem)
+ dataset___Dataset__Write(self, schema, format, path_and_fs$fs,
path_and_fs$path, partitioning)
+ invisible(self)
+ }
Review comment:
Another removal from the latest dataset writing patch
```suggestion
ToString = function() self$schema$ToString()
```
##########
File path: r/R/type.R
##########
@@ -37,51 +37,18 @@ DataType <- R6Class("DataType",
Equals = function(other, ...) {
inherits(other, "DataType") && DataType__Equals(self, other)
},
+ num_fields = function() {
+ DataType__num_fields(self)
+ },
num_children = function() {
- DataType__num_children(self)
+ DataType__num_fields(self)
},
children = function() {
- map(DataType__children_pointer(self), shared_ptr, class = Field)
+ # TODO: this is deprecated
+ DataType__fields(self)
},
Review comment:
If these are deprecated, we should just delete them. I think these are
sufficiently buried (we haven't promoted any workflows that get into the
methods of DataType objects, and these aren't documented) that they're safe to
remove.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]