[GitHub] [arrow] nealrichardson commented on a change in pull request #11346: ARROW-14063: [R] open_dataset() does not work on CSVs without header rows

GitBox Fri, 08 Oct 2021 06:38:00 -0700


nealrichardson commented on a change in pull request #11346:
URL: https://github.com/apache/arrow/pull/11346#discussion_r725019601




##########
File path: r/R/dataset.R
##########
@@ -157,7 +158,27 @@ open_dataset <- function(sources,
     return(dataset___UnionDataset__create(sources, schema))
   }
 
-  factory <- DatasetFactory$create(sources, partitioning = partitioning, 
format = format, ...)
+  is_csv_format <- function(format) {
+    length(format) == 1 && "CsvFileFormat" %in% class(format) || 
is_string(format, string = "csv")
+  }
+
+  if (is_csv_format(format) && !is.null(schema)) {
+    factory <- DatasetFactory$create(
+      sources,
+      partitioning = partitioning,
+      format = format,
+      ...,
+      read_options = CsvReadOptions$create(column_names = names(schema))
+    )
+  } else {
+    factory <- DatasetFactory$create(
+      sources,
+      partitioning = partitioning,
+      format = format,
+      ...
+    )
+  }
+

Review comment:
       How about this? DatasetFactory$create passes things down to 
FileFormat$create. Add a `schema = NULL` argument to the signature for 
`FileFormat$create` so that it's pulled out of `...` there, and then pass it to 
CsvFileFormat$create, which will pass it to `csv_file_format_read_opts`, which 
is _really_ where the business happens. 
   
   Aside from the aesthetics, this will help solve a case that I think would be 
broken by your change, which is if I've passed any other read options in `...`
   
   ```suggestion
     factory <- DatasetFactory$create(sources, partitioning = partitioning, 
format = format, schema = schema, ...)```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow] nealrichardson commented on a change in pull request #11346: ARROW-14063: [R] open_dataset() does not work on CSVs without header rows

Reply via email to