nealrichardson commented on a change in pull request #11346: URL: https://github.com/apache/arrow/pull/11346#discussion_r725019601
########## File path: r/R/dataset.R ########## @@ -157,7 +158,27 @@ open_dataset <- function(sources, return(dataset___UnionDataset__create(sources, schema)) } - factory <- DatasetFactory$create(sources, partitioning = partitioning, format = format, ...) + is_csv_format <- function(format) { + length(format) == 1 && "CsvFileFormat" %in% class(format) || is_string(format, string = "csv") + } + + if (is_csv_format(format) && !is.null(schema)) { + factory <- DatasetFactory$create( + sources, + partitioning = partitioning, + format = format, + ..., + read_options = CsvReadOptions$create(column_names = names(schema)) + ) + } else { + factory <- DatasetFactory$create( + sources, + partitioning = partitioning, + format = format, + ... + ) + } + Review comment: How about this? DatasetFactory$create passes things down to FileFormat$create. Add a `schema = NULL` argument to the signature for `FileFormat$create` so that it's pulled out of `...` there, and then pass it to CsvFileFormat$create, which will pass it to `csv_file_format_read_opts`, which is _really_ where the business happens. Aside from the aesthetics, this will help solve a case that I think would be broken by your change, which is if I've passed any other read options in `...` ```suggestion factory <- DatasetFactory$create(sources, partitioning = partitioning, format = format, schema = schema, ...)``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org