[GitHub] [arrow] thisisnic edited a comment on pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

GitBox Wed, 12 Jan 2022 08:25:22 -0800


thisisnic edited a comment on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1011224682



   Thanks for the updates here @toppyy. I've taken the time to have a proper 
think about this, and on reflection, I don't think we need to make 
`open_dataset(td, format = 'csv', read_options = CsvReadOptions$create(skip_rows = 1))` 
work for users, as they can already pass in their `skip_rows` parameter 
this way: `open_dataset(td, format = 'csv', skip_rows = 1)`.
   
   Directly using `CsvReadOptions$create()` is pretty low-level, so here we 
can probably assume that someone using it is responsible for making sure things 
match up themselves (though we should absolutely add further documentation to 
show the recommended way for users to work with `open_dataset()` so it's clear - 
perhaps that could be part of this PR if you're interested, but no worries if 
not).
   
   I feel an alternative solution here might be just to check whether there are 
conflicting arguments specified to `open_dataset()` (e.g. through passing 
the `read_options` argument in the `...` as well as individual read 
options).  It might be something along the lines of adding validation at the 
end of `open_dataset()` so that if `!is.null(schema)` and the format is csv, we 
ensure that `identical(names(schema), read_options$column_names)` or raise an 
error.
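   
   A rough sketch of what that validation could look like, just to illustrate 
the idea. The helper name `check_csv_column_names()` and its arguments are 
hypothetical (nothing like this exists in arrow today); inside 
`open_dataset()` the two vectors would come from `names(schema)` and 
`read_options$column_names`. It uses base R only, so it can be tried without 
arrow loaded.

```r
# Hypothetical helper, not part of arrow: checks that the schema's field
# names match the column names given in the CSV read options.
check_csv_column_names <- function(schema_names, column_names, format = "csv") {
  # Only validate when a schema was supplied and the format is csv
  if (is.null(schema_names) || !identical(format, "csv")) {
    return(invisible(TRUE))
  }
  if (!identical(schema_names, column_names)) {
    stop(
      "`column_names` in the read options must match the `schema` field names: ",
      "schema has (", paste(schema_names, collapse = ", "), ") but ",
      "`column_names` has (", paste(column_names, collapse = ", "), ")",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching names pass silently
check_csv_column_names(c("int", "dbl"), c("int", "dbl"))

# Mismatched names raise an error, caught here so the script keeps running
tryCatch(
  check_csv_column_names(c("int", "dbl"), c("a", "b")),
  error = function(e) message("Error: ", conditionMessage(e))
)
```

   Note that, as written, this also errors when a schema is given but 
`column_names` is empty, so we'd need to decide whether that case should be 
allowed.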
   
   How does that sound?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

