[GitHub] [arrow] toppyy commented on pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

GitBox Thu, 13 Jan 2022 10:12:32 -0800


toppyy commented on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1012386125



   I think that makes a lot of sense! Trying to infer the column names from the 
schema added a surprising amount of complexity to the code for little to no 
added value for the user. As you say, they can pass the arguments using a 
different approach. 
   
   Just to make sure I got it right before I jump into making changes, the 
approach we're taking is this:
   Instead of deriving `column_names` from the schema, we raise an error if 
`CsvReadOptions$create()` is used for `read_options` and it either has no 
column names or its column names differ from the schema's?
   
   If so, could the check for column names live inside `CsvFileFormat$create`? 
The issue is limited to formats that use this function (it is not relevant for 
Parquet).
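
   A minimal sketch of the kind of check described above, in plain R. The 
helper name `check_column_names` and its arguments are hypothetical and not 
part of the arrow package; it takes the schema's field names and the read 
options' column names as character vectors, and it assumes that "differ" 
includes a difference in order.

```r
# Hypothetical helper, not part of the arrow package: validates that the
# column names given in the read options agree with the schema's field names.
check_column_names <- function(schema_names, column_names) {
  # Case 1: read options were supplied without any column names.
  if (length(column_names) == 0) {
    stop(
      "`column_names` must be set in the read options when `schema` is supplied",
      call. = FALSE
    )
  }
  # Case 2: column names were supplied but disagree with the schema
  # (assumption: order matters, so a reordering counts as a mismatch).
  if (!identical(as.character(schema_names), as.character(column_names))) {
    stop(
      "`column_names` (", paste(column_names, collapse = ", "),
      ") do not match the schema field names (",
      paste(schema_names, collapse = ", "), ")",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching names pass silently.
check_column_names(c("id", "value"), c("id", "value"))
```

   Where such a check would actually be called from (for example inside 
`CsvFileFormat$create`) is the open question in this comment, so the sketch 
leaves that out.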
   
   I can write this up in the documentation as part of this PR.


