[GitHub] [arrow] toppyy commented on pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

GitBox Sun, 23 Jan 2022 01:33:32 -0800


toppyy commented on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1019446733



   I refactored/simplified the column_names vs. schema-names comparison a bit.
   
   While doing this, I realized that my earlier solution did not solve the 
original issue: I was comparing column names against the schema only when both 
were set. However, the original issue arose when read_options was set 
_without_ column_names, like so:
   `open_dataset(
     td,
     format='csv',
     schema = diamond_schema,
     skip_rows = 1,
     read_options=arrow::CsvReadOptions$create(
       skip_rows = 1
       # ..and no column_names
     )
   ) %>%  collect()`
   
   To raise an error in this situation as well, the code now checks that 
`identical(read_options$column_names, names(schema))` in all scenarios where 
schema is set. So the example above raises an error even though 
`column_names` is not set at all. I added a hint for this situation to the 
error message ("Omit the read_options -argument"). 
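
   For illustration, the check can be sketched in base R roughly as follows. 
This is not the code from the PR: `check_column_names` is a hypothetical helper, 
a plain list stands in for the `CsvReadOptions` object, and the error wording 
other than the quoted hint is made up. It also assumes the check is skipped when 
read_options is omitted, which is what the hint suggests.

```r
# Hypothetical helper, for illustration only (not the arrow implementation).
# `schema_names` stands in for names(schema); `read_options` is a plain list
# standing in for a CsvReadOptions object.
check_column_names <- function(schema_names, read_options = NULL) {
  # Assumption: with no schema or no read_options there is nothing to compare.
  if (is.null(schema_names) || is.null(read_options)) {
    return(invisible(TRUE))
  }
  # The comparison described above: column_names must equal the schema names,
  # so read_options without column_names also fails this check.
  if (!identical(read_options$column_names, schema_names)) {
    stop(
      "`column_names` in read_options must match the names in `schema`. ",
      "Omit the read_options -argument",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

schema_names <- c("carat", "cut")

# read_options supplied without column_names: the case from the original issue.
result <- tryCatch(
  check_column_names(schema_names, list(skip_rows = 1)),
  error = function(e) "error"
)
```

   In this sketch, passing `list(column_names = schema_names)` or leaving out 
read_options entirely passes the check, while the call above ends up in the 
error branch.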


