lidavidm commented on issue #12469:
URL: https://github.com/apache/arrow/issues/12469#issuecomment-1046287019
@dhicks if you happen to have a reprex for the crash, it would be much
appreciated - compute() shouldn't crash on local files. (Is it for the same
file you've attached?)
Also, what version of Arrow are you using? (`arrow_info()` would give you
this.)
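You can use `col_types` or `skip_rows`. The documentation is a little
unclear on this point, but the acceptable option names actually come from
CsvReadOptions/CsvParseOptions/CsvConvertOptions. When you pass `schema`, the
header row gets read as data (hence the conversion error on 'n' in the first
call), so either skip it with `skip_rows=1` or pass the types via `col_types`
instead. For example: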
```r
> open_dataset('./temp/1960-1-01.csv', format='csv',
    schema=schema(article_id=string(), phrase=string(), n=int32())) %>%
    collect()
Error: Invalid: Could not open CSV input source
'/home/lidavidm/temp/1960-1-01.csv': Invalid: In CSV column #2: Row #1: CSV
conversion error to int32: invalid value 'n'
> open_dataset('./temp/1960-1-01.csv', format='csv',
    schema=schema(article_id=string(), phrase=string(), n=int32()),
    skip_rows=1) %>%
    collect()
# A tibble: 452 × 3
   article_id phrase          n
   <chr>      <chr>       <int>
 1 1960-1-01  it             63
 2 1960-1-01  we             24
 3 1960-1-01  world          13
 4 1960-1-01  numbers        11
 5 1960-1-01  they           11
 6 1960-1-01  our_numbers    10
 7 1960-1-01  he              9
 8 1960-1-01  life            9
 9 1960-1-01  i               8
10 1960-1-01  mankind         7
# … with 442 more rows
> open_dataset('./temp/1960-1-01.csv', format='csv',
    col_types=schema(article_id=string(), phrase=string(), n=int32())) %>%
    collect()
# A tibble: 452 × 3
   article_id phrase          n
   <chr>      <chr>       <int>
 1 1960-1-01  it             63
 2 1960-1-01  we             24
 3 1960-1-01  world          13
 4 1960-1-01  numbers        11
 5 1960-1-01  they           11
 6 1960-1-01  our_numbers    10
 7 1960-1-01  he              9
 8 1960-1-01  life            9
 9 1960-1-01  i               8
10 1960-1-01  mankind         7
# … with 442 more rows
```
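
If it helps to try this without the original file, here is a rough self-contained sketch of the same two calls. The temp file and its three rows are stand-ins I made up from the output above (not the real data), and the exact behaviour may differ between Arrow versions:

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the real CSV: a header row plus three rows
# copied from the tibble output shown above.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "article_id,phrase,n",
  "1960-1-01,it,63",
  "1960-1-01,we,24",
  "1960-1-01,world,13"
), path)

sch <- schema(article_id = string(), phrase = string(), n = int32())

# Option 1: the schema supplies the column names, so skip the header row
# to keep it from being parsed as data.
with_schema <- open_dataset(path, format = "csv", schema = sch,
                            skip_rows = 1) %>%
  collect()

# Option 2: let the header supply the names and only override the types.
with_col_types <- open_dataset(path, format = "csv", col_types = sch) %>%
  collect()
```

Both `with_schema` and `with_col_types` should come back with the same three columns, with `n` as an integer column.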