lidavidm commented on issue #12469:
URL: https://github.com/apache/arrow/issues/12469#issuecomment-1046287019


   @dhicks if you happen to have a reprex for the crash, it would be much 
appreciated - compute() shouldn't crash on local files. (Is it for the same 
file you've attached?)
   
   Also, what version of Arrow are you using? (`arrow_info()` would give you 
this.)
   
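   You can use `col_types` or `skip_rows`. The documentation is a little 
unclear on this point, but the acceptable option names actually come from 
CsvReadOptions/CsvParseOptions/CsvConvertOptions. With only `schema`, the 
header row is read as data, hence the conversion error in the first call 
below; `skip_rows=1` skips it, while `col_types` sets the types and still 
takes the column names from the header. For example: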
   
   ```r
   > open_dataset('./temp/1960-1-01.csv', format='csv', schema=schema(article_id=string(), phrase=string(), n=int32())) %>% collect()
   Error: Invalid: Could not open CSV input source '/home/lidavidm/temp/1960-1-01.csv': Invalid: In CSV column #2: Row #1: CSV conversion error to int32: invalid value 'n'
   > open_dataset('./temp/1960-1-01.csv', format='csv', schema=schema(article_id=string(), phrase=string(), n=int32()), skip_rows=1) %>% collect()
   # A tibble: 452 × 3
      article_id phrase          n
      <chr>      <chr>       <int>
    1 1960-1-01  it             63
    2 1960-1-01  we             24
    3 1960-1-01  world          13
    4 1960-1-01  numbers        11
    5 1960-1-01  they           11
    6 1960-1-01  our_numbers    10
    7 1960-1-01  he              9
    8 1960-1-01  life            9
    9 1960-1-01  i               8
   10 1960-1-01  mankind         7
   # … with 442 more rows
   > open_dataset('./temp/1960-1-01.csv', format='csv', col_types=schema(article_id=string(), phrase=string(), n=int32())) %>% collect()
   # A tibble: 452 × 3
      article_id phrase          n
      <chr>      <chr>       <int>
    1 1960-1-01  it             63
    2 1960-1-01  we             24
    3 1960-1-01  world          13
    4 1960-1-01  numbers        11
    5 1960-1-01  they           11
    6 1960-1-01  our_numbers    10
    7 1960-1-01  he              9
    8 1960-1-01  life            9
    9 1960-1-01  i               8
   10 1960-1-01  mankind         7
   # … with 442 more rows
   ```
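   
   If it helps with putting together a reprex, here is a minimal 
self-contained sketch of the same two approaches. The temporary file and 
its two data rows are made up for illustration (they only mimic the layout 
of the attached CSV), and it assumes an arrow version that forwards these 
options the way the session above does:
   
   ```r
   library(arrow)
   library(dplyr)
   
   # Hypothetical stand-in for the attached file: a header row plus two data rows.
   path <- tempfile(fileext = ".csv")
   writeLines(c("article_id,phrase,n",
                "1960-1-01,it,63",
                "1960-1-01,we,24"), path)
   
   types <- schema(article_id = string(), phrase = string(), n = int32())
   
   # Approach 1: give the full schema and skip the header row,
   # so the header is not parsed as data.
   with_schema <- open_dataset(path, format = "csv",
                               schema = types, skip_rows = 1) %>% collect()
   
   # Approach 2: let the header supply the column names and
   # only override the column types.
   with_col_types <- open_dataset(path, format = "csv",
                                  col_types = types) %>% collect()
   ```
   
   Both calls should give the same three columns with `n` as an integer; 
`col_types` is probably the more convenient of the two when the files 
already have a header row.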


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


