[ 
https://issues.apache.org/jira/browse/ARROW-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344779#comment-17344779
 ] 

Neal Richardson commented on ARROW-12791:
-----------------------------------------

You could try to catch the "Parquet magic bytes not found in footer" error 
message inside open_dataset() and return a different/helpful message like 
"Looks like these are not parquet files, did you mean to specify a 'format'?" 
or something.
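
A rough sketch of that idea, assuming the check would live in the R wrapper around the dataset factory (the function name and the `factory` object here are illustrative, not the actual arrow internals):

{code:r}
# Hypothetical helper, not the real arrow implementation: wrap the
# factory's Finish() call and translate the Parquet footer error into
# a hint about the 'format' argument.
handle_finish_error <- function(factory) {
  tryCatch(
    factory$Finish(),
    error = function(e) {
      if (grepl("Parquet magic bytes not found", conditionMessage(e))) {
        stop(
          "Error creating Dataset: these do not appear to be Parquet files.\n",
          "Did you mean to specify 'format' in open_dataset()?",
          call. = FALSE
        )
      }
      stop(e)  # re-raise anything we don't recognize unchanged
    }
  )
}
{code}

Matching on the error message string is brittle if the C++ wording changes, but it avoids touching the C++ layer at all.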

> [R] Better error handling for DatasetFactory$Finish() when schema is NULL
> -------------------------------------------------------------------------
>
>                 Key: ARROW-12791
>                 URL: https://issues.apache.org/jira/browse/ARROW-12791
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nic Crane
>            Priority: Major
>
> When I call the following code:
>  
> {code:r}
> tf <- tempfile()
> dir.create(tf)
> on.exit(unlink(tf))
> write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
> write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
> ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
> {code}
> I get the following error: 
> {code}
>  Error: IOError: Could not open parquet input source 
> '/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes 
> not found in footer. Either the file is corrupted or this is not a parquet 
> file.
> {code}
> However, the documentation for open_dataset() says nothing about the input
> source needing to be Parquet rather than CSV.
> I think this is due to calling DatasetFactory$Finish() when schema is NULL
> and the input files have no inherent schema (i.e. are CSVs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
