Nic Crane created ARROW-12791:
---------------------------------

             Summary: [R] Better error handling for DataSetFactory$Finish() 
when schema is NULL
                 Key: ARROW-12791
                 URL: https://issues.apache.org/jira/browse/ARROW-12791
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nic Crane


When I call the following code:

 
{code:java}
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
{code}
I get the following error: 
{code:java}
 Error: IOError: Could not open parquet input source 
'/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes not 
found in footer. Either the file is corrupted or this is not a parquet file.
{code}
However, nothing in the documentation for open_dataset() states that the input 
files must be Parquet, or that CSV files cannot be opened this way.

I think this is due to calling DataSetFactory$Finish() when schema is NULL and 
the input files have no inherent schema (i.e. they are CSVs), so a clearer 
error (or a pointer to the format argument) would help here.
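For reference, the error goes away in my testing when the file format is passed explicitly, since open_dataset() appears to default to Parquet (assuming format = "csv" is supported in the release being used):

{code:r}
library(arrow)

tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf, recursive = TRUE))
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))

# Declaring the format explicitly lets the factory infer the schema
# from the CSV headers instead of expecting Parquet footers
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
                   format = "csv")
{code}

So the improvement here may just be to raise a friendlier error suggesting the format argument when Parquet magic bytes are not found.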



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
