thisisnic commented on a change in pull request #10326:
URL: https://github.com/apache/arrow/pull/10326#discussion_r638526031
##########
File path: r/R/dataset.R
##########
@@ -93,8 +93,11 @@ open_dataset <- function(sources,
     return(dataset___UnionDataset__create(sources, schema))
   }
   factory <- DatasetFactory$create(sources, partitioning = partitioning, ...)
-  # Default is _not_ to inspect/unify schemas
-  factory$Finish(schema, isTRUE(unify_schemas))
+  tryCatch(
+    # Default is _not_ to inspect/unify schemas
+    factory$Finish(schema, isTRUE(unify_schemas)),
+    error = handle_parquet_io_error
Review comment:
I like this suggestion, as it makes the error message much more specific.
I've now updated the function so that it gives your updated error message
(i.e. mentioning that Parquet is the default) if the format is NULL
(i.e. hasn't been specified by the user).
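
To illustrate the behaviour described above, here is a minimal base-R sketch, not the actual code in this PR. The names `handle_parquet_io_error_sketch` and `simulate_open`, the `format = NULL` argument, and the exact wording of the hint are all assumptions made for the example; the failure is simulated with a plain `stop()` rather than a real call to `factory$Finish()`.

```r
# Hypothetical sketch of the handler; not the implementation from this PR.
handle_parquet_io_error_sketch <- function(e, format = NULL) {
  msg <- conditionMessage(e)
  is_magic_bytes_error <- grepl(
    "Parquet magic bytes not found in footer", msg, fixed = TRUE
  )
  # Only add the hint when the user did not specify a format,
  # i.e. the Parquet default was used implicitly.
  if (is_magic_bytes_error && is.null(format)) {
    msg <- paste0(
      msg,
      "\nDid you mean to specify a 'format' other than the default (parquet)?"
    )
  }
  stop(msg, call. = FALSE)
}

# Simulate the C++ error with a plain R error (assumed message text),
# and return the final error message as a string.
simulate_open <- function(format = NULL) {
  tryCatch(
    tryCatch(
      stop("IOError: Could not open parquet input source 'file1.txt': ",
           "Invalid: Parquet magic bytes not found in footer."),
      error = function(e) handle_parquet_io_error_sketch(e, format = format)
    ),
    error = function(e) conditionMessage(e)
  )
}

cat(simulate_open(), "\n")               # format not specified: hint appended
cat(simulate_open(format = "csv"), "\n") # format specified: message unchanged
```

Wrapping the handler in a closure is just one way to pass `format` through `tryCatch()`; the hint is appended rather than replacing the original message so that the file name from the C++ error is preserved.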
I've tested the code with a directory containing a mix of file formats and
the C++ error looks like this:
`Error: IOError: Could not open parquet input source
'/tmp/RtmpLi0d7E/filefa0e499bb459/file1.txt': Invalid: Parquet magic bytes not
found in footer. Either the file is corrupted or this is not a parquet file.`
Given that the C++ error message above is already informative (it names both
the file that caused the error and the likely cause), I haven't implemented
anything that changes the error output in this case, as I think the existing
message is sufficient.