thisisnic commented on a change in pull request #10326:
URL: https://github.com/apache/arrow/pull/10326#discussion_r634743196
##########
File path: r/R/dataset.R
##########
@@ -93,8 +93,19 @@ open_dataset <- function(sources,
return(dataset___UnionDataset__create(sources, schema))
}
factory <- DatasetFactory$create(sources, partitioning = partitioning, ...)
- # Default is _not_ to inspect/unify schemas
- factory$Finish(schema, isTRUE(unify_schemas))
+
+ tryCatch(
+ # Default is _not_ to inspect/unify schemas
+ factory$Finish(schema, isTRUE(unify_schemas)),
+ error = function (e) {
+ msg <- conditionMessage(e)
+ if(grep("Parquet magic bytes not found in footer", msg)){
+ stop("Looks like these are not parquet files, did you mean to specify
a 'format'?", call. = FALSE)
Review comment:
I can understand where you're coming from @westonpace - I don't think we
should translate every single C++ error in the package, or make that into a
pattern we use.
Just to provide a little more context, the reason for opening the original
ticket was that `open_dataset()` assumes the "parquet" format, so for any
other format the user needs to pass it through the optional `format`
argument, and that isn't immediately clear from the docs for
`open_dataset()`. I'm going to add some examples to the documentation that
show how to open files of different formats with `open_dataset()`, but the
hint in the error message is intended as an extra pointer in case the
behaviour is unintuitive.
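
As a rough illustration of the behaviour described above, here is a minimal
sketch of opening a non-Parquet dataset. It assumes the arrow R package is
installed; the temporary directory, the file name `part-0.csv`, and the use
of `mtcars` are made up for the example and are not taken from the PR.

```r
library(arrow)

# Write a small CSV file into a temporary directory (illustrative data only).
csv_dir <- tempfile("csv_dataset_")
dir.create(csv_dir)
write.csv(mtcars, file.path(csv_dir, "part-0.csv"), row.names = FALSE)

# The files are not Parquet, so the format is passed explicitly.
ds <- open_dataset(csv_dir, format = "csv")
ds$schema$names

# Without `format`, the files are treated as Parquet and the call is
# expected to fail; the error message is captured here rather than raised.
res <- tryCatch(
  open_dataset(csv_dir),
  error = function(e) conditionMessage(e)
)
res
```

The exact wording of the captured message may differ between arrow versions,
so the sketch only stores it instead of matching on its text.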