thisisnic commented on a change in pull request #10326:
URL: https://github.com/apache/arrow/pull/10326#discussion_r638786673
##########
File path: r/R/util.R
##########
@@ -110,3 +110,12 @@ handle_embedded_nul_error <- function(e) {
}
stop(e)
}
+
+handle_parquet_io_error <- function(e, format) {
+ msg <- conditionMessage(e)
+ if (grepl("Parquet magic bytes not found in footer", msg) && is.null(format)) {
+ e$message <- paste0(msg, "\nDid you mean to specify a 'format' other than the default (parquet)?")
+ }
+ stop(e)
Review comment:
That sounds like a good idea, but isn't it complicated by the fact that
`open_dataset` can take `Dataset` objects for the `sources` parameter, not
just files? If we add `format` as a parameter and keep the default value of
`parquet`, does that feel weird when we're working with `Dataset` objects
rather than files? (Or could we just do it and mention it in the docs?) And if
we change `format` to be `NULL` by default, that's quite a big breaking
change, I guess.
I'm not sure what a good solution would be here, or what the convention is.
What are your thoughts, @romainfrancois?
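To make the trade-off concrete, here is a rough sketch of one possible middle ground: keep `NULL` as the internal default, resolve it to `parquet` only when `sources` are file paths, and ignore it for `Dataset` objects. `resolve_format` is a hypothetical helper, not an existing arrow function, and the `inherits(x, "Dataset")` check is an assumption about how such objects would be detected.

```r
# Hypothetical helper (not part of arrow): decide the effective format
# based on what kind of `sources` the caller supplied.
resolve_format <- function(sources, format = NULL) {
  # Assumption: Dataset objects can be recognised by their class
  is_dataset <- function(x) inherits(x, "Dataset")

  sources_are_datasets <- is_dataset(sources) ||
    (is.list(sources) && length(sources) > 0 &&
      all(vapply(sources, is_dataset, logical(1))))

  if (sources_are_datasets) {
    # Dataset objects already carry their own format, so `format` has no role
    if (!is.null(format)) {
      warning("'format' is ignored when 'sources' are Dataset objects")
    }
    return(NULL)
  }

  # File paths: fall back to parquet only when the caller gave nothing
  if (is.null(format)) "parquet" else format
}

resolve_format("path/to/files")         # falls back to "parquet"
resolve_format("path/to/files", "csv")  # keeps the caller's choice
```

With something like this, the error handler could still tell "the user never chose a format" (`NULL`) apart from "the user explicitly asked for parquet", without changing what callers see as the default.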