[GitHub] [arrow] thisisnic commented on a change in pull request #10326: ARROW-12791: [R] Better error handling for DatasetFactory$Finish() when no format specified

GitBox Tue, 25 May 2021 00:28:41 -0700


thisisnic commented on a change in pull request #10326:
URL: https://github.com/apache/arrow/pull/10326#discussion_r638526031




##########
File path: r/R/dataset.R
##########
@@ -93,8 +93,11 @@ open_dataset <- function(sources,
     return(dataset___UnionDataset__create(sources, schema))
   }
   factory <- DatasetFactory$create(sources, partitioning = partitioning, ...)
-  # Default is _not_ to inspect/unify schemas
-  factory$Finish(schema, isTRUE(unify_schemas))
+  tryCatch(
+    # Default is _not_ to inspect/unify schemas
+    factory$Finish(schema, isTRUE(unify_schemas)),
+    error = handle_parquet_io_error

Review comment:
       I like this suggestion, as it makes the error message much more 
specific. I've updated the function so that it gives your revised error 
message (i.e. mentioning that parquet is the default) when the format is NULL 
(i.e. it hasn't been specified by the user).
   
   I've tested the code with a directory containing a mix of file formats and 
the C++ error looks like this: 
   `Error: IOError: Could not open parquet input source 
'/tmp/RtmpLi0d7E/filefa0e499bb459/file1.txt': Invalid: Parquet magic bytes not 
found in footer. Either the file is corrupted or this is not a parquet file.`
   
   The C++ error message above is already informative: it names the file 
that caused the error and gives the likely cause. I therefore haven't 
changed the error output in this case, as I think the existing message is 
sufficient.
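
For illustration, the behaviour described above could be sketched roughly as follows. This is not the code in the PR: the helper name `add_default_format_hint`, the stub `finish_stub`, and the wording of the hint are assumptions made up for this example. It only shows the idea of appending a hint when `format` is NULL and otherwise re-raising the original error unchanged.

```r
# Hypothetical helper (an assumption for illustration, not the PR's code):
# append a hint about the parquet default only when no format was given.
add_default_format_hint <- function(e, format = NULL) {
  msg <- conditionMessage(e)
  is_parquet_error <- grepl("Parquet magic bytes not found", msg, fixed = TRUE)
  if (is.null(format) && is_parquet_error) {
    # Hint wording is illustrative only
    stop(
      paste0(msg, "\nDid you mean to specify a 'format' other than the default (parquet)?"),
      call. = FALSE
    )
  }
  # Otherwise re-raise the original condition untouched
  stop(e)
}

# Stand-in for factory$Finish() failing on a non-parquet file
original_msg <- paste0(
  "IOError: Could not open parquet input source 'file1.txt': ",
  "Invalid: Parquet magic bytes not found in footer."
)
finish_stub <- function() stop(original_msg, call. = FALSE)

# Run the stub with the handler and capture the final error message
run_with_format <- function(format) {
  tryCatch(
    tryCatch(
      finish_stub(),
      error = function(e) add_default_format_hint(e, format = format)
    ),
    error = function(e) conditionMessage(e)
  )
}

msg_no_format <- run_with_format(NULL)
msg_with_format <- run_with_format("csv")
```

Here `msg_no_format` carries the extra hint, while `msg_with_format` is the original message, which mirrors the decision to leave the C++ error alone when it is already specific enough.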




