[ 
https://issues.apache.org/jira/browse/ARROW-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128405#comment-17128405
 ] 

Francois Saint-Jacques commented on ARROW-7673:
-----------------------------------------------

This has been refactored in ARROW-8058:


{code:python}
In [40]: da.dataset("/home/fsaintjacques/datasets/nyc-tlc/csv/2016", format="csv")
Out[40]: <pyarrow._dataset.FileSystemDataset at 0x7fef446b2930>

In [41]: da.dataset("/home/fsaintjacques/datasets/nyc-tlc/csv/2016", format="parquet")
...
OSError: Could not open parquet input source '/home/fsaintjacques/datasets/nyc-tlc/csv/2016/01/data.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

In [42]: da.dataset("/home/fsaintjacques/datasets/nyc-tlc/parquet/2016", format="parquet")
Out[42]: <pyarrow._dataset.FileSystemDataset at 0x7fef447ad7f0>

{code}


> [C++][Dataset] Revisit File discovery failure mode
> --------------------------------------------------
>
>                 Key: ARROW-7673
>                 URL: https://issues.apache.org/jira/browse/ARROW-7673
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Francois Saint-Jacques
>            Assignee: Francois Saint-Jacques
>            Priority: Major
>              Labels: dataset
>             Fix For: 1.0.0
>
>
> Currently, the default value of `FileSystemFactoryOptions::exclude_invalid_files` 
> causes unsupported files (IO errors, files not of the expected format, 
> corruption, missing compression codecs, etc.) to be silently ignored when 
> creating a `FileSystemSource`.
> We should change this behavior so that the Inspect/Finish calls propagate an 
> error by default, while still allowing the user to toggle 
> `exclude_invalid_files`. The error should contain at least the file path and 
> a decipherable message (if possible).
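
For readers who want to see the two failure modes side by side, here is a minimal standard-library sketch of the policy described above. It is not Arrow's implementation: `check_parquet`, `discover`, and `InvalidFileError` are hypothetical names, and the validity check only looks for the 4-byte `PAR1` marker at the end of the file (it does not handle encrypted footers or parse any metadata). The `exclude_invalid_files` argument merely mirrors the option name from the issue.

```python
import os
import tempfile

# 4-byte marker found at the start and end of an unencrypted Parquet file.
PARQUET_MAGIC = b"PAR1"


class InvalidFileError(OSError):
    """Hypothetical error type; the message carries the offending path."""


def check_parquet(path):
    """Return None if the footer looks like Parquet, else a readable reason."""
    try:
        if os.path.getsize(path) < len(PARQUET_MAGIC):
            return "file too small to be a Parquet file"
        with open(path, "rb") as f:
            f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
            footer = f.read(len(PARQUET_MAGIC))
    except OSError as exc:
        return f"IO error: {exc}"
    if footer != PARQUET_MAGIC:
        return "Parquet magic bytes not found in footer"
    return None


def discover(root, exclude_invalid_files=False):
    """Walk `root` and return the valid file paths, sorted.

    With exclude_invalid_files=False (the default proposed in the issue),
    the first invalid file raises an error naming the path and the reason.
    With exclude_invalid_files=True, invalid files are skipped silently.
    """
    valid = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            reason = check_parquet(path)
            if reason is None:
                valid.append(path)
            elif not exclude_invalid_files:
                raise InvalidFileError(
                    f"Could not open parquet input source '{path}': {reason}"
                )
    return sorted(valid)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # One fake "Parquet" file (magic bytes only) and one CSV file.
        with open(os.path.join(root, "ok.parquet"), "wb") as f:
            f.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
        with open(os.path.join(root, "data.csv"), "wb") as f:
            f.write(b"a,b\n1,2\n")

        print(discover(root, exclude_invalid_files=True))
        try:
            discover(root)
        except InvalidFileError as exc:
            print(exc)
```

The point of the sketch is the branch in `discover`: failing loudly by default surfaces the path and cause at discovery time, while the opt-in flag keeps the old "skip what cannot be read" behavior for mixed directories.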



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
