Ben Kietzman created ARROW-8058:
-----------------------------------
Summary: [C++][Python][Dataset] Provide an option to skip
validation in FileSystemDatasetFactoryOptions
Key: ARROW-8058
URL: https://issues.apache.org/jira/browse/ARROW-8058
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset, Python
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
Fix For: 1.0.0
Validation can be costly and is not always necessary.
At the same time, we could move file validation into the scan tasks. Currently,
all files are inspected while the dataset is constructed, which can be expensive
if the filesystem is slow. We would end up performing the validation multiple
times, but the check will be cheap, since at scan time we'll be reading the file
into memory anyway.
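A minimal sketch of the trade-off follows. It is written in Python, and every name in it (`SlowFileSystem`, `FactoryOptions`, `validate_on_discovery`, `discover`, `Dataset.scan`) is hypothetical, not the real Arrow API. It assumes a Parquet-style check: the file must begin and end with the 4-byte `PAR1` magic. It also counts filesystem reads, so you can see where the I/O cost lands when validation is eager and when it is deferred to scan time.

```python
from dataclasses import dataclass
from typing import Dict, List

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


@dataclass
class SlowFileSystem:
    """Hypothetical in-memory filesystem that counts reads to model I/O cost."""
    files: Dict[str, bytes]
    reads: int = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


def is_valid(data: bytes) -> bool:
    # Cheap format check once the bytes are already in memory.
    return data.startswith(PARQUET_MAGIC) and data.endswith(PARQUET_MAGIC)


@dataclass
class FactoryOptions:
    # Hypothetical flag; NOT a real FileSystemDatasetFactoryOptions field.
    validate_on_discovery: bool = True


@dataclass
class Dataset:
    fs: SlowFileSystem
    paths: List[str]

    def scan(self) -> List[bytes]:
        results = []
        for path in self.paths:
            data = self.fs.read(path)  # the scan task reads the file anyway...
            if not is_valid(data):     # ...so validating here costs no extra I/O
                raise ValueError(f"invalid file: {path}")
            results.append(data)
        return results


def discover(fs: SlowFileSystem, options: FactoryOptions) -> Dataset:
    paths = sorted(fs.files)
    if options.validate_on_discovery:
        # Eager mode: one extra read per file before any scan happens.
        for p in paths:
            if not is_valid(fs.read(p)):
                raise ValueError(f"invalid file: {p}")
    return Dataset(fs, paths)
```

With `validate_on_discovery=False`, constructing the dataset performs no reads. Each file is read once per scan, and the check rides along on bytes that are already in memory. With eager validation, a slow filesystem pays for an extra full pass over every file before the first scan.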
--
This message was sent by Atlassian Jira
(v8.3.4#803005)