Joris Van den Bossche created ARROW-8652:
--------------------------------------------
Summary: [Python] Test error message when discovering dataset with invalid files
Key: ARROW-8652
URL: https://issues.apache.org/jira/browse/ARROW-8652
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche

There is a comment in test_parquet.py about the Dataset API needing a better error message for invalid files:
https://github.com/apache/arrow/blob/ff92a6886ca77515173a50662a1949a792881222/python/pyarrow/tests/test_parquet.py#L3633-L3648

However, this seems to work now:

{code}
import tempfile
import pathlib
import pyarrow.dataset as ds

tempdir = pathlib.Path(tempfile.mkdtemp())

with open(str(tempdir / "data.parquet"), 'wb') as f:
    pass

In [10]: ds.dataset(str(tempdir / "data.parquet"), format="parquet")
...
OSError: Could not open parquet input source '/tmp/tmp312vtjmw/data.parquet': Invalid: Parquet file size is 0 bytes
{code}

So we need to update the test to actually check the error message instead of skipping it. The only difference from the Python ParquetDataset implementation is that the datasets API raises an OSError rather than an ArrowInvalid error.