Colin Jermain created ARROW-15982:
-------------------------------------
Summary: [Python] parquet.read_table fails to parse home directory path
Key: ARROW-15982
URL: https://issues.apache.org/jira/browse/ARROW-15982
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 7.0.0
Reporter: Colin Jermain
{{pyarrow.parquet.read_table}} fails to parse a path that contains the home
directory shorthand. For example, {{"~/test.parquet"}} raises a
{{FileNotFoundError}}, while {{"/home/user/test.parquet"}} reads the file correctly.
{code:java}
$ python -c "import pyarrow.parquet; pyarrow.parquet.read_table('~/test.parquet')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../lib/python3.8/site-packages/pyarrow/parquet.py", line 1960, in read_table
    dataset = _ParquetDatasetV2(
  File ".../lib/python3.8/site-packages/pyarrow/parquet.py", line 1781, in __init__
    self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
  File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 667, in dataset
    return _filesystem_dataset(source, **kwargs)
  File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 412, in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
  File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 388, in _ensure_single_source
    raise FileNotFoundError(path)
FileNotFoundError: ~/test.parquet
{code}
The fix for this issue should be as simple as applying {{os.path.expanduser}}
in the right places.
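
Until a fix lands, a caller can probably work around this by expanding the path before handing it to pyarrow. The sketch below is only an illustration of that idea, not the actual patch: {{expand_home}} and {{read_table_expanding_home}} are hypothetical helper names that do not exist in pyarrow, and where the expansion belongs inside pyarrow itself is left open.

```python
import os


def expand_home(path):
    """Expand a leading "~" in str or os.PathLike paths.

    Hypothetical helper, not part of pyarrow. Anything that is not a
    str or os.PathLike (file objects, lists, None) is returned unchanged.
    """
    if isinstance(path, (str, os.PathLike)):
        # os.path.expanduser only rewrites a leading "~" or "~user";
        # absolute and relative paths pass through untouched.
        return os.path.expanduser(path)
    return path


def read_table_expanding_home(path, **kwargs):
    """Hypothetical wrapper: expand "~" and then call pyarrow.parquet.read_table."""
    # Imported lazily so expand_home can be used without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.read_table(expand_home(path), **kwargs)
```

With this in place, {{read_table_expanding_home("~/test.parquet")}} would pass the expanded absolute path to {{read_table}}, which is the form reported above as working.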