Joris Van den Bossche created ARROW-8136:
--------------------------------------------
Summary: [C++][Python] Creating dataset from relative path no longer working
Key: ARROW-8136
URL: https://issues.apache.org/jira/browse/ARROW-8136
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Joris Van den Bossche
Fix For: 0.17.0
Since https://github.com/apache/arrow/pull/6597, local relative paths don't
work anymore:
{code}
In [1]: import pyarrow.dataset as ds
In [2]: ds.dataset("test.parquet")
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-2-23ecfce52d13> in <module>
----> 1 ds.dataset("test.parquet")
~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories,
filesystem, partitioning, format)
327
328 if isinstance(paths_or_factories, str):
--> 329 return factory(paths_or_factories, **kwargs).finish()
330
331 if not isinstance(paths_or_factories, list):
~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths,
filesystem, partitioning, format)
246 factories = []
247 for path in path_or_paths:
--> 248 fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
249 factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
250 format, options))
~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path,
filesystem)
165 from pyarrow.fs import FileType, FileSelector
166
--> 167 filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
168 infos = filesystem.get_target_infos([path])[0]
169 if infos.type == FileType.Directory:
~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
158 if filesystem is not None:
159 return filesystem, path
--> 160 return FileSystem.from_uri(path)
161
162
~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()
~/scipy/repos/arrow/python/pyarrow/error.pxi in
pyarrow.lib.pyarrow_internal_check_status()
~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: URI has empty scheme: 'test.parquet'
{code}
[~apitrou] Should this be fixed in {{FileSystemFromUriOrPath}}, or rather on
the Python side? ({{FileSystem.from_uri}} already converts pathlib objects to
absolute paths, but does not do so for strings.)
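
For illustration, a Python-side fix could look roughly like the sketch below. The helper name {{normalize_path_or_uri}} is hypothetical and not part of pyarrow; the sketch uses only the standard library and assumes that a string without a URI scheme is meant as a local path.

```python
import os
from urllib.parse import urlparse


def normalize_path_or_uri(path):
    """Return `path` unchanged if it looks like a URI, else make it absolute.

    Hypothetical helper for illustration only; not a pyarrow function.
    """
    # os.fspath accepts both str and pathlib.Path and returns a str.
    path = os.fspath(path)
    scheme = urlparse(path).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter (C:\...),
    # not a real URI scheme, so such strings are treated as local paths.
    if len(scheme) > 1:
        return path
    # No scheme: treat as a local path and resolve it against the
    # current working directory.
    return os.path.abspath(path)
```

With this, {{normalize_path_or_uri("test.parquet")}} yields an absolute path, while a string such as {{"s3://bucket/key"}} passes through unchanged. Whether the result can then be handed to {{FileSystem.from_uri}} depends on that function accepting absolute local paths, which the pathlib behaviour suggests but which is not verified here.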