[ https://issues.apache.org/jira/browse/ARROW-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman reassigned ARROW-8136:
-----------------------------------
Assignee: Joris Van den Bossche
> [C++][Python] Creating dataset from relative path no longer working
> -------------------------------------------------------------------
>
> Key: ARROW-8136
> URL: https://issues.apache.org/jira/browse/ARROW-8136
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.17.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Since https://github.com/apache/arrow/pull/6597, local relative paths don't
> work anymore:
> {code}
> In [1]: import pyarrow.dataset as ds
> In [2]: ds.dataset("test.parquet")
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-2-23ecfce52d13> in <module>
> ----> 1 ds.dataset("test.parquet")
>
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories, filesystem, partitioning, format)
>     327
>     328     if isinstance(paths_or_factories, str):
> --> 329         return factory(paths_or_factories, **kwargs).finish()
>     330
>     331     if not isinstance(paths_or_factories, list):
>
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths, filesystem, partitioning, format)
>     246     factories = []
>     247     for path in path_or_paths:
> --> 248         fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
>     249         factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
>     250                                                   format, options))
>
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path, filesystem)
>     165     from pyarrow.fs import FileType, FileSelector
>     166
> --> 167     filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
>     168     infos = filesystem.get_target_infos([path])[0]
>     169     if infos.type == FileType.Directory:
>
> ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
>     158     if filesystem is not None:
>     159         return filesystem, path
> --> 160     return FileSystem.from_uri(path)
>     161
>     162
>
> ~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()
>
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
>
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
>
> ArrowInvalid: URI has empty scheme: 'test.parquet'
> {code}
> [~apitrou] Is this something that should be fixed in
> {{FileSystemFromUriOrPath}} or rather on the Python side?
> ({{FileSystem.from_uri}} converts Pathlib objects to an absolute path, but
> does not do so for strings.)
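One way to picture the "Python side" option raised above is a small normalization step applied to the string before it reaches {{FileSystem.from_uri}}. The helper below is a hypothetical sketch, not pyarrow code: its name {{ensure_uri_or_absolute_path}} and the rule that only schemes longer than one character count as URIs (so a Windows drive letter such as "C:" stays a local path) are assumptions. It uses only the standard library.

```python
import os
import pathlib
from urllib.parse import urlparse


def ensure_uri_or_absolute_path(path):
    """Hypothetical helper (not part of pyarrow).

    Return `path` unchanged if it looks like a URI; otherwise return it as an
    absolute local path, so that a relative path such as "test.parquet" no
    longer reaches a URI parser with an empty scheme.
    """
    if isinstance(path, pathlib.PurePath):
        # Treat Pathlib objects as local paths and make them absolute,
        # mirroring the Pathlib handling described in the report.
        return os.path.abspath(os.fspath(path))

    scheme = urlparse(path).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter ("C:\data"),
    # not a real URI scheme, so only longer schemes are treated as URIs.
    if len(scheme) > 1:
        # e.g. "s3://bucket/key.parquet" or "file:///tmp/data.parquet"
        return path

    # Plain local path (relative or absolute): resolve it against the
    # current working directory.
    return os.path.abspath(path)


# The relative path from the report becomes an absolute local path,
# while real URIs pass through untouched.
print(ensure_uri_or_absolute_path("test.parquet"))
print(ensure_uri_or_absolute_path("s3://bucket/key.parquet"))
```

Under this sketch, {{_ensure_fs}} could apply the helper to the string before calling {{FileSystem.from_uri(path)}}. The alternative raised in the report, handling relative paths in C++ inside {{FileSystemFromUriOrPath}}, would presumably cover every caller of that function rather than only the Python bindings.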