Joris Van den Bossche created ARROW-8136:
--------------------------------------------
Summary: [C++][Python] Creating dataset from relative path no longer working
Key: ARROW-8136
URL: https://issues.apache.org/jira/browse/ARROW-8136
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Joris Van den Bossche
Fix For: 0.17.0

Since https://github.com/apache/arrow/pull/6597, local relative paths no longer work:

{code}
In [1]: import pyarrow.dataset as ds

In [2]: ds.dataset("test.parquet")
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-2-23ecfce52d13> in <module>
----> 1 ds.dataset("test.parquet")

~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories, filesystem, partitioning, format)
    327
    328     if isinstance(paths_or_factories, str):
--> 329         return factory(paths_or_factories, **kwargs).finish()
    330
    331     if not isinstance(paths_or_factories, list):

~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths, filesystem, partitioning, format)
    246         factories = []
    247         for path in path_or_paths:
--> 248             fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
    249             factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
    250                                                       format, options))

~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path, filesystem)
    165     from pyarrow.fs import FileType, FileSelector
    166
--> 167     filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
    168     infos = filesystem.get_target_infos([path])[0]
    169     if infos.type == FileType.Directory:

~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
    158     if filesystem is not None:
    159         return filesystem, path
--> 160     return FileSystem.from_uri(path)
    161
    162

~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: URI has empty scheme: 'test.parquet'
{code}

[~apitrou] Should this be fixed in {{FileSystemFromUriOrPath}}, or rather on the Python side? ({{FileSystem.from_uri}} resolves {{pathlib}} objects to an absolute path, but does not do so for plain strings.)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
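
To illustrate what a Python-side fix could look like, here is a minimal sketch that makes a scheme-less string path absolute before it is handed to the filesystem layer. The helper name {{resolve_local_path}} is hypothetical and not part of pyarrow, and the scheme check is a simplifying assumption rather than how Arrow itself detects URIs.

```python
import os
from urllib.parse import urlparse


def resolve_local_path(path):
    """Hypothetical helper: make a scheme-less local path absolute.

    Strings that look like URIs (e.g. "s3://bucket/key") are returned
    unchanged; everything else is treated as a local path.
    """
    path = os.fspath(path)  # accepts str and pathlib.Path alike
    scheme = urlparse(path).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter (C:\...),
    # so only longer schemes are treated as real URIs.
    if len(scheme) > 1:
        return path
    return os.path.abspath(path)
```

With such a helper, the failing call could be written as {{ds.dataset(resolve_local_path("test.parquet"))}}, assuming absolute local paths are still accepted by {{FileSystem.from_uri}}. Whether this belongs in Python or in {{FileSystemFromUriOrPath}} is the open question above.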