Joris Van den Bossche created ARROW-8136:
--------------------------------------------

             Summary: [C++][Python] Creating dataset from relative path no longer working
                 Key: ARROW-8136
                 URL: https://issues.apache.org/jira/browse/ARROW-8136
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Joris Van den Bossche
             Fix For: 0.17.0


Since https://github.com/apache/arrow/pull/6597, passing a local relative path 
as a string to {{ds.dataset}} no longer works and raises {{ArrowInvalid}}:

{code}
In [1]: import pyarrow.dataset as ds  

In [2]: ds.dataset("test.parquet")  
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-2-23ecfce52d13> in <module>
----> 1 ds.dataset("test.parquet")

~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories, filesystem, partitioning, format)
    327 
    328     if isinstance(paths_or_factories, str):
--> 329         return factory(paths_or_factories, **kwargs).finish()
    330 
    331     if not isinstance(paths_or_factories, list):

~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths, filesystem, partitioning, format)
    246     factories = []
    247     for path in path_or_paths:
--> 248         fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
    249         factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
    250                                                   format, options))

~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path, filesystem)
    165     from pyarrow.fs import FileType, FileSelector
    166 
--> 167     filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
    168     infos = filesystem.get_target_infos([path])[0]
    169     if infos.type == FileType.Directory:

~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
    158     if filesystem is not None:
    159         return filesystem, path
--> 160     return FileSystem.from_uri(path)
    161 
    162 

~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: URI has empty scheme: 'test.parquet'

{code}

[~apitrou] Should this be fixed in {{FileSystemFromUriOrPath}}, or rather on 
the Python side? ({{FileSystem.from_uri}} already converts {{pathlib.Path}} 
objects to an absolute path, but does not do so for plain strings.)
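
For illustration, a Python-side fix could resolve scheme-less strings to an absolute path before they reach {{FileSystem.from_uri}}. This is only a rough sketch: {{absolutize_local_path}} is a hypothetical helper (not an existing pyarrow function), and treating a one-letter scheme as a Windows drive letter is an assumption of this example.

```python
import os
from urllib.parse import urlparse


def absolutize_local_path(path):
    """Return `path` unchanged if it looks like a URI, else make it absolute.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Accept both str and pathlib.Path inputs.
    path = os.fspath(path)
    scheme = urlparse(path).scheme
    # Assumption: a one-letter "scheme" is a Windows drive letter (C:\...),
    # so only longer schemes (file, s3, hdfs, ...) are treated as URIs.
    if len(scheme) > 1:
        return path
    # Relative (or already absolute) local path: resolve against the cwd.
    return os.path.abspath(path)
```

With such a helper, {{_ensure_fs}} could call {{FileSystem.from_uri(absolutize_local_path(path))}}, so that {{"test.parquet"}} would be passed on as an absolute path while strings such as {{"s3://bucket/key"}} are left untouched. Whether the C++ side accepts the resulting absolute path depends on the fix chosen in {{FileSystemFromUriOrPath}}.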



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
