Fabian Höring created ARROW-7584:
------------------------------------

             Summary: [Python] Improve ergonomics of new FileSystem API
                 Key: ARROW-7584
                 URL: https://issues.apache.org/jira/browse/ARROW-7584
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Fabian Höring


The [new Python FileSystem API 
|https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185] is 
nice but seems to be very verbose to use.

The documentation of the old FS API is 
[here|https://arrow.apache.org/docs/python/filesystems.html]

Here are some examples:

*File access:*

Before:
fs.ls()
fs.mkdir()
fs.rmdir()

Now:
fs.get_target_stats()
fs.create_dir()
fs.delete_dir()

What is the advantage of having a longer method ? The short ones seems clear 
and are much easier to use. Seems like an easy change.  Also this is consistent 
with what is doing hdfs in the [fs api| 
https://arrow.apache.org/docs/python/filesystems.html] and works naturally with 
a local filesystem.

*File opening:*

Before:
with fs.open(self, path, mode=u'rb', buffer_size=None)

Now:
fs.open_input_file()
fs.open_input_stream()
fs.open_output_stream()

It seems more natural to fit to Python standard open function which works for 
local file access as well. Not sure if this is possible to doeasily. as there 
is `_wrap_output_stream`

Solutions:
- If the current Python API is still unused we could just rename the methods
- We could everything as is and add some alias methods, it would make the 
FileSystem class a bit messy think if there are always 2 method to do the work

Other considerations:
In the long run I would also enhance the FileSystem API to add more methods 
that use the basic to provide new features for example:
- introduce put and get on top of the streams that directly upload/download 
files
- be able to write string to the file streams (instead of only bytes), it would 
permit to directly use some Python API's like json.dump

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  json.dump(res, fd)
```

instead of

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  fd.write(json.dumps(res)) # instead of fd.write(json.dumps(res).encode())
```

or like currently (with old API, untested with new one)

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  fd.write(json.dumps(res).encode())
```




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to