Fabian Höring created ARROW-7584: ------------------------------------ Summary: [Python] Improve ergonomics of new FileSystem API Key: ARROW-7584 URL: https://issues.apache.org/jira/browse/ARROW-7584 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Fabian Höring
The [new Python FileSystem API |https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185] is nice but seems to be very verbose to use. The documentation of the old FS API is [here|https://arrow.apache.org/docs/python/filesystems.html] Here are some examples: *File access:* Before: fs.ls() fs.mkdir() fs.rmdir() Now: fs.get_target_stats() fs.create_dir() fs.delete_dir() What is the advantage of having a longer method ? The short ones seems clear and are much easier to use. Seems like an easy change. Also this is consistent with what is doing hdfs in the [fs api| https://arrow.apache.org/docs/python/filesystems.html] and works naturally with a local filesystem. *File opening:* Before: with fs.open(self, path, mode=u'rb', buffer_size=None) Now: fs.open_input_file() fs.open_input_stream() fs.open_output_stream() It seems more natural to fit to Python standard open function which works for local file access as well. Not sure if this is possible to doeasily. as there is `_wrap_output_stream` Solutions: - If the current Python API is still unused we could just rename the methods - We could everything as is and add some alias methods, it would make the FileSystem class a bit messy think if there are always 2 method to do the work Other considerations: In the long run I would also enhance the FileSystem API to add more methods that use the basic to provide new features for example: - introduce put and get on top of the streams that directly upload/download files - be able to write string to the file streams (instead of only bytes), it would permit to directly use some Python API's like json.dump ``` with fs.open(path, "wb") as fd: res = {"a": "bc"} json.dump(res, fd) ``` instead of ``` with fs.open(path, "wb") as fd: res = {"a": "bc"} fd.write(json.dumps(res)) # instead of fd.write(json.dumps(res).encode()) ``` or like currently (with old API, untested with new one) ``` with fs.open(path, "wb") as fd: res = {"a": "bc"} fd.write(json.dumps(res).encode()) ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)