eitsupi commented on code in PR #13677:
URL: https://github.com/apache/arrow/pull/13677#discussion_r956731736
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4192,27 +4192,27 @@ def test_write_table_multiple_fragments(tempdir):
# Table with multiple batches written as single Fragment by default
base_dir = tempdir / 'single'
ds.write_dataset(table, base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Same for single-element list of Table
base_dir = tempdir / 'single-list'
ds.write_dataset([table], base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Provide list of batches to write multiple fragments
base_dir = tempdir / 'multiple'
ds.write_dataset(table.to_batches(), base_dir, format="feather")
assert set(base_dir.rglob("*")) == set(
- [base_dir / "part-0.feather"])
+ [base_dir / "part-0.arrow"])
Review Comment:
> I think that specifying `format="feather"` explicitly means that "the user
wants to call the format Feather V2 not Apache Arrow IPC file format" because
the user can specify `format="ipc"` or `format="arrow"` instead of
`format="feather"`.
I don't know why there are multiple aliases set up here, but as long as this
project has treated Feather V2 as an alias for IPC files in the past (or still
does), I'm not sure users are choosing `"feather"` for a clear reason.
For example, we use the `pyarrow.feather` module to handle IPC files with
pyarrow (not `pyarrow.ipc` or `pyarrow.arrow`).
> We may need to deprecate Feather V2. Could you start a discussion on the
`[email protected]` mailing list?
To be clear, I just think it makes more sense to use the extension `.arrow`
for Feather V2 files, not that I think the name "Feather V2" should be
discontinued.
I had assumed that such a policy was adopted once the official extension for
IPC files was decided to be `.arrow`, but apparently not?
(BTW, I was surprised at the time because I thought `.feather` was the
official extension until I saw [the
news](https://www.clear-code.com/blog/2022/5/13/latest-apache-arrow-information.html).)
For now, it may make sense to keep the `.feather` extension for the
`"feather"` case here, and add a warning in a future release?
> I'm not sure where the auto-detection feature should be implemented
(Arrow.jl, Feather.jl or new library?) but how about create an issue to
[apache/arrow-julia](https://github.com/apache/arrow-julia) ?
Thanks for the suggestion, but I mentioned Julia only as an example; I'm not
asking for that feature.
My point is just that there may be no benefit in continuing to use the
`.feather` extension, since some libraries may not be able to read files with it.
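To illustrate why the extension alone is a weak signal, here is a minimal sketch (not something I'm proposing for this PR) of detecting the format from the file's leading bytes instead. It relies on IPC files starting with the `ARROW1` magic and legacy Feather V1 files starting with `FEA1`; the function names and the returned labels are hypothetical, chosen only for this example.

```python
ARROW_IPC_MAGIC = b"ARROW1"  # Arrow IPC file format, i.e. Feather V2
FEATHER_V1_MAGIC = b"FEA1"   # legacy Feather V1


def sniff_bytes(head: bytes) -> str:
    """Classify a file from its leading bytes (labels are illustrative)."""
    if head.startswith(ARROW_IPC_MAGIC):
        return "arrow-ipc-file"
    if head.startswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    return "unknown"


def sniff_format(path) -> str:
    """Guess the on-disk format of `path`, ignoring its extension."""
    with open(path, "rb") as f:
        # Reading the length of the longest magic string is enough.
        return sniff_bytes(f.read(len(ARROW_IPC_MAGIC)))
```

With a check like this, a `part-0.feather` and a `part-0.arrow` written by the same IPC writer would be classified identically, so a reader that dispatches on extension is the only thing that would tell them apart.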