kou commented on code in PR #13677:
URL: https://github.com/apache/arrow/pull/13677#discussion_r957026581
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4192,27 +4192,27 @@ def test_write_table_multiple_fragments(tempdir):
# Table with multiple batches written as single Fragment by default
base_dir = tempdir / 'single'
ds.write_dataset(table, base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Same for single-element list of Table
base_dir = tempdir / 'single-list'
ds.write_dataset([table], base_dir, format="feather")
- assert set(base_dir.rglob("*")) == set([base_dir / "part-0.feather"])
+ assert set(base_dir.rglob("*")) == set([base_dir / "part-0.arrow"])
assert ds.dataset(base_dir, format="ipc").to_table().equals(table)
# Provide list of batches to write multiple fragments
base_dir = tempdir / 'multiple'
ds.write_dataset(table.to_batches(), base_dir, format="feather")
assert set(base_dir.rglob("*")) == set(
- [base_dir / "part-0.feather"])
+ [base_dir / "part-0.arrow"])
Review Comment:
> For example, we use the `pyarrow.feather` module to handle IPC files with pyarrow. (not `pyarrow.ipc` or `pyarrow.arrow`.)
No. Normally, users use `pyarrow.ipc.open_file()`/`pyarrow.ipc.new_file()` for it. See also:
* https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-random-access-files
* https://arrow.apache.org/cookbook/py/io.html#saving-arrow-arrays-to-disk
> For now, it may make sense here to leave the `.feather` extension for the `"feather"` case, and warn in the future?
@westonpace What do you think about this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]