jorisvandenbossche commented on code in PR #12821:
URL: https://github.com/apache/arrow/pull/12821#discussion_r851241763
##########
python/pyarrow/feather.py:
##########
@@ -151,7 +151,18 @@ def write_feather(df, dest, compression=None,
compression_level=None,
df = df.to_dense()
if _pandas_api.is_data_frame(df):
- table = Table.from_pandas(df, preserve_index=False)
+ """
+ Feather v1 creates a new column in the resultant Table to
+ store index information if index type is not RangeIndex
+ """
+ if version == 1 and type(df.index) is not _pandas_api.pd.RangeIndex:
+ preserve_index = False
Review Comment:
```suggestion
if version == 1:
preserve_index = False
```
I would maybe keep the current behaviour for `preserve_index=False` for all
cases when the user passes `version=1`. Since a user would only use this for
legacy reasons anyway, it seems better to not change the behaviour here.
##########
python/pyarrow/feather.py:
##########
@@ -151,7 +151,18 @@ def write_feather(df, dest, compression=None,
compression_level=None,
df = df.to_dense()
if _pandas_api.is_data_frame(df):
- table = Table.from_pandas(df, preserve_index=False)
+ """
+ Feather v1 creates a new column in the resultant Table to
+ store index information if index type is not RangeIndex
+ """
Review Comment:
```suggestion
# Feather v1 creates a new column in the resultant Table to
# store index information if index type is not RangeIndex
```
Inside a function, we typically use `# ` for comment strings, and only use
the triple quote for the function docstring (directly after the function `def`
line)
##########
python/pyarrow/tests/test_feather.py:
##########
@@ -820,3 +826,20 @@ def
test_feather_v017_experimental_compression_backward_compatibility(datadir):
expected = pa.table({'a': range(5)})
result = read_table(datadir / "v0.17.0.version.2-compression.lz4.feather")
assert result.equals(expected)
+
+
[email protected]
+def test_preserve_index_pandas(version):
+ data = {}
+ for i in range(4):
+ values = np.random.randint(0, 100, size=100)
+ data[i] = values
+
+ df = pd.DataFrame(data, index=data[0])
Review Comment:
```suggestion
df = pd.DataFrame({'a': [1, 2, 3]}, index=['a', 'b', 'c'])
```
A DataFrame like that should also cover the behaviour, and makes the test
code a bit simpler.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]