[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #12821: ARROW-7914: [Python] Allow pandas datetime as index for feather

GitBox Fri, 15 Apr 2022 05:26:44 -0700


jorisvandenbossche commented on code in PR #12821:
URL: https://github.com/apache/arrow/pull/12821#discussion_r851241763



##########
python/pyarrow/feather.py:
##########
@@ -151,7 +151,18 @@ def write_feather(df, dest, compression=None, compression_level=None,
             df = df.to_dense()
 
     if _pandas_api.is_data_frame(df):
-        table = Table.from_pandas(df, preserve_index=False)
+        """
+        Feather v1 creates a new column in the resultant Table to
+        store index information if index type is not RangeIndex
+        """
+        if version == 1 and type(df.index) is not _pandas_api.pd.RangeIndex:
+            preserve_index = False

Review Comment:
   ```suggestion
           if version == 1:
               preserve_index = False
   ```
   
   I would keep the current behaviour (`preserve_index=False`) in all cases when the user passes `version=1`. Since a user would only choose version 1 for legacy reasons anyway, it seems better not to change the behaviour here.
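
A rough sketch of the decision this suggestion implies (not code from the PR). The helper name `choose_preserve_index` is hypothetical. The version 2 branch is an assumption: it returns `None`, the default of `Table.from_pandas`, which stores a `RangeIndex` as metadata only and any other index as a column.

```python
def choose_preserve_index(version):
    """Pick the preserve_index value to pass to Table.from_pandas.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    if version == 1:
        # Legacy format: always drop the index, whatever its type.
        return False
    if version == 2:
        # Assumption: defer to the Table.from_pandas default (None).
        return None
    raise ValueError(f"Unsupported Feather version: {version!r}")
```

With this shape, the index type never needs to be inspected for version 1, which is the point of the suggestion.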



##########
python/pyarrow/feather.py:
##########
@@ -151,7 +151,18 @@ def write_feather(df, dest, compression=None, compression_level=None,
             df = df.to_dense()
 
     if _pandas_api.is_data_frame(df):
-        table = Table.from_pandas(df, preserve_index=False)
+        """
+        Feather v1 creates a new column in the resultant Table to
+        store index information if index type is not RangeIndex
+        """

Review Comment:
   ```suggestion
           # Feather v1 creates a new column in the resultant Table to
           # store index information if index type is not RangeIndex
   ```
   
   Inside a function, we typically use `# ` for comments, and only use triple quotes for the function docstring (directly after the function's `def` line).
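
To illustrate the difference, here is a small sketch using only the standard library. A triple-quoted string in the middle of a function body is parsed as an expression statement, while a `#` comment leaves no node in the syntax tree. The sample source and the helper `string_statements` are made up for this example.

```python
import ast

# Made-up sample: one real docstring, one stray string, one real comment.
SOURCE = '''
def write(df):
    """Docstring for write."""
    if df is not None:
        """
        Stray string literal used as a comment.
        """
        df = list(df)
    # A real comment leaves no trace in the syntax tree.
    return df
'''


def string_statements(source):
    """Return the text of every bare string-literal statement in source."""
    tree = ast.parse(source)
    return [
        node.value.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    ]


found = string_statements(SOURCE)
```

Here `found` holds both the docstring and the stray string, but nothing from the `#` comment.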



##########
python/pyarrow/tests/test_feather.py:
##########
@@ -820,3 +826,20 @@ def test_feather_v017_experimental_compression_backward_compatibility(datadir):
     expected = pa.table({'a': range(5)})
     result = read_table(datadir / "v0.17.0.version.2-compression.lz4.feather")
     assert result.equals(expected)
+
+
+@pytest.mark.pandas
+def test_preserve_index_pandas(version):
+    data = {}
+    for i in range(4):
+        values = np.random.randint(0, 100, size=100)
+        data[i] = values
+
+    df = pd.DataFrame(data, index=data[0])

Review Comment:
   ```suggestion
       df = pd.DataFrame({'a': [1, 2, 3]}, index=['a', 'b', 'c'])
   ```
   
   A DataFrame like this should also cover the behaviour, and it makes the test code a bit simpler.


