jorisvandenbossche commented on code in PR #39609:
URL: https://github.com/apache/arrow/pull/39609#discussion_r1453060849
##########
python/pyarrow/pandas_compat.py:
##########
@@ -950,7 +950,7 @@ def _reconstruct_index(table, index_descriptors,
all_columns, types_mapper=None)
index = index_arrays[0]
if not isinstance(index, pd.Index):
# Box anything that wasn't boxed above
- index = pd.Index(index, name=index_names[0])
+ index = pd.Index(index.infer_objects(), name=index_names[0])
Review Comment:
Do you know which test this is needed for? (I mean, which test triggers the
warning here?)
I am also wondering what `index` would be at this point, because I would think
we have always already created a proper pandas object by now.
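For reference, a minimal sketch of what the added `.infer_objects()` call does,
assuming `index` at this point is an object-dtype pandas Series (only an
assumption; the question above is exactly what `index` actually is here):
```python
import pandas as pd

# Sketch only: 's' stands in for whatever index_arrays[0] holds before boxing;
# that it is an object-dtype Series is an assumption, not taken from the PR.
s = pd.Series([0, 1, 2], dtype="object")

print(s.dtype)                  # object
print(s.infer_objects().dtype)  # int64 -- infer_objects() recovers a concrete dtype

# With the change in the diff, the inferred dtype is what ends up in the Index
# ("idx" is just a placeholder for index_names[0]):
print(pd.Index(s.infer_objects(), name="idx"))
```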
##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4312,10 +4314,13 @@ def test_array_to_pandas():
def test_roundtrip_empty_table_with_extension_dtype_index():
df = pd.DataFrame(index=pd.interval_range(start=0, end=3))
table = pa.table(df)
- table.to_pandas().index == pd.Index([{'left': 0, 'right': 1},
- {'left': 1, 'right': 2},
- {'left': 2, 'right': 3}],
- dtype='object')
+ if Version(pd.__version__) > Version("1.0"):
+ tm.assert_index_equal(table.to_pandas().index, df.index)
+ else:
+ assert table.to_pandas().index == pd.Index([{'left': 0, 'right': 1},
Review Comment:
I don't think we still support pandas versions that old
(see https://github.com/apache/arrow/pull/14631), so this version check can be
simplified.
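A sketch of what the simplified test could look like once the version branch is
dropped, keeping only the `assert_index_equal` path from the diff above (the
`tm` alias for `pandas.testing` is assumed to match the test module's imports):
```python
import pandas as pd
import pandas.testing as tm
import pyarrow as pa


def test_roundtrip_empty_table_with_extension_dtype_index():
    # Round-trip a DataFrame with no columns whose index is an IntervalIndex
    # and check that the extension-dtype index survives the conversion.
    df = pd.DataFrame(index=pd.interval_range(start=0, end=3))
    table = pa.table(df)
    tm.assert_index_equal(table.to_pandas().index, df.index)
```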
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -178,12 +178,14 @@ def multisourcefs(request):
# simply split the dataframe into four chunks to construct a data source
# from each chunk into its own directory
- df_a, df_b, df_c, df_d = np.array_split(df, 4)
+ n = len(df)
+ df_a, df_b, df_c, df_d = [df[i:i+n//4] for i in range(0, n, n//4)]
# create a directory containing a flat sequence of parquet files without
# any partitioning involved
mockfs.create_dir('plain')
- for i, chunk in enumerate(np.array_split(df_a, 10)):
+ n = len(df_a)
+ for i, chunk in enumerate([df_a[i:i+n//10] for i in range(0, n, n//10)]):
Review Comment:
```suggestion
for i, chunk in enumerate([df_a.iloc[i:i+n//10] for i in range(0, n, n//10)]):
```
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -178,12 +178,14 @@ def multisourcefs(request):
# simply split the dataframe into four chunks to construct a data source
# from each chunk into its own directory
- df_a, df_b, df_c, df_d = np.array_split(df, 4)
+ n = len(df)
+ df_a, df_b, df_c, df_d = [df[i:i+n//4] for i in range(0, n, n//4)]
Review Comment:
```suggestion
df_a, df_b, df_c, df_d = [df.iloc[i:i+n//4] for i in range(0, n, n//4)]
```
(This should be exactly the same, but it is a bit more explicit that the
indexing is positional.)
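A quick check (not part of the PR) that plain `[]` slicing and `.iloc` slicing
pick the same rows here, since the chunks are built from integer positions on a
default RangeIndex:
```python
import pandas as pd

# Small stand-in frame; the real fixture's df is larger, this one just needs a
# default RangeIndex like the DataFrame being split in multisourcefs.
df = pd.DataFrame({"x": range(8)})
n = len(df)

plain = [df[i:i + n // 4] for i in range(0, n, n // 4)]
explicit = [df.iloc[i:i + n // 4] for i in range(0, n, n // 4)]

for a, b in zip(plain, explicit):
    pd.testing.assert_frame_equal(a, b)  # both slice rows by position here
```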