jorisvandenbossche commented on code in PR #39609:
URL: https://github.com/apache/arrow/pull/39609#discussion_r1453060849
##########
python/pyarrow/pandas_compat.py:
##########
@@ -950,7 +950,7 @@ def _reconstruct_index(table, index_descriptors,
all_columns, types_mapper=None)
index = index_arrays[0]
if not isinstance(index, pd.Index):
# Box anything that wasn't boxed above
- index = pd.Index(index, name=index_names[0])
+ index = pd.Index(index.infer_objects(), name=index_names[0])
Review Comment:
Do you know which test this is needed for? (I mean, which test triggers the
warning here?)
I am also wondering what `index` would be at this point, because I would think
we have always already created a proper pandas object by now.
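For reference, a minimal sketch of what the added `.infer_objects()` call does,
assuming `index` at this point is an object-dtype pandas Series (only an
assumption; the question above is exactly what `index` actually is here):
```python
import pandas as pd

# Sketch only: 's' stands in for whatever index_arrays[0] holds before boxing;
# that it is an object-dtype Series is an assumption, not taken from the PR.
s = pd.Series([0, 1, 2], dtype="object")

print(s.dtype)                  # object
print(s.infer_objects().dtype)  # int64 -- infer_objects() recovers a concrete dtype

# With the change in the diff, the inferred dtype is what ends up in the Index
# ("idx" is just a placeholder for index_names[0]):
print(pd.Index(s.infer_objects(), name="idx"))
```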
##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4312,10 +4314,13 @@ def test_array_to_pandas():
def test_roundtrip_empty_table_with_extension_dtype_index():
df = pd.DataFrame(index=pd.interval_range(start=0, end=3))
table = pa.table(df)
- table.to_pandas().index == pd.Index([{'left': 0, 'right': 1},
- {'left': 1, 'right': 2},
- {'left': 2, 'right': 3}],
- dtype='object')
+ if Version(pd.__version__) > Version("1.0"):
+ tm.assert_index_equal(table.to_pandas().index, df.index)
+ else:
+ assert table.to_pandas().index == pd.Index([{'left': 0, 'right': 1},
Review Comment:
I don't think we still support pandas versions that old
(see https://github.com/apache/arrow/pull/14631), so this version check can be
simplified.
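A sketch of what the simplified test could look like once the version branch is
dropped, keeping only the `assert_index_equal` path from the diff above (the
`tm` alias for `pandas.testing` is assumed to match the test module's imports):
```python
import pandas as pd
import pandas.testing as tm
import pyarrow as pa


def test_roundtrip_empty_table_with_extension_dtype_index():
    # Round-trip a DataFrame with no columns whose index is an IntervalIndex
    # and check that the extension-dtype index survives the conversion.
    df = pd.DataFrame(index=pd.interval_range(start=0, end=3))
    table = pa.table(df)
    tm.assert_index_equal(table.to_pandas().index, df.index)
```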
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -178,12 +178,14 @@ def multisourcefs(request):
# simply split the dataframe into four chunks to construct a data source
# from each chunk into its own directory
- df_a, df_b, df_c, df_d = np.array_split(df, 4)
+ n = len(df)
+ df_a, df_b, df_c, df_d = [df[i:i+n//4] for i in range(0, n, n//4)]
# create a directory containing a flat sequence of parquet files without
# any partitioning involved
mockfs.create_dir('plain')
- for i, chunk in enumerate(np.array_split(df_a, 10)):
+ n = len(df_a)
+ for i, chunk in enumerate([df_a[i:i+n//10] for i in range(0, n, n//10)]):
Review Comment:
```suggestion
for i, chunk in enumerate([df_a.iloc[i:i+n//10] for i in range(0, n, n//10)]):
```
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -178,12 +178,14 @@ def multisourcefs(request):
# simply split the dataframe into four chunks to construct a data source
# from each chunk into its own directory
- df_a, df_b, df_c, df_d = np.array_split(df, 4)
+ n = len(df)
+ df_a, df_b, df_c, df_d = [df[i:i+n//4] for i in range(0, n, n//4)]
Review Comment:
```suggestion
df_a, df_b, df_c, df_d = [df.iloc[i:i+n//4] for i in range(0, n, n//4)]
```
(This should be exactly the same, but it is a bit more explicit that the
indexing is positional.)
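A quick check (not part of the PR) that plain `[]` slicing and `.iloc` slicing
pick the same rows here, since the chunks are built from integer positions on a
default RangeIndex:
```python
import pandas as pd

# Small stand-in frame; the real fixture's df is larger, this one just needs a
# default RangeIndex like the DataFrame being split in multisourcefs.
df = pd.DataFrame({"x": range(8)})
n = len(df)

plain = [df[i:i + n // 4] for i in range(0, n, n // 4)]
explicit = [df.iloc[i:i + n // 4] for i in range(0, n, n // 4)]

for a, b in zip(plain, explicit):
    pd.testing.assert_frame_equal(a, b)  # both slice rows by position here
```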