rok commented on code in PR #48979:
URL: https://github.com/apache/arrow/pull/48979#discussion_r2828567128


##########
python/pyarrow/tests/parquet/test_basic.py:
##########
@@ -759,17 +760,23 @@ def test_fastparquet_cross_compatibility(tempdir):
 
     fp_file = fp.ParquetFile(file_arrow)
     df_fp = fp_file.to_pandas()
-    tm.assert_frame_equal(df, df_fp)
+    # TODO: once fastparquet supports pandas 3 dtypes revert string and categorical
+    # tests by removing `check_dtype=False` and `check_categorical=False` so type
+    # equality is asserted again
+    tm.assert_frame_equal(df, df_fp, check_dtype=False, check_categorical=False)
 
     # Fastparquet -> arrow
     file_fastparquet = str(tempdir / "cross_compat_fastparquet.parquet")
-    fp.write(file_fastparquet, df)
+    # fastparquet can't write pandas 3.0 StringDtype
+    df_for_fp = df.copy()
+    df_for_fp['a'] = df_for_fp['a'].astype(object)
+    fp.write(file_fastparquet, df_for_fp)
 
     table_fp = pq.read_pandas(file_fastparquet)
     # for fastparquet written file, categoricals comes back as strings
     # (no arrow schema in parquet metadata)
-    df['f'] = df['f'].astype(object)
-    tm.assert_frame_equal(table_fp.to_pandas(), df)
+    tm.assert_frame_equal(table_fp.to_pandas(), df_for_fp, check_dtype=False,
+                          check_categorical=False)

Review Comment:
   Maybe we can reuse `expected_types`?
   ```suggestion
       tm.assert_frame_equal(table_fp.to_pandas(), df.astype(expected_types))
   ```



##########
python/pyarrow/tests/parquet/test_basic.py:
##########
@@ -759,17 +760,23 @@ def test_fastparquet_cross_compatibility(tempdir):
 
     fp_file = fp.ParquetFile(file_arrow)
     df_fp = fp_file.to_pandas()
-    tm.assert_frame_equal(df, df_fp)
+    # TODO: once fastparquet supports pandas 3 dtypes revert string and categorical
+    # tests by removing `check_dtype=False` and `check_categorical=False` so type
+    # equality is asserted again
+    tm.assert_frame_equal(df, df_fp, check_dtype=False, check_categorical=False)

Review Comment:
   Would this work instead? Since the underlying type of the categorical is not `int`, it feels like it should.
   ```suggestion
       # TODO: once fastparquet supports pandas 3's pd.StringDtype() remove casting
       expected_types = {"a": pd.StringDtype()}
       tm.assert_frame_equal(df, df_fp.astype(expected_types))
   ```
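
A minimal sketch of the idea behind this suggestion, assuming pandas is installed. The two frames are hypothetical stand-ins: `df` plays the original frame with a `pd.StringDtype()` column, and `df_fp` plays what fastparquet is assumed to hand back (plain `object` strings). Casting only the affected column keeps dtype checking enabled for everything else.

```python
import pandas as pd
import pandas.testing as tm

# Hypothetical stand-in for the original frame with a StringDtype column.
df = pd.DataFrame({"a": pd.array(["foo", "bar"], dtype=pd.StringDtype())})

# Hypothetical stand-in for the frame read back by fastparquet; the
# assumption here is that the strings come back as plain object dtype.
df_fp = pd.DataFrame({"a": pd.Series(["foo", "bar"], dtype=object)})

# Cast only the affected column, then compare with dtype checks left on.
expected_types = {"a": pd.StringDtype()}
tm.assert_frame_equal(df, df_fp.astype(expected_types))
```

Because `check_dtype` stays at its default, any other dtype mismatch would still fail the assertion.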



##########
python/pyarrow/tests/parquet/test_basic.py:
##########
@@ -759,17 +760,23 @@ def test_fastparquet_cross_compatibility(tempdir):
 
     fp_file = fp.ParquetFile(file_arrow)
     df_fp = fp_file.to_pandas()
-    tm.assert_frame_equal(df, df_fp)
+    # TODO: once fastparquet supports pandas 3 dtypes revert string and categorical
+    # tests by removing `check_dtype=False` and `check_categorical=False` so type
+    # equality is asserted again
+    tm.assert_frame_equal(df, df_fp, check_dtype=False, check_categorical=False)
 
     # Fastparquet -> arrow
     file_fastparquet = str(tempdir / "cross_compat_fastparquet.parquet")
-    fp.write(file_fastparquet, df)
+    # fastparquet can't write pandas 3.0 StringDtype
+    df_for_fp = df.copy()
+    df_for_fp['a'] = df_for_fp['a'].astype(object)
+    fp.write(file_fastparquet, df_for_fp)

Review Comment:
   Perhaps we can avoid creating a new dataframe and only non-destructively cast at write-time?
   ```suggestion
       # fastparquet can't write pandas 3.0 StringDtype
       fp_compatible_types = {"a": object}
       fp.write(file_fastparquet, df.astype(fp_compatible_types))
   ```
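
A minimal sketch of why the cast is non-destructive, assuming pandas is installed. The frame below is a hypothetical stand-in for the one in the test: `DataFrame.astype` with a column-to-dtype mapping returns a new frame, so `df` keeps its `StringDtype` column and no explicit `copy()` is needed.

```python
import pandas as pd

# Hypothetical stand-in for the test's frame; only column "a" is cast.
df = pd.DataFrame(
    {"a": pd.array(["x", "y"], dtype=pd.StringDtype()), "b": [1, 2]}
)
fp_compatible_types = {"a": object}

# astype returns a new DataFrame; the original is left untouched.
converted = df.astype(fp_compatible_types)

original_dtype = df["a"].dtype          # still a StringDtype
converted_dtype = converted["a"].dtype  # plain object dtype
```

The converted frame can be passed straight to the writer, while later assertions keep using `df` with its original dtypes.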



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
