rok commented on code in PR #47147:
URL: https://github.com/apache/arrow/pull/47147#discussion_r2263984852


##########
python/pyarrow/pandas_compat.py:
##########
@@ -275,11 +275,14 @@ def construct_metadata(columns_to_convert, df, column_names, index_levels,
     else:
         index_descriptors = index_column_metadata = column_indexes = []
 
+    attributes: dict = df.attrs if df.attrs else {}

Review Comment:
   Can `attrs` actually be None? The [NDFrame 
constructor](https://github.com/pandas-dev/pandas/blame/c888af6d0bb674932007623c0867e1fbd4bdc2c6/pandas/core/generic.py#L282)
 I found seems to say it will always be a dict, possibly an empty one. If this is 
indeed true (I might be wrong) we could just have:
   
   ```suggestion
       attributes = df.attrs
   ```
   
   If older versions of pandas don't have `attrs`, this would be safer:
   ```python
       attributes = df.attrs if hasattr(df, "attrs") else {}
   ```
   Note: type annotations are coming, but we're not sure yet whether they'll be 
inline. So I'd suggest not adding one here.
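
   To make the trade-off concrete, here is a minimal sketch of the defensive lookup using `getattr` with a default. `get_attrs`, `WithAttrs` and `WithoutAttrs` are hypothetical names used only for illustration (they are not part of pyarrow or pandas), and the stand-in classes replace real DataFrames so the snippet runs without pandas installed:

```python
def get_attrs(obj):
    """Return a copy of obj.attrs, or {} if the attribute is missing, None or empty."""
    # getattr with a default covers objects that have no `attrs` at all.
    attrs = getattr(obj, "attrs", None)
    # A falsy value (None or an empty dict) is normalised to a fresh empty dict.
    return dict(attrs) if attrs else {}


class WithAttrs:
    # Hypothetical stand-in for a DataFrame that exposes `attrs`.
    attrs = {"description": "example"}


class WithoutAttrs:
    # Hypothetical stand-in for an object from a version without `attrs`.
    pass


with_attrs = get_attrs(WithAttrs())
without_attrs = get_attrs(WithoutAttrs())
```

   If `attrs` is guaranteed to exist and to be a dict, all of this collapses to the plain `df.attrs` suggested above.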



##########
python/pyarrow/tests/parquet/test_pandas.py:
##########
@@ -101,6 +101,33 @@ def test_merging_parquet_tables_with_different_pandas_metadata(tempdir):
     writer.write_table(table2)
 
 
+@pytest.mark.pandas
+def test_attributes_metadata_persistence(tempdir):
+    # GH-45382: Add support for pandas DataFrame.attrs
+    # During the .parquet file writing, the attrs are serialised into json
+    # along with the rest of the pandas.DataFrame metadata.
+
+    filename = tempdir / "metadata_persistence.parquet"
+    df = alltypes_sample(size=10000)
+    df.attrs = {
+        'float16': 'half-precision',
+        'float32': 'single precision',
+        'float64': 'double precision',
+        'desciption': 'Attributes Persistence Test DataFrame',
+    }
+
+    table = pa.Table.from_pandas(df)
+    assert b'attributes' in table.schema.metadata[b'pandas']
+
+    _write_table(table, filename)
+    metadata = pq.read_metadata(filename).metadata
+    assert b'attributes' in table.schema.metadata[b'pandas']

Review Comment:
   Since we're not changing the schema's metadata (as far as I can tell), one of 
these asserts is redundant:
   ```python
   assert b'attributes' in table.schema.metadata[b'pandas']
   ```
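
   If the intent of the second check is to verify what was actually persisted, one option (an assumption on my side, not something the PR does) would be to decode the `b'pandas'` JSON blob and compare its `attributes` entry with the original dict. A rough sketch, where `attrs_from_metadata` is a hypothetical helper and `fake_metadata` is a hand-built stand-in for the key/value metadata, so it runs without pyarrow:

```python
import json


def attrs_from_metadata(metadata):
    """Decode the b'pandas' JSON blob and return its 'attributes' entry."""
    pandas_meta = json.loads(metadata[b"pandas"].decode("utf-8"))
    # Fall back to {} when the blob carries no 'attributes' key.
    return pandas_meta.get("attributes", {})


# Hand-built stand-in for the bytes-keyed metadata mapping used in the test.
expected = {"float16": "half-precision", "float64": "double precision"}
fake_metadata = {
    b"pandas": json.dumps({"attributes": expected}).encode("utf-8"),
}

restored = attrs_from_metadata(fake_metadata)
```

   Comparing `restored` with `df.attrs` would check the values themselves rather than only the presence of the key in the raw bytes.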


