jorisvandenbossche commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r413750283



##########
File path: python/pyarrow/tests/test_extension_type.py
##########
@@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type):
     import base64
     decoded_schema = base64.b64decode(meta.metadata[b"ARROW:schema"])
     schema = pa.ipc.read_schema(pa.BufferReader(decoded_schema))
-    assert schema.field("ext").metadata == {
-        b'ARROW:extension:metadata': b'freq=D',
-        b'ARROW:extension:name': b'pandas.period'}
+    # Since the type could be reconstructed, the extension type metadata is
+    # absent.
+    assert schema.field("ext").metadata == {}

Review comment:
       I don't have a good sense of what people do with this metadata, 
but I suppose that when the type is recognized, it should not be a problem if 
the metadata is no longer present (since you can always retrieve it again 
from the type instance).
   
   I know Micah mentioned they were using this metadata in BigQuery, but 
without a registered extension type, so that use case should not be affected.
   
   This test covers the Parquet roundtrip, but was a similar change made for 
the plain IPC roundtrip as well? (That currently also preserves the field 
metadata, even when the type is recognized.)





