Thomas Buhrmann created ARROW-2573:
--------------------------------------

Summary: Field metadata is lost on serialization round-trip
Key: ARROW-2573
URL: https://issues.apache.org/jira/browse/ARROW-2573
Project: Apache Arrow
Issue Type: Bug
Reporter: Thomas Buhrmann

It seems only schema metadata roundtrips, while field metadata is lost:

{code:java}
import pandas as pd
import pyarrow as pa

fnm = "/path/to/file.arr"
df = pd.DataFrame({"x": [0,1,2,3]})
tbl = pa.Table.from_pandas(df)
metadata = {"custom": "test"}

# Update field metadata, and schema metadata
fields = [col.field.add_metadata(metadata) for col in tbl.itercolumns()]
schema_metadata = {**tbl.schema.metadata, **metadata}
schema = pa.schema(fields, metadata=schema_metadata)
tbl = pa.Table.from_batches(tbl.to_batches(), schema=schema)

print(tbl.column(0).field.metadata)  # correct :)
print(tbl.schema.field_by_name("x").metadata)  # correct :)
print(tbl.schema)  # correct :)

# Roundtrip
writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)
writer.close()

reader = pa.RecordBatchStreamReader(fnm)
tbl = reader.read_all()

# Check
print(tbl.column(0).field.metadata)  # None :(
print(tbl.schema.field_by_name("x").metadata)  # None :(
print(tbl.schema)  # Metadata good :)
{code}