[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-4359:
-----------------------------------------
    Description:
Hi all,
a while ago I posted this issue: https://issues.apache.org/jira/browse/ARROW-3866

While working with Pyarrow I encountered another potential bug related to column metadata: if I create a table containing columns with metadata, everything is fine. But after I save the table to Parquet and load it back as a table using pq.read_table, the column metadata is gone.

As of now I cannot say whether the metadata is not saved correctly or not loaded correctly, as I have no idea how to verify it. Unfortunately I also don't have the time to try a lot, but I wanted to let you know anyway.

{code}
import pyarrow as pa
import pyarrow.parquet as pq

field0 = pa.field('field1', pa.int64(), metadata=dict(a="A", b="B"))
field1 = pa.field('field2', pa.int64(), nullable=False)
columns = [
    pa.column(field0, pa.array([1, 2])),
    pa.column(field1, pa.array([3, 4]))
]
tab = pa.Table.from_arrays(columns)
pq.write_table(tab, path)  # path: any writable Parquet file location
tab2 = pq.read_table(path)
tab2.column(0).field.metadata  # the metadata set on field0 is gone here
{code}

  was:
Hi all,
a while ago I posted this issue: https://issues.apache.org/jira/browse/ARROW-3866

While working with Pyarrow I encountered another potential bug related to column metadata: if I create a table containing columns with metadata, everything is fine. But after I save the table to Parquet and load it back as a table using pq.read_table, the column metadata is gone.

As of now I cannot say whether the metadata is not saved correctly or not loaded correctly, as I have no idea how to verify it. Unfortunately I also don't have the time to try a lot, but I wanted to let you know anyway.

The mentioned issue can be used as an example, just add the following lines:

>>> pq.write_table(tab, path)
>>> tab2 = pq.read_table(path)
>>> tab2.column(0).field.metadata


> [Python][Parquet] Column metadata is not saved or loaded in parquet
> -------------------------------------------------------------------
>
>                 Key: ARROW-4359
>                 URL: https://issues.apache.org/jira/browse/ARROW-4359
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Seb Fru
>            Priority: Major
>              Labels: parquet
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)