Yue Ni created ARROW-16131:
------------------------------

             Summary: record batch specific metadata is not saved in IPC file
                 Key: ARROW-16131
                 URL: https://issues.apache.org/jira/browse/ARROW-16131
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 7.0.0
            Reporter: Yue Ni


When writing an IPC file that contains multiple record batches, the schema
passed to `IpcFormatWriter` is correctly written to the IPC file's footer.
However, if a record batch carries its own batch-specific metadata, that
metadata is not written.

This can be reproduced with the following test case (using pyarrow):

```python
import pyarrow as pa


def test_chunked_record_batch_meta():
    num_batches = 2
    chunk_size = 10  # assumed value; chunk_size is not defined in the original report
    ipc_file = "/tmp/batches_with_metadata.arrow"

    int_array = pa.array(list(range(chunk_size)))
    # schema-level metadata, passed to the writer
    schema = pa.schema(
        [
            ("values", pa.int64()),
        ],
        metadata={"foo": "bar"},
    )

    writer = pa.RecordBatchFileWriter(ipc_file, schema)

    for i in range(num_batches):
        # follow the examples here:
        # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
        batch = pa.record_batch(
            [int_array],
            names=["values"],
            metadata={"batch_id": str(i)},  # batch-specific metadata
        )
        writer.write_batch(batch)

    writer.close()

    # read the file back and check whether any metadata came back with the first batch
    mmapped_file = pa.memory_map(ipc_file)
    reader = pa.ipc.open_file(mmapped_file)
    batch_0 = reader.get_record_batch(0)
    assert batch_0.schema.metadata
```


