Yue Ni created ARROW-16131:
------------------------------
Summary: record batch specific metadata is not saved in IPC file
Key: ARROW-16131
URL: https://issues.apache.org/jira/browse/ARROW-16131
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 7.0.0
Reporter: Yue Ni
When writing an IPC file with multiple record batches, the schema passed to
`IpcFormatWriter` is correctly written to the file's footer. However, if a
record batch carries its own batch-specific metadata, that metadata is not
written.
This can be reproduced with the following test case (using pyarrow):
```python
import pyarrow as pa


def test_chunked_record_batch_meta():
    num_batches = 2
    chunk_size = 5  # assumed value; chunk_size is not defined in the original report
    ipc_file = "/tmp/batches_with_metadata.arrow"
    int_array = pa.array([i for i in range(chunk_size)])
    # Schema-level metadata; this is what ends up in the IPC file footer.
    schema = pa.schema(
        [
            ("values", pa.int64()),
        ],
        metadata={"foo": "bar"},
    )
    writer = pa.RecordBatchFileWriter(ipc_file, schema)
    for i in range(num_batches):
        # follow examples here:
        # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
        batch = pa.record_batch(
            [int_array],
            names=["values"],
            metadata={"batch_id": str(i)},  # batch-specific metadata
        )
        writer.write_batch(batch)
    writer.close()

    # Read the first batch back and inspect its metadata.
    mmapped_file = pa.memory_map(ipc_file)
    reader = pa.ipc.open_file(mmapped_file)
    batch_0 = reader.get_record_batch(0)
    assert batch_0.schema.metadata
```
--
This message was sent by Atlassian Jira
(v8.20.1#820001)