jorisvandenbossche commented on issue #38212: URL: https://github.com/apache/arrow/issues/38212#issuecomment-1759121919
Thanks for the report! Could you share a reproducible example? I tried something very simple, creating a generator of batches to write, and I don't see any memory issue:

```python
import numpy as np
import pyarrow as pa

def generate_random_data():
    for _ in range(1000):
        yield pa.record_batch(
            [np.random.randn(60000) for _ in range(5)],
            ['a', 'b', 'c', 'd', 'e'],
        )

schema = pa.schema([(name, 'float64') for name in ['a', 'b', 'c', 'd', 'e']])
record_batch_reader = pa.RecordBatchReader.from_batches(schema, generate_random_data())

with pa.OSFile("/tmp/outfile", mode="wb") as f:
    record_batch_writer = pa.ipc.RecordBatchFileWriter(f, schema=schema)
    for batch in record_batch_reader:
        record_batch_writer.write_batch(batch)
    record_batch_writer.close()
```