jorisvandenbossche commented on issue #38212:
URL: https://github.com/apache/arrow/issues/38212#issuecomment-1759121919

   Thanks for the report! Can you also reproduce the issue with a small, 
self-contained example?  
   I tried a very simple case, writing batches that come from a generator, and 
with that I don't see any memory issue:
   
   ```python
   import pyarrow as pa
   import numpy as np
   
   
   def generate_random_data():
       for _ in range(1000):
           yield pa.record_batch(
               [np.random.randn(60000) for _ in range(5)],
               ['a', 'b', 'c', 'd', 'e'],
           )
   
   schema = pa.schema([(name, 'float64') for name in ['a', 'b', 'c', 'd', 'e']])
   record_batch_reader = pa.RecordBatchReader.from_batches(
       schema, generate_random_data()
   )
   
   
   with pa.OSFile("/tmp/outfile", mode="wb") as f:
       record_batch_writer = pa.ipc.RecordBatchFileWriter(f, schema=schema)
   
       for batch in record_batch_reader:
           record_batch_writer.write_batch(batch)
   
       record_batch_writer.close()
   ```

