js8544 commented on issue #37008:
URL: https://github.com/apache/arrow/issues/37008#issuecomment-1670946741

   Actually, I was wrong. `Table.slice` will drop unused chunks, so it should be the optimal solution.
   ```
   import pyarrow as pa

   max_table_size = 4

   # Three tables; after concatenation each column keeps them as separate chunks.
   table1 = pa.table([pa.array([1, 2, 3, 4]), pa.array([11, 12, 13, 14])], names=['a', 'b'])
   table2 = pa.table([pa.array([5, 6]), pa.array([15, 16])], names=['a', 'b'])
   table3 = pa.table([pa.array([7, 8, 9]), pa.array([17, 18, 19])], names=['a', 'b'])

   # Zero-copy concatenation: the result references the original chunks.
   table = pa.concat_tables([table1, table2, table3])
   print("Num of chunks before slice: ", table.column(0).num_chunks)
   print("Total buffer size before slice: ", table.get_total_buffer_size())

   # Keep only the last max_table_size rows; chunks that fall entirely
   # outside the slice are dropped from the new table.
   table = table.slice(table.num_rows - max_table_size, max_table_size)
   print("Num of chunks after slice:  ", table.column(0).num_chunks)
   print("Total buffer size after slice: ", table.get_total_buffer_size())
   ```
   Outputs:
   ```
   Num of chunks before slice:  3
   Total buffer size before slice:  144
   Num of chunks after slice:   2
   Total buffer size after slice:  80
   ```
   The first table is dropped from the resulting table, and its memory is correctly recycled.
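   As a quick sanity check that the dropped chunk's memory is actually returned to the allocator (and not merely excluded from `get_total_buffer_size`), you can watch `pa.total_allocated_bytes()`, which reports the bytes held by the default memory pool. A minimal sketch, assuming no other references to the unsliced table remain (the exact byte count depends on Arrow's 64-byte buffer padding):
   ```
   import pyarrow as pa

   table1 = pa.table([pa.array([1, 2, 3, 4])], names=['a'])
   table2 = pa.table([pa.array([5, 6, 7, 8])], names=['a'])
   table = pa.concat_tables([table1, table2])  # zero-copy; chunks are shared
   del table1, table2  # the concatenated table now holds the only references

   before = pa.total_allocated_bytes()
   # Rebinding `table` drops the last reference to the unsliced table, so the
   # buffers of the first chunk (table1's data) can be freed by the pool.
   table = table.slice(table.num_rows - 4, 4)
   print("Bytes freed:", before - pa.total_allocated_bytes())
   ```
   "Bytes freed" should come out positive, confirming that the buffers backing the dropped chunk are released as soon as the pre-slice table goes out of scope.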

