assignUser commented on issue #36019:
URL: https://github.com/apache/arrow/issues/36019#issuecomment-1585444814

   The issue is simply that del + gc does not take effect instantly. If you 
add a delay, you can easily see that:
   ```python
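   from time import sleep

   # show_memory_info() is the helper defined in the script from the issue.
   # Report memory usage once per second for three seconds.
   for seconds in (1, 2, 3):
       sleep(1)
       show_memory_info(f'after {seconds}s:')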
   ```
   ```
   after 1s: -- current(MB): 352.238
   after 1s: -- total(MB): 31827.660
   after 1s: -- account(MB): 26.000
   
   after 2s: -- current(MB): 131.945
   after 2s: -- total(MB): 31827.660
   after 2s: -- account(MB): 24.800
   
   after 3s: -- current(MB): 68.965
   after 3s: -- total(MB): 31827.660
   after 3s: -- account(MB): 24.600
   ```
   
   Also, if you have to process data across a bunch of files, maybe the dataset 
API could be useful for you: https://arrow.apache.org/docs/python/dataset.html
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
