wingkitlee0 commented on issue #39808:
URL: https://github.com/apache/arrow/issues/39808#issuecomment-2825922411

   I came across this issue recently and can still reproduce the behavior described in https://github.com/apache/arrow/issues/39808#issuecomment-2163183635
   
   Previously I tried `pre_buffer=False` and `use_buffered_stream=True` in [`ParquetFragmentScanOptions`](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.ParquetFragmentScanOptions.html), with which `total_allocated_bytes` stopped growing.
   
   There is also a new `cache_metadata` option for `to_batches` (not released yet; only in the dev version), which seems to reduce memory usage by a few percent.
   
   However, the memory-usage difference between `pyarrow.dataset` and `ParquetFile` is still quite large.

