westonpace commented on issue #37630:
URL: https://github.com/apache/arrow/issues/37630#issuecomment-1712000301

   Note that FileFragment in the datasets API caches the Parquet metadata (with 
no option to disable this at the moment).  So if you are scanning many files 
you will see memory grow over the lifetime of the scan as the metadata for 
more and more files is cached.  I would expect that a second scan over the 
same fragments would not grow the memory further, since their metadata is 
already cached.
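
To illustrate the expected memory pattern, here is a toy model in plain Python. It is not Arrow's implementation: `ToyFragment`, `cached_bytes`, `scan`, and the fixed 1024-byte "footer" are all hypothetical names and values chosen for the sketch. It only shows why cached metadata would accumulate during a first scan and stay flat during a second.

```python
class ToyFragment:
    """Hypothetical stand-in for a file fragment that caches its metadata."""

    def __init__(self, path):
        self.path = path
        self._metadata = None  # nothing is cached until the first read

    def metadata(self):
        if self._metadata is None:
            # Pretend to parse a Parquet footer. The 1024-byte size is an
            # assumption for the sketch; real footers vary in size.
            self._metadata = bytes(1024)
        return self._metadata


def cached_bytes(fragments):
    """Total bytes of metadata currently held by the fragments."""
    return sum(len(f._metadata) for f in fragments if f._metadata is not None)


def scan(fragments):
    """Visit every fragment; record the cached total after each one."""
    sizes = []
    for fragment in fragments:
        fragment.metadata()
        sizes.append(cached_bytes(fragments))
    return sizes


fragments = [ToyFragment(f"part-{i}.parquet") for i in range(4)]
first = scan(fragments)
second = scan(fragments)
print(first)   # [1024, 2048, 3072, 4096]
print(second)  # [4096, 4096, 4096, 4096]
```

In this model the cache lives on the fragment objects, so the total is only released once the fragments themselves are dropped.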

