icexelloss commented on issue #37630:
URL: https://github.com/apache/arrow/issues/37630#issuecomment-1712025049

   > Note that FileFragment in the datasets API caches the parquet metadata
   > (with no option to disable this at the moment). So if you are scanning many
   > files you will see memory grow over the lifetime of the scan as more and more
   > metadatas are cached. I would expect a second scan would not grow the memory.
   
   Thanks @westonpace. Can you point me to where in the code this caching happens?
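
As a rough illustration of the behavior described in the quote, a toy model might look like the sketch below. It is not Arrow's code: `ToyFragment`, `scan`, `cached_bytes`, and the 1024-byte metadata size are hypothetical names and values chosen for illustration. The only assumption is that each fragment keeps its metadata once it has been read.

```python
# Toy model only: ToyFragment is hypothetical and is NOT an Arrow class.
class ToyFragment:
    def __init__(self, path, metadata_size):
        self.path = path
        self._metadata_size = metadata_size
        self._metadata = None  # filled on first scan and never released

    def scan(self):
        if self._metadata is None:
            # Stands in for reading the Parquet footer of this file.
            self._metadata = bytes(self._metadata_size)
        return self._metadata

    def cached_bytes(self):
        return 0 if self._metadata is None else len(self._metadata)


def total_cached(fragments):
    # Sum of metadata bytes currently held by all fragments.
    return sum(f.cached_bytes() for f in fragments)


fragments = [ToyFragment(f"part-{i}.parquet", 1024) for i in range(100)]

# First scan: every fragment reads and caches its metadata, so memory grows.
for f in fragments:
    f.scan()
after_first = total_cached(fragments)

# Second scan: the cached metadata is reused, so nothing new is allocated.
for f in fragments:
    f.scan()
after_second = total_cached(fragments)
```

In this model the cached total grows with the number of files during the first scan and stays flat during the second, which matches the expectation stated in the quote.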


