icexelloss commented on issue #37630: URL: https://github.com/apache/arrow/issues/37630#issuecomment-1712025049
> Note that FileFragment in the datasets API caches the parquet metadata (with no option to disable this at the moment). So if you are scanning many files you will see memory grow over the lifetime of the scan as more and more metadatas are cached. I would expect a second scan would not grow the memory.

Thanks @westonpace, can you give a pointer to where that is happening?

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
