Tim Barrett created TIKA-4427:
---------------------------------

             Summary: Memory Leak when parsing a large (110K+) number of documents
                 Key: TIKA-4427
                 URL: https://issues.apache.org/jira/browse/TIKA-4427
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 3.2.0
            Reporter: Tim Barrett
         Attachments: Screenshot 2025-05-30 at 17.22.38.png, Screenshot 2025-05-30 at 18.31.01.png, Screenshot 2025-05-30 at 18.31.47.png

When parsing a very large number of documents, including many eml files, we see that the static field XMLReaderUtils.SAX_PARSERS is holding a massive amount of memory: 3.28 GB. This is a static pool of cached SAXParser instances, each of which is holding onto a substantial amount of memory, apparently in the fDocumentHandler field.

This is a big data test we run regularly; the memory issues did not occur in Tika version 2.x.

I have attached JVM monitor screenshots.
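
To make the suspected retention pattern concrete, here is a minimal sketch using only the JDK's SAX API. It is not Tika's code: the class PooledSaxParserSketch, its POOL field, and the acquire/release/parseText helpers are hypothetical stand-ins for a static pool of cached SAXParser instances. The idea is that a pooled parser can keep a reference to the last handler it was given, so release() swaps in an empty DefaultHandler before the parser goes back into the pool. Whether this covers the fDocumentHandler path seen in the heap dump has not been established.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PooledSaxParserSketch {
    // Hypothetical static pool, standing in for a cache of SAXParser instances.
    static final BlockingQueue<SAXParser> POOL = new ArrayBlockingQueue<>(2);

    // Take a parser from the pool, or create one if the pool is empty.
    static SAXParser acquire() throws Exception {
        SAXParser parser = POOL.poll();
        if (parser != null) {
            return parser;
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser();
    }

    // Detach the caller's handler before pooling, so the cached parser
    // no longer references whatever that handler accumulated.
    static void release(SAXParser parser) throws Exception {
        XMLReader reader = parser.getXMLReader();
        DefaultHandler empty = new DefaultHandler();
        reader.setContentHandler(empty);
        reader.setDTDHandler(empty);
        reader.setEntityResolver(empty);
        reader.setErrorHandler(empty);
        POOL.offer(parser); // silently dropped if the pool is already full
    }

    // A handler that buffers text, i.e. one that can grow large per document.
    static class BufferingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parse one document with a pooled parser and return its text content.
    static String parseText(String xml) throws Exception {
        SAXParser parser = acquire();
        BufferingHandler handler = new BufferingHandler();
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), handler);
        } finally {
            release(parser);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText("<doc>hello</doc>"));
        // The pooled parser should no longer point at the BufferingHandler.
        SAXParser pooled = POOL.peek();
        System.out.println(
            pooled.getXMLReader().getContentHandler() instanceof BufferingHandler);
    }
}
```

If the release() body is removed, the same check in main can be used to see whether the pooled parser still references the buffering handler on a given JDK, which may help narrow down what the cached instances are holding.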