Tim Barrett created TIKA-4427:
---------------------------------

             Summary: Memory Leak when parsing a large (110K+) number of 
documents 
                 Key: TIKA-4427
                 URL: https://issues.apache.org/jira/browse/TIKA-4427
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 3.2.0
            Reporter: Tim Barrett
         Attachments: Screenshot 2025-05-30 at 17.22.38.png, Screenshot 
2025-05-30 at 18.31.01.png, Screenshot 2025-05-30 at 18.31.47.png

When parsing a very large number of documents, including many .eml files, we 
see that the static field XMLReaderUtils.SAX_PARSERS is holding a massive 
amount of memory: 3.28 GB. This is a static pool of cached SAXParser 
instances, each of which is holding onto a substantial amount of memory, 
apparently in the fDocumentHandler field.
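
To illustrate the general retention pattern described above, here is a 
minimal JDK-only sketch. It is not Tika's code and not a diagnosis of this 
issue: PooledParserSketch, POOL, BufferingHandler and 
pooledParserRetainsHandler are hypothetical names, and the assumption is only 
that a pooled SAXParser keeps a reference to the last handler it was given 
unless that reference is replaced before the parser goes back into the pool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class PooledParserSketch {

    // Hypothetical stand-in for a static parser cache (not Tika's actual pool).
    static final BlockingQueue<SAXParser> POOL = new ArrayBlockingQueue<>(4);

    // Handler that buffers all character data, so its size grows with the input.
    static class BufferingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Take a parser from the pool, or create one if the pool is empty.
    static SAXParser acquire() throws Exception {
        SAXParser parser = POOL.poll();
        return parser != null ? parser : SAXParserFactory.newInstance().newSAXParser();
    }

    // Return a parser to the pool, optionally swapping in an empty handler first
    // so the pooled parser no longer references the caller's handler.
    static void release(SAXParser parser, boolean clearHandler) throws Exception {
        if (clearHandler) {
            parser.getXMLReader().setContentHandler(new DefaultHandler());
        }
        POOL.offer(parser); // silently dropped if the pool is already full
    }

    // Parse xml with a BufferingHandler, release the parser, and report whether
    // the released parser still references that BufferingHandler.
    static boolean pooledParserRetainsHandler(String xml, boolean clearHandler) {
        try {
            SAXParser parser = acquire();
            BufferingHandler handler = new BufferingHandler();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), handler);
            release(parser, clearHandler);
            return parser.getXMLReader().getContentHandler() == handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>some text</doc>";
        System.out.println("retained without clearing: "
                + pooledParserRetainsHandler(xml, false));
        System.out.println("retained after clearing: "
                + pooledParserRetainsHandler(xml, true));
    }
}
```

If the handler in turn references large buffers, every parser in a static 
pool can pin one such object graph until the parser is reused. Whether that 
is what happens inside XMLReaderUtils.SAX_PARSERS would need to be confirmed 
from a heap dump.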

This is a big data test we run regularly; the memory issues did not occur in 
Tika 2.x.

 

I have attached JVM monitor screenshots.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
