Pavel Micka created TIKA-1631:
---------------------------------

             Summary: OutOfMemoryException in ZipContainerDetector
                 Key: TIKA-1631
                 URL: https://issues.apache.org/jira/browse/TIKA-1631
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 1.8
            Reporter: Pavel Micka


When I try to detect a ZIP container, I occasionally get this exception. It is 
caused by the fact that the file looks like a ZIP container (its magic bytes 
match), but is in fact random noise. Apache Commons Compress then tries to read 
the size of its internal tables (it expects a valid stream), coincidentally 
reads a huge number (since at that position the stream can contain anything), 
and tries to allocate an array several GB in size (hence the exception).

A solution might be to add an additional parameter to the Tika config that 
would limit the size of these arrays; if the requested size exceeded the limit, 
an exception would be thrown. This change should not be hard, as the method 
InternalLZWInputStream.initializeTables() is protected.
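The proposed limit could be sketched as a simple guard on the table allocation. 
Note the method and parameter names below are illustrative assumptions, not the 
actual commons-compress API:

```java
import java.io.IOException;

public class LzwTableGuard {
    // Hypothetical guard illustrating the proposed limit: reject table sizes
    // above a configured maximum instead of blindly allocating. The names
    // here are assumptions for illustration, not the real commons-compress API.
    static int[] allocateTable(int codeSize, int maxTableSize) throws IOException {
        int tableSize = 1 << codeSize; // LZW table size grows exponentially with code size
        if (tableSize <= 0 || tableSize > maxTableSize) {
            throw new IOException("Refusing to allocate LZW table of size "
                    + tableSize + " (limit " + maxTableSize + ")");
        }
        return new int[tableSize];
    }

    public static void main(String[] args) throws IOException {
        // A sane code size passes...
        System.out.println(allocateTable(9, 1 << 16).length); // 512
        // ...while a garbage value read from a noise stream is rejected
        // up front instead of triggering an OutOfMemoryError.
        try {
            allocateTable(30, 1 << 16);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A detector overriding initializeTables() could apply such a check before 
delegating to the superclass, turning the OutOfMemoryError into a catchable 
stream-format error.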

Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
        at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
        at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
        at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
