Pavel Micka created TIKA-1631:
---------------------------------
Summary: OutOfMemoryException in ZipContainerDetector
Key: TIKA-1631
URL: https://issues.apache.org/jira/browse/TIKA-1631
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 1.8
Reporter: Pavel Micka
When I try to detect a ZIP container, I rarely get this exception. It is caused by
the fact that the file looks like a ZIP container (the magic bytes match), but is in
fact random noise. Apache Commons Compress then tries to read the size of the code
tables (it expects a well-formed stream), coincidentally reads a huge number (since
that position in the stream can contain anything), and tries to allocate an array
several GB in size (hence the exception).
A solution might be to add an additional parameter to the Tika config that would
limit the size of these arrays; if the requested size were bigger, an exception
would be thrown instead. This change should not be hard, as the method
InternalLZWInputStream.initializeTables() is protected.
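A minimal sketch of the proposed guard, in plain Java. The class and constant names here (TableSizeGuard, MAX_TABLE_SIZE) are illustrative only, not actual Tika or Commons Compress API; in Tika the limit would come from the suggested config parameter:

```java
import java.io.IOException;

public class TableSizeGuard {
    // Hypothetical upper bound; in Tika this would be read from configuration.
    private static final int MAX_TABLE_SIZE = 1 << 20; // 1M entries

    /**
     * Allocates a code table only if the size read from the (possibly corrupt)
     * stream is within the configured bound; otherwise fails fast with an
     * IOException instead of letting the JVM run out of heap.
     */
    public static int[] allocateTable(int sizeFromStream) throws IOException {
        if (sizeFromStream < 0 || sizeFromStream > MAX_TABLE_SIZE) {
            throw new IOException(
                    "Refusing to allocate code table of size " + sizeFromStream);
        }
        return new int[sizeFromStream];
    }
}
```

The point is that a bogus size taken from noise surfaces as a recoverable IOException during detection rather than an OutOfMemoryError that kills the worker thread.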
Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
    at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
    at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
    at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)