[
https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974533#comment-15974533
]
Tim Allison edited comment on TIKA-1631 at 4/19/17 12:28 PM:
-------------------------------------------------------------
[~lfcnassif], I'd like to add a temporary hack for OOM protections for both
detection and parsing for LZMA (COMPRESS-382) and Z (COMPRESS-386) before we
release Tika 1.15.
I haven't looked carefully yet, but are you aware of other compress formats
that can cause a preventable OOM on initialization?
> OutOfMemoryException in ZipContainerDetector
> --------------------------------------------
>
> Key: TIKA-1631
> URL: https://issues.apache.org/jira/browse/TIKA-1631
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.8
> Reporter: Pavel Micka
> Attachments: cache.mpgindex
>
>
> When I try to detect a ZIP container I occasionally get this exception. It is
> caused by the fact that the file looks like a ZIP container (the magic bytes
> match) but is in fact random noise. Apache Commons Compress therefore tries to
> read the size of the code tables (it expects a well-formed stream),
> coincidentally reads a huge number (since anything can appear at that position
> in the stream), and tries to allocate an array several GB in size (hence the
> exception).
> This bug negatively affects the stability of systems running Tika, as the
> decompressor can accidentally allocate all available memory, leaving other
> parts of the system unable to allocate their objects.
> A solution might be to add an additional parameter to the Tika config that
> limits the size of these arrays; if the requested size exceeded the limit, an
> exception would be thrown. This change should not be hard, as the method
> InternalLZWInputStream.initializeTables() is protected.
> Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
>     at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
>     at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)