[ https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Burkhardt updated TIKA-2447:
--------------------------------
Description:
PSD files (Adobe Photoshop) are split into ResourceBlocks, which contain
different kinds of data, but currently only caption blocks are extracted into
the description.
When parsing a file with very large blocks, e.g. for image data, a byte array
the size of the whole block is allocated:
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
even though it is discarded afterwards:
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116
and the following lines.
This causes huge memory consumption and eventually killed the application with
an OutOfMemoryError.
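The allocate-then-discard pattern described above can be avoided by reading only each block's header and skipping the data of blocks that are not needed. The following Java sketch is neither Tika's PSDParser code nor the eventual fix; it assumes the image resource block layout from Adobe's PSD specification (signature "8BIM", 2-byte ID, Pascal-string name padded to an even length, 4-byte size, data padded to an even length) and treats only the caption resource (ID 0x03F0) as wanted. The class name, the helpers and the 1 MiB cap (MAX_WANTED_BLOCK) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Tika's PSDParser: skip unwanted resource blocks
// instead of allocating a byte[] of their declared size.
public class ResourceBlockSkipSketch {
    // Caption resource ID (0x03F0 = 1008) per Adobe's PSD specification.
    static final int ID_CAPTION = 0x03F0;
    // Hypothetical upper bound for the one block type that is still buffered.
    static final long MAX_WANTED_BLOCK = 1 << 20;

    /** Reads one block; returns its data if it is a caption, otherwise skips it and returns null. */
    static byte[] readBlock(DataInputStream in) throws IOException {
        byte[] sig = new byte[4];
        in.readFully(sig);
        if (!"8BIM".equals(new String(sig, StandardCharsets.US_ASCII))) {
            throw new IOException("Invalid resource block signature");
        }
        int id = in.readUnsignedShort();
        // Pascal-string name: 1 length byte + name bytes, padded to an even total.
        int nameLen = in.readUnsignedByte();
        skipFully(in, nameLen + ((nameLen + 1) % 2));
        long size = in.readInt() & 0xFFFFFFFFL;   // unsigned 32-bit data size
        long padded = size + (size % 2);          // data is padded to an even length
        if (id != ID_CAPTION) {
            skipFully(in, padded);                // no new byte[size] for unwanted blocks
            return null;
        }
        if (size > MAX_WANTED_BLOCK) {
            throw new IOException("Caption block too large: " + size);
        }
        byte[] data = new byte[(int) size];
        in.readFully(data);
        skipFully(in, padded - size);
        return data;
    }

    /** InputStream.skip may skip fewer bytes than asked, so loop and detect EOF. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("Stream ended inside a resource block");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Demo helper: writes one block with an empty name. */
    static void writeBlock(DataOutputStream out, int id, byte[] data) throws IOException {
        out.writeBytes("8BIM");
        out.writeShort(id);
        out.writeShort(0);                        // empty name: length byte 0 + pad byte
        out.writeInt(data.length);
        out.write(data);
        if (data.length % 2 == 1) {
            out.writeByte(0);                     // pad data to an even length
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeBlock(out, 0x0400, new byte[1_000_001]);   // any non-caption ID, large and odd-sized
        byte[] caption = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] pascal = new byte[caption.length + 1];   // caption stored as a Pascal string
        pascal[0] = (byte) caption.length;
        System.arraycopy(caption, 0, pascal, 1, caption.length);
        writeBlock(out, ID_CAPTION, pascal);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        while (in.available() > 0) {
            byte[] data = readBlock(in);
            if (data != null) {
                System.out.println("caption = "
                        + new String(data, 1, data[0] & 0xFF, StandardCharsets.US_ASCII));
            }
        }
    }
}
```

Running main prints "caption = hello" while never allocating an array for the 1,000,001-byte block. Skipping keeps memory use independent of the declared block size; the cap additionally guards the one block type that is still buffered against a corrupt or hostile size field.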
{noformat}
java.lang.OutOfMemoryError: Java heap space
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
{noformat}
I cannot provide a file to reproduce the issue, because the file that caused it
belongs to one of our customers.
I will prepare a pull request to fix this.
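Because the customer's file cannot be shared, a synthetic file might serve as a reproduction instead. The Java sketch below writes a minimal PSD following the section layout in Adobe's PSD specification: a 26-byte header, an empty color mode data section, an image resources section holding a single large block, an empty layer and mask section, and raw image data for one RGB pixel. It is an untested assumption that Tika 1.15 reaches the allocation reported at PSDParser.java:191 with this file; the class name, the non-caption resource ID 0x0400 and the default 300 MB block size are hypothetical choices.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical reproduction helper: writes a minimal PSD whose image
// resources section contains one very large (zero-filled) resource block.
public class BigResourcePsd {

    static void write(DataOutputStream out, int blockSize) throws IOException {
        // File header (26 bytes), per Adobe's PSD specification.
        out.writeBytes("8BPS");
        out.writeShort(1);          // version 1 = PSD
        out.write(new byte[6]);     // reserved, must be zero
        out.writeShort(3);          // number of channels
        out.writeInt(1);            // height in pixels
        out.writeInt(1);            // width in pixels
        out.writeShort(8);          // bits per channel
        out.writeShort(3);          // color mode: RGB
        // Color mode data section: empty.
        out.writeInt(0);
        // Image resources section: one block = 12-byte header + padded data.
        int padded = blockSize + (blockSize % 2);
        out.writeInt(12 + padded);
        out.writeBytes("8BIM");
        out.writeShort(0x0400);     // arbitrary non-caption resource ID
        out.writeShort(0);          // empty Pascal name: length byte 0 + pad byte
        out.writeInt(blockSize);
        byte[] chunk = new byte[64 * 1024];
        for (int left = padded; left > 0; left -= chunk.length) {
            out.write(chunk, 0, Math.min(left, chunk.length));
        }
        // Layer and mask information section: empty.
        out.writeInt(0);
        // Image data: compression 0 (raw), 3 channels x 1 pixel x 1 byte.
        out.writeShort(0);
        out.write(new byte[3]);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "big-resource.psd";
        int size = args.length > 1 ? Integer.parseInt(args[1]) : 300 * 1024 * 1024;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            write(out, size);
        }
    }
}
```

For example, `java BigResourcePsd big.psd 314572800` writes a file of roughly 300 MB; with a 256 MB heap (-Xmx256m), as in the reported environment, a parser that buffers the whole block before discarding it would be expected to fail while one that skips the block would not.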
was:
PSD files (Adobe Photoshop) are split into ResourceBlocks, which contain
different kinds of data, but currently only caption blocks are extracted into
the description.
When parsing a file with very large blocks, e.g. for image data, a byte array
the size of the whole block is allocated:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
even though it is discarded afterwards:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116
and the following lines.
This causes huge memory consumption and eventually killed the application with
an OutOfMemoryError.
{noformat}
java.lang.OutOfMemoryError: Java heap space
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
{noformat}
I cannot provide a file to reproduce the issue, because the file that caused it
belongs to one of our customers.
I will prepare a pull request to fix this.
> PSDParser creates an unnecessarily large byte array and discards it
> -------------------------------------------------------------------
>
> Key: TIKA-2447
> URL: https://issues.apache.org/jira/browse/TIKA-2447
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.15, 1.16
> Environment: openjdk version "1.8.0_131"
> little memory (currently -Xmx256m)
> Reporter: Jan Burkhardt
> Priority: Critical
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)