[ 
https://issues.apache.org/jira/browse/TIKA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shunfei Chen updated TIKA-3243:
-------------------------------
    Description: 
We are using Tika's AutoDetectParser to extract metadata from a variety of 
files. Over the past month, since we upgraded to Tika 1.24.1, we have been 
seeing TikaExceptions (stack trace below).
  
{code:java}
Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
 at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
  {code}
However, I think the PSD file we are parsing is valid: I can view it and open 
it with Photoshop. After some digging, I believe the change was introduced as 
part of https://issues.apache.org/jira/browse/TIKA-3050 in this commit: 
https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef 
(line 232).
  
 The largest size we have seen so far in the files our users uploaded is 
161,548,458, which is well above the 1,000,000 limit in PSDParser.

Please let me know if you need any extra information. 
  
 Thanks
 Shunfei. 

  was:
We are using Tika's AutoDetectParser to extract metadata from a variety of 
files. Over the past month, since we upgraded to Tika 1.24.1, we have been 
seeing TikaExceptions (stack trace below).
  
{code:java}
Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
 at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
  {code}
However, I think the PSD file we are parsing is valid: I can view it and open 
it with Photoshop. After some digging, I believe the change was introduced as 
part of https://issues.apache.org/jira/browse/TIKA-3050 in this commit: 
https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef 
(line 232).
  
 The largest size we have seen so far in the files our users uploaded is 
161,548,458, which is well above the 1,000,000 limit in PSDParser.
  
 Thanks
 Shunfei. 


> PSDParser MAX_DATA_LENGTH_BYTES check causes TikaException
> ----------------------------------------------------------
>
>                 Key: TIKA-3243
>                 URL: https://issues.apache.org/jira/browse/TIKA-3243
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Shunfei Chen
>            Priority: Major
>
> We are using Tika's AutoDetectParser to extract metadata from a variety of 
> files. Over the past month, since we upgraded to Tika 1.24.1, we have been 
> seeing TikaExceptions (stack trace below).
>   
> {code:java}
> Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
>  at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
>  at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
>  at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>   {code}
> However, I think the PSD file we are parsing is valid: I can view it and 
> open it with Photoshop. After some digging, I believe the change was 
> introduced as part of https://issues.apache.org/jira/browse/TIKA-3050 in 
> this commit: 
> https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef 
> (line 232).
>   
>  The largest size we have seen so far in the files our users uploaded is 
> 161,548,458, which is well above the 1,000,000 limit in PSDParser.
>
> Please let me know if you need any extra information. 
>   
>  Thanks
>  Shunfei. 
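
To make the reported behavior concrete, here is a minimal sketch of one alternative to throwing on an oversized block: skip the payload and continue with the next block. This is not Tika's actual PSDParser code. The class name ResourceBlockSketch, the methods readOrSkip and skipFully, and the simplified block layout (a 4-byte big-endian length followed by the data) are assumptions for illustration only; real PSD resource blocks also carry a signature, an ID and a name. The constant reuses the name from the issue title and the value from the exception message.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ResourceBlockSketch {

    // Assumed cap, taken from the "data length must be < 1000000" message.
    static final int MAX_DATA_LENGTH_BYTES = 1_000_000;

    /**
     * Reads one simplified block: a 4-byte big-endian length, then the data.
     * Returns the data when length < maxLength. When the block is too large,
     * skips its payload and returns null so the caller can move on to the
     * next block instead of failing the whole parse.
     */
    static byte[] readOrSkip(InputStream in, int maxLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative data length: " + length);
        }
        if (length >= maxLength) {
            skipFully(data, length);
            return null;
        }
        byte[] buffer = new byte[length];
        data.readFully(buffer);
        return buffer;
    }

    /** Skips exactly n bytes, or throws EOFException if the stream ends early. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may legitimately return 0; fall back to a single read.
                if (in.read() < 0) {
                    throw new EOFException("stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```

A caller would pass MAX_DATA_LENGTH_BYTES as maxLength and treat a null result as "block ignored". Whether skipping, raising the limit, or making it configurable is the right fix for PSDParser is a question for the Tika maintainers.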



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
