https://issues.apache.org/bugzilla/show_bug.cgi?id=52400

             Bug #: 52400
           Summary: TNEF parsing unstable
           Product: POI
           Version: unspecified
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


> We are seeing problems in Solr with tika throwing exceptions. Sometimes we 
> see OOM like this:
> {noformat}
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>        at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
>        at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>        at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> {noformat}
> Other times, we see errors like this one:
> {noformat}
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
>        at org.apache.poi.util.LittleEndian.readUShort(LittleEndian.java:302)
>        at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:53)
>        at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>        at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>        at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>        ... 26 more
> {noformat}

We are currently evaluating Solr and Tika using valid sample data. The Solr
and Tika projects have both indicated that this problem should be reported and
fixed in POI.
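
For illustration only: both traces end in TNEFAttribute.<init>, which would be
consistent with (but does not prove) a length field being trusted before it is
checked against the data actually available. The following is a minimal Java
sketch of that kind of check. It is not POI code; the class name, method name,
field layout, and size cap are all hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class LengthCheckSketch {
    // Hypothetical upper bound on a single attribute; not a POI constant.
    static final int MAX_ATTRIBUTE_LENGTH = 1_000_000;

    // Reads a 4-byte little-endian length, then that many bytes of payload.
    // Assumed layout for illustration; the real TNEF attribute header differs.
    static byte[] readLengthPrefixed(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        // readInt() is big-endian, so reverse the bytes for little-endian data.
        int length = Integer.reverseBytes(din.readInt());
        // Reject implausible lengths before allocating, instead of risking
        // an OutOfMemoryError on a corrupt or misread length field.
        if (length < 0 || length > MAX_ATTRIBUTE_LENGTH) {
            throw new IOException("Implausible attribute length: " + length);
        }
        byte[] data = new byte[length];
        // readFully throws EOFException (an IOException) on truncated input.
        din.readFully(data);
        return data;
    }
}
```

With a check like this, a bad length would surface as a single descriptive
IOException rather than as an OOM in one case and a buffer underrun in another.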
