[ 
https://issues.apache.org/jira/browse/CONNECTORS-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318743#comment-16318743
 ] 

Karl Wright commented on CONNECTORS-1481:
-----------------------------------------

So...  It appears that the issue might be a mismatch between the version of POI 
we included in 2.9 (3.17) and the version of Tika that we shipped (1.16).  We 
could not ship the version of POI that was compatible with Tika 1.16 because that 
had a major security issue with XML XSS injection.  We could technically have 
gone with Tika 1.17, though, since it was released in September, but we 
overlooked that, unfortunately.

The probable solution: a point release that includes an update to Tika 1.17, 
with no other code changes.  That would be this svn version:
r1820296

We probably also want the fix for CONNECTORS-1478:
r1818722
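
To confirm this kind of binary mismatch on a given classpath, a reflection check along the following lines may help. It is only a minimal sketch: the class MethodSignatureCheck and its check helper are hypothetical names made up for illustration, and the POI class, method, and return type names are taken from the stack trace quoted below. It inspects whatever classpath it is run with, which is not necessarily the classloader ManifoldCF uses for the Tika extractor.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic helper, not part of ManifoldCF, Tika, or POI.
final class MethodSignatureCheck {

    /**
     * Reports whether className has a public no-argument method called
     * methodName whose return type has the binary name expectedReturnType.
     */
    public static String check(String className, String methodName, String expectedReturnType) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false,
                    MethodSignatureCheck.class.getClassLoader());
            Method m = cls.getMethod(methodName);
            String actual = m.getReturnType().getName();
            return actual.equals(expectedReturnType)
                    ? "OK"
                    : "MISMATCH: returns " + actual;
        } catch (ClassNotFoundException e) {
            return "CLASS NOT FOUND";
        } catch (NoSuchMethodException e) {
            return "METHOD NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Names below come from the NoSuchMethodError in the quoted stack trace.
        System.out.println(check(
                "org.apache.poi.hwmf.record.HwmfFont",
                "getCharSet",
                "org.apache.poi.hwmf.record.HwmfFont$WmfCharset"));
    }
}
```

A MISMATCH result would suggest that the POI jar on the classpath declares getCharSet() with a different return type than the one the Tika parser was compiled against, which is the situation a NoSuchMethodError with a full method descriptor usually points to.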



> Some documents cannot be Tika extracted due to classloader problem
> ------------------------------------------------------------------
>
>                 Key: CONNECTORS-1481
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1481
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.9
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.10
>
>
> Here's the exception:
> {code}
> FATAL 2018-01-09T10:19:54,992 (Worker thread '5') - Error tossed: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>         at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:74) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[?:?]
>         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:375) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:260) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:205) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> {code}
> This may or may not be addressed by Tika 1.17, but nobody has tried it yet.



