[
https://issues.apache.org/jira/browse/TIKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883159#comment-16883159
]
Tim Allison edited comment on TIKA-2904 at 7/11/19 4:40 PM:
------------------------------------------------------------
We didn't upgrade to 4.1.0 because there were some serious regressions in text
extraction in WMF.
There are some serious binary incompatibilities between 4.1.0 and 4.0.1.
I raised these issues with the POI team at the time:
https://mail-archives.apache.org/mod_mbox/poi-dev/201904.mbox/%3CCAC1dCwX%3Dz9XOPcBLHEsE_TAoH6gSFCR%2B_Uu%3Dw-2pPSpiz7LNDQ%40mail.gmail.com%3E
and
https://mail-archives.apache.org/mod_mbox/poi-dev/201904.mbox/%3CCAC1dCwW4gmMSMjVLnXS1MCf2E2UVZP3Dsk9yx3TO-cJrYRT_fw%40mail.gmail.com%3E
I'm on the POI team and chose not to vote against the release. I need to work
with the latest version to try to address the EMF extraction issues, if they
haven't been overtaken by events (OBE) by now, and then work towards a new
release.
But, you're right, 4.0.1 and 4.1.0 are _NOT_ compatible.
> Error parsing a Word document with a WMF image
> ----------------------------------------------
>
> Key: TIKA-2904
> URL: https://issues.apache.org/jira/browse/TIKA-2904
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.21
> Reporter: Borja Serrano
> Priority: Major
>
> If you parse a document that contains a WMF image while using the newest
> version of Apache POI (4.1.0, which is marked as compatible), you get a
> NoSuchMethodError:
> {code:java}
> 2019-07-11 11:06:59 com.penman.web.configuration.CustomAsyncExceptionHandler [ERROR] Exception in async task message - org.apache.poi.hwmf.record.HwmfRecord.getRecordType()Lorg/apache/poi/hwmf/record/HwmfRecordType;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfRecord.getRecordType()Lorg/apache/poi/hwmf/record/HwmfRecordType;
>     at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:72) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:391) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:264) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:206) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:139) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:201) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110) ~[tika-parsers-1.21.jar:1.21]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.21.jar:1.21]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[tika-core-1.21.jar:1.21]
> {code}
> The problem comes from a change in Apache POI. As of 4.1.0, the method
> getRecordType() is no longer available on HwmfRecord and getWmfRecordType()
> must be used instead (the change was discussed in
> [http://apache-poi.1045710.n5.nabble.com/VOTE-Apache-POI-4-1-0-release-RC3-td5733174.html]).
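For anyone who wants to check which flavour of the POI API is on their classpath before hitting the NoSuchMethodError, here is a minimal sketch using plain reflection. The class name PoiMethodProbe and its helper methods are hypothetical and not part of Tika or POI. The only POI names used are the ones from this report (HwmfRecord, getRecordType, getWmfRecordType), and the mapping of method names to versions is an assumption based on the description above, not something verified against the POI sources.

```java
/**
 * Hypothetical helper (not part of Tika or POI): probes by reflection
 * which record-type accessor the HwmfRecord on the classpath exposes.
 */
final class PoiMethodProbe {

    private PoiMethodProbe() {
    }

    /**
     * Returns true if the class can be loaded and exposes a public
     * no-argument method with the given name; false otherwise.
     */
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            // initialize=false: look the class up without running static initializers
            Class<?> type = Class.forName(className, false,
                    PoiMethodProbe.class.getClassLoader());
            type.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /**
     * Assumption taken from the issue report: getWmfRecordType is the
     * accessor in 4.1.0, getRecordType the one in 4.0.1.
     */
    static String describeWmfApi() {
        String record = "org.apache.poi.hwmf.record.HwmfRecord";
        if (hasNoArgMethod(record, "getWmfRecordType")) {
            return "getWmfRecordType() present (reported as the 4.1.0 API)";
        }
        if (hasNoArgMethod(record, "getRecordType")) {
            return "getRecordType() present (reported as the 4.0.1 API)";
        }
        return "HwmfRecord not found on the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describeWmfApi());
    }
}
```

Running main with the application's classpath gives a quick indication of whether the POI jar that actually gets loaded matches what tika-parsers 1.21 calls in WMFParser. It checks method names only, so it is a hint, not proof of binary compatibility.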