[ https://issues.apache.org/jira/browse/TIKA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301042#comment-17301042 ]

Hudson commented on TIKA-3316:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #167 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/167/])
TIKA-3316 -- improve XPS parser to include open XPS and allow for streaming zips with data descriptors (tallison: [https://github.com/apache/tika/commit/cba0372821022833a9c976bd47bd67193f73f635])
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSParserTest.java
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xps/XPSExtractorDecorator.java
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (add) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testXPSWithDataDescriptor.xps
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/zip/utils/ZipSalvager.java
* (add) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testXPSWithDataDescriptor2.xps
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java


> Illegal IOException processing XPS files
> ----------------------------------------
>
>                 Key: TIKA-3316
>                 URL: https://issues.apache.org/jira/browse/TIKA-3316
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.25
>            Reporter: Nick Harmer
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.26
>
>         Attachments: Screenshot from 2021-03-12 17-00-05.png, test1.xps, 
> test2.xps, test3.xps, test4.xps
>
>
> I have a number of (relatively simple) XPS documents which Tika fails to 
> process.  The following exception appears:
> {code:java}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@4149c063
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>         at com.mcms.Main.parseFile(Main.java:88)
>         at com.mcms.Main.main(Main.java:59)
> Caused by: org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature data descriptor used in entry Documents/1/Metadata/Page1_Thumbnail.JPG
>         at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:477)
>         at java.base/java.io.FilterInputStream.read(Unknown Source)
>         at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.read(ZipArchiveThresholdInputStream.java:80)
>         at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:182)
>         at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
>         at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:136)
>         at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
>         at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
>         at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
>         at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:111)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 5 more
> {code}
>  
> Apparently the generator for these files (the XPS printer driver, printing 
> from Notepad) adds a per-page thumbnail image, and the ZIP entry holding it 
> uses a data descriptor, which Tika fails to read.
>  
>  
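
For readers unfamiliar with the "data descriptor" named in the trace above: a ZIP entry written in streaming mode can defer its CRC and sizes to a record placed after the entry data, and it signals this by setting bit 3 of the general purpose flag in its local file header. The following is a minimal sketch of how that flag can be checked, using only java.util.zip and java.nio. It is not the change made in the commit, and it uses no Tika, POI or commons-compress API; the class name DataDescriptorCheck and its methods are hypothetical, and the entry name is borrowed from the trace purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika: inspects the first local file header of a ZIP.
public class DataDescriptorCheck {

    /** Bit 3 of the general purpose flag: CRC and sizes follow the entry data. */
    static final int DATA_DESCRIPTOR_BIT = 0x0008;

    /** Local file header signature "PK\3\4", read as a little-endian int. */
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    /** Returns true if the first local file header sets the data descriptor bit. */
    static boolean firstEntryUsesDataDescriptor(byte[] zip) {
        if (zip.length < 8) {
            throw new IllegalArgumentException("too short to hold a local file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("does not start with a local file header");
        }
        // The general purpose flag is the 2-byte field at offset 6 of the header.
        int flags = buf.getShort(6) & 0xffff;
        return (flags & DATA_DESCRIPTOR_BIT) != 0;
    }

    /**
     * Builds a one-entry ZIP in memory. The entry's size and CRC are not set
     * before writing, so ZipOutputStream has to emit them in a data descriptor.
     */
    static byte[] buildStreamedZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("Documents/1/Metadata/Page1_Thumbnail.JPG"));
            zos.write("placeholder bytes, not a real image".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstEntryUsesDataDescriptor(buildStreamedZip()));
    }
}
```

Running the same check on the first eight bytes of one of the attached .xps files would show whether its first entry sets the flag. Note that the entry reported in the trace is not necessarily the first one in the archive, so a fuller check would have to walk every local file header.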



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
