[ https://issues.apache.org/jira/browse/TIKA-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352596#comment-16352596 ]
Hudson commented on TIKA-2564:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1431 (See [https://builds.apache.org/job/Tika-trunk/1431/])
TIKA-2564 -- wrap embedded stream in a stream that supports mark/reset (tallison: [https://github.com/apache/tika/commit/7d22ba20ae6a628a1d51b9df7b6e3dae478a119a])
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* (edit) tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* (add) tika-app/src/test/resources/test-data/test-documents.tgz


> Tika client cannot extract files from embedded archive formats
> --------------------------------------------------------------
>
>                 Key: TIKA-2564
>                 URL: https://issues.apache.org/jira/browse/TIKA-2564
>             Project: Tika
>          Issue Type: Bug
>         Environment: Mac OS 10.13.3 (17D47)
>
> 17:42 ext$ java -version
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
> 17:42 ext$ uname -a
> Darwin bix.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64
>
>            Reporter: Marc Prud'hommeaux
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.18, 2.0.0
>
>
> This may be related to TIKA-2395.
> When trying to extract the files from tika/tika-parsers/src/test/resources/test-documents/test-documents.tgz
>
> % coursier launch org.apache.tika:tika-app:1.17 --main org.apache.tika.cli.TikaCLI -- --extract test-documents.tgz
>
> I see the exception:
>
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.CompressorParser@62628e78
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:205)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:486)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>     at coursier.cli.qR.a(Unknown Source)
>     at coursier.cli.qQ.j(Unknown Source)
>     at coursier.cli.qW.a(Unknown Source)
>     at d.h.a.c(Unknown Source)
>     at b.b.c_(Unknown Source)
>     at d.b.d.E.g(Unknown Source)
>     at d.b.e.aW.g(Unknown Source)
>     at d.b.f.b.aa.a(Unknown Source)
>     at coursier.cli.qQ.b(Unknown Source)
>     at coursier.cli.Q.b(Unknown Source)
>     at b.J.c_(Unknown Source)
>     at d.F.h(Unknown Source)
>     at b.F.a(Unknown Source)
>     at coursier.cli.Coursier.main(Unknown Source)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>     at coursier.Bootstrap.main(Bootstrap.java:428)
> Caused by: java.io.IOException: mark/reset not supported
>     at java.base/java.io.InputStream.reset(InputStream.java:474)
>     at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:444)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>     at org.apache.tika.cli.TikaCLI$FileEmbeddedDocumentExtractor.parseEmbedded(TikaCLI.java:1045)
>     at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:222)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 28 more
>
> However, I can browse the document fine using:
>
> % coursier launch org.apache.tika:tika-app:1.17 --main org.apache.tika.cli.TikaCLI -- test-documents.tgz
>
> This issue affects: test-documents.rar, test-documents.tar.Z, test-documents.tbz2, and test-documents.tgz
> But it does not affect test-documents.7z, test-documents.cab, test-documents.ddf, test-documents.dmg, test-documents.tar, or test-documents.zip
>
> This makes me suspect that it has something to do with extracting files from packages that are embedded in other archive parsers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
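
For readers following the "mark/reset not supported" cause in the trace above: the commit summary says the embedded stream is now wrapped in a stream that supports mark/reset. The sketch below is not the code from that commit. It only illustrates the general technique with plain java.io classes, on the assumption that a detector calls mark(), reads a few header bytes, and then calls reset(). The class MarkResetSketch and the helpers ensureMarkSupported, nonMarkable, and peek are hypothetical names made up for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class MarkResetSketch {

    /** Returns the stream unchanged if it supports mark/reset, otherwise a buffered wrapper. */
    public static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Hypothetical stand-in for an embedded entry stream that lacks mark/reset support. */
    public static InputStream nonMarkable(byte[] data) {
        final ByteArrayInputStream delegate = new ByteArrayInputStream(data);
        // The base InputStream reports markSupported() == false and its
        // reset() throws IOException("mark/reset not supported").
        return new InputStream() {
            @Override
            public int read() {
                return delegate.read();
            }
        };
    }

    /** Reads up to n header bytes the way a detector might, then rewinds the stream. */
    public static byte[] peek(InputStream in, int n) throws IOException {
        in.mark(n);
        try {
            byte[] head = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(head, total, n - total);
                if (r < 0) {
                    break; // end of stream before n bytes were available
                }
                total += r;
            }
            return Arrays.copyOf(head, total);
        } finally {
            in.reset(); // fails on a stream without mark/reset support
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = nonMarkable(new byte[] {1, 2, 3, 4, 5});
        InputStream safe = ensureMarkSupported(raw);
        byte[] head = peek(safe, 3);
        System.out.println(head.length + " bytes peeked; first byte after reset: " + safe.read());
    }
}
```

Calling peek directly on the unwrapped stream raises the same IOException message seen in the report, which is why the wrapping has to happen before the stream reaches any code that relies on reset().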