[ https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955047#comment-17955047 ]
Tim Allison edited comment on TIKA-4424 at 5/29/25 8:51 PM:
------------------------------------------------------------

When working in our {{branch_3x}}, if I run the following in the AutoDetectParserTest in tika-parsers-standard-package, I get the same correct application/vnd.google-earth.kmz for all attempts, with no exceptions. This unit test passes.

I need help with a reproducer. Does this fail on a different kmz file? Do we have the same dependencies? How are you calling Tika's detect?

{noformat}
@Test
public void testOne() throws Exception {
    Path kmz = Paths.get("/home/tallison/Intellij/tika-3x/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/test/resources/test" +
            "-documents/testKMZ.kmz");

    // Without a name: Path, TikaInputStream (file-backed and stream-backed), plain InputStream, in-memory bytes
    assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));
    try (TikaInputStream tis = TikaInputStream.get(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (InputStream is = Files.newInputStream(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Files.copy(kmz, bos);
    try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }

    //With name
    String name = kmz.getFileName().toString();
    assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));
    try (TikaInputStream tis = TikaInputStream.get(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
    }
    try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
    }
    try (InputStream is = Files.newInputStream(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
    }
    bos = new ByteArrayOutputStream();
    Files.copy(kmz, bos);
    try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
    }
}
{noformat}

was (Author: talli...@mitre.org):

When working in our {{branch_3x}}, if I run the following in the AutoDetectParserTest in tika-parsers-standard-package, I get the same correct application/vnd.google-earth.kmz for all attempts, with no exceptions. This unit test passes.

I need help with a reproducer.

{noformat}
@Test
public void testOne() throws Exception {
    Path kmz = Paths.get("/home/tallison/Intellij/tika-3x/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/test/resources/test" +
            "-documents/testKMZ.kmz");
    assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));
    try (TikaInputStream tis = TikaInputStream.get(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (InputStream is = Files.newInputStream(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Files.copy(kmz, bos);
    try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }
}
{noformat}

> Regression in zip-based detection with an InputStream in 3.2.0
> --------------------------------------------------------------
>
>                 Key: TIKA-4424
>                 URL: https://issues.apache.org/jira/browse/TIKA-4424
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> On the user list, Craig Muchinsky and Pontus Amberg noted new problems with
> detection of zip-based files.
> Craig noted that this affects InputStream detection, and Pontus noted that
> even if he switched to a TikaInputStream, his kmz file was getting detected
> as a zip.
> This is Pontus' code:
> {noformat}
> Tika.detect(InputStream stream, String name)
> {noformat}
> {noformat}
> app//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> {noformat}
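
Since the open question is whether detection fails on a different kmz file, one way to get a second input without sharing private data is to build a small archive in memory. The following is only a sketch using the JDK's java.util.zip classes: the class {{KmzFixture}}, its method names, and the single {{doc.kml}} entry are hypothetical choices for illustration, not part of Tika, and whether Tika classifies this particular archive as KMZ has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical test fixture (not part of Tika): builds a tiny zip archive
// containing a single KML document, entirely in memory.
class KmzFixture {

    // Returns the bytes of a zip archive with one entry named "doc.kml".
    static byte[] buildMinimalKmz() throws IOException {
        String kml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<kml xmlns=\"http://www.opengis.net/kml/2.2\">"
                + "<Document><name>fixture</name></Document></kml>\n";
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("doc.kml"));
            zos.write(kml.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Reads the archive back and returns the name of its first entry, or null if empty.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] kmz = buildMinimalKmz();
        // Sanity check only: confirms the bytes form a readable zip.
        System.out.println("bytes=" + kmz.length + ", first entry=" + firstEntryName(kmz));
    }
}
```

With Tika on the classpath, the returned bytes could be wrapped in a {{ByteArrayInputStream}} and passed to {{new Tika().detect(is, name)}} in the same way as the in-memory case in the test above, which would make the result independent of any local file path.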