[ 
https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955047#comment-17955047
 ] 

Tim Allison edited comment on TIKA-4424 at 5/29/25 8:51 PM:
------------------------------------------------------------

In our {{branch_3x}}, I added the following test to AutoDetectParserTest in 
tika-parsers-standard-package.

Every call returns the same correct type, application/vnd.google-earth.kmz, 
with no exceptions. This unit test passes.

I need help with a reproducer. Does this fail on a different kmz file? Do we 
have the same dependencies? How are you calling Tika's detect?

{noformat}
    @Test
    public void testOne() throws Exception {
        Path kmz = Paths.get(
                "/home/tallison/Intellij/tika-3x/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/test/resources/test" +
                "-documents/testKMZ.kmz");

        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));

        try (TikaInputStream tis = TikaInputStream.get(kmz)) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
        }

        try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
        }

        try (InputStream is = Files.newInputStream(kmz)) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
        }

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Files.copy(kmz, bos);
        try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
        }

        //With name
        String name = kmz.getFileName().toString();
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));

        try (TikaInputStream tis = TikaInputStream.get(kmz)) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
        }

        try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
        }

        try (InputStream is = Files.newInputStream(kmz)) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
        }

        bos = new ByteArrayOutputStream();
        Files.copy(kmz, bos);
        try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
            assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
        }

    }
{noformat}



> Regression in zip-based detection with an InputStream in 3.2.0
> --------------------------------------------------------------
>
>                 Key: TIKA-4424
>                 URL: https://issues.apache.org/jira/browse/TIKA-4424
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> On the user list, Craig Muchinsky and Pontus Amberg noted new problems with 
> detection of zip-based files.
> Craig noted that this affects InputStream detection, and Pontus noted that 
> even if he switched to a TikaInputStream, his kmz file was getting detected 
> as a zip.
> This is Pontus' code:
> {noformat}
> Tika.detect(InputStream stream, String name)
> {noformat}
> {noformat}
> app//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
