[ 
https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956997#comment-17956997
 ] 

Tim Allison commented on TIKA-4424:
-----------------------------------

As a side note, the KMZDetector, when operating on a file, required a single 
file with a name ending in ".kml" at the root level of the kmz file (following 
[https://developers.google.com/kml/documentation/kmzarchives]). If there were 
other files at the root level (as in the test file attached here), the file 
was detected as {{application/zip}}.

When streaming, however, the only requirement was one kml file at the root 
level; other files could also be present at the root level. So, {_}I think{_}, 
this file would have been detected as {{application/zip}} if file detection 
were used, but as {{application/vnd.google-earth.kmz}} if streaming.
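
To make the difference concrete, here is a minimal sketch of the two rules as I described them above. This is not the actual KMZDetector or DefaultZipContainerDetector code. The class name, method names and entry names are all hypothetical, and I am assuming that "one kml file" means exactly one and that the ".kml" match is case-sensitive.

```java
import java.util.List;

public class KmzRuleSketch {

    // An entry is at the root level if its name has no directory separator.
    static boolean isRootLevel(String entryName) {
        return !entryName.contains("/");
    }

    // A root-level entry whose name ends in ".kml" (case-sensitive: an assumption).
    static boolean isRootKml(String entryName) {
        return isRootLevel(entryName) && entryName.endsWith(".kml");
    }

    // File-based rule as described: the only root-level file is a single .kml.
    static boolean fileBasedSaysKmz(List<String> entryNames) {
        long rootEntries = entryNames.stream().filter(KmzRuleSketch::isRootLevel).count();
        long rootKml = entryNames.stream().filter(KmzRuleSketch::isRootKml).count();
        return rootEntries == 1 && rootKml == 1;
    }

    // Streaming rule as described: one root-level .kml, other root files allowed.
    // "Exactly one" is an assumption; the real check may differ.
    static boolean streamingSaysKmz(List<String> entryNames) {
        return entryNames.stream().filter(KmzRuleSketch::isRootKml).count() == 1;
    }

    public static void main(String[] args) {
        // Hypothetical listing: a root-level kml plus another root-level file.
        List<String> names = List.of("doc.kml", "extra.txt", "files/icon.png");
        System.out.println(fileBasedSaysKmz(names));  // prints "false"
        System.out.println(streamingSaysKmz(names));  // prints "true"
    }
}
```

With a listing like that, the first rule falls through to plain zip and the second reports kmz, which is the mismatch I suspect for the attached file.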

 

> Regression in zip-based detection with an InputStream in 3.2.0
> --------------------------------------------------------------
>
>                 Key: TIKA-4424
>                 URL: https://issues.apache.org/jira/browse/TIKA-4424
>             Project: Tika
>          Issue Type: Task
>          Components: detector
>    Affects Versions: 3.2.0
>            Reporter: Tim Allison
>            Priority: Major
>              Labels: regression
>             Fix For: 4.0.0, 3.2.1
>
>         Attachments: tika-4424.zip
>
>
> On the user list, Craig Muchinsky and Pontus Amberg noted new problems with 
> detection of zip based files.
> Craig noted that this affects InputStream detection, and Pontus noted that 
> even if he switched to a TikaInputStream, his kmz file was getting detected 
> as a zip.
> This is Pontus' code:
> {noformat}
> Tika.detect(InputStream stream, String name)
> {noformat}
> {noformat}
> pp//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
