[ 
https://issues.apache.org/jira/browse/TIKA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532986#comment-15532986
 ] 

Tim Allison commented on TIKA-2103:
-----------------------------------

Yes, the POIFSContainerDetector lives in tika-parsers, so you'll need that 
dependency. And, right, if you include the filename parameter or call 
{{tika.detect(file)}}, it uses the filename suffix to determine the subtype 
of application/zip.
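
To illustrate why a bare stream or byte array comes back as application/zip: 
an .xlsx file is a ZIP container, so its leading bytes are the same as those 
of any other ZIP. Telling the subtype apart takes either a filename hint or a 
look inside the container. The sketch below is NOT Tika's implementation; it 
uses only the JDK, the class and method names are made up for illustration, 
and checking for an {{xl/workbook.xml}} entry is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Tika.
public class ZipSubtypeSketch {

    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Build a tiny in-memory ZIP with the given entry names.
    // This is placeholder data, not a valid workbook.
    static byte[] buildZip(String... entryNames) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                for (String name : entryNames) {
                    zos.putNextEntry(new ZipEntry(name));
                    zos.write("<x/>".getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Magic-byte check: every ZIP local file header starts with "PK\003\004".
    static boolean hasZipMagic(byte[] data) {
        return data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Container-aware guess: open the ZIP and look for a workbook entry.
    // Assumption: an xlsx package keeps its workbook at xl/workbook.xml.
    static String guessType(byte[] data) {
        if (!hasZipMagic(data)) {
            return "application/octet-stream";
        }
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals("xl/workbook.xml")) {
                    return XLSX;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "application/zip";
    }

    public static void main(String[] args) {
        byte[] plainZip = buildZip("readme.txt");
        byte[] fakeXlsx = buildZip("[Content_Types].xml", "xl/workbook.xml");
        // Both inputs pass the magic-byte check; only the entry names differ.
        System.out.println(hasZipMagic(plainZip) + " " + guessType(plainZip));
        System.out.println(hasZipMagic(fakeXlsx) + " " + guessType(fakeXlsx));
    }
}
```

Both byte arrays start with the same magic bytes, which is all a magic-only 
detector has to go on; the two results differ only once the entries are read.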

In general, using TikaInputStream can be more efficient, especially when the 
data comes from a {{File}} or a {{Path}}:

{{InputStream tis = TikaInputStream.get(file);}}

> xlsx inputstreams or byte arrays are detected as application/zip
> ----------------------------------------------------------------
>
>                 Key: TIKA-2103
>                 URL: https://issues.apache.org/jira/browse/TIKA-2103
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.13
>            Reporter: özay duman
>            Priority: Minor
>
> detect method of org.apache.tika.Tika recognizes byte[] and InputStream as 
> zip.
> Tika tika = new Tika();
> Path path = Paths.get("C:/abc.xlsx");
> byte[] data = Files.readAllBytes(path);
> String detectType = tika.detect(data);
> System.err.println("Detected type" + detectType);
> prints : Detected typeapplication/zip
> Tika tika = new Tika();
> InputStream targetStream = new FileInputStream(new File("C:/abc.xlsx"));
> String detectType = tika.detect(targetStream);
> System.err.println("Detected type" + detectType);
> prints : Detected typeapplication/zip



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
