[ https://issues.apache.org/jira/browse/TIKA-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532986#comment-15532986 ]
Tim Allison commented on TIKA-2103:
-----------------------------------
Yes, the POIFSContainerDetector lives in tika-parsers, so you'll need that on
the classpath. And, right, if you include the filename parameter or call
{{tika.detect(file)}}, Tika uses the filename suffix to determine the subtype
of application/zip.
In general, using TikaInputStream can be more efficient, especially when you
start from a {{File}} or a {{Path}}:
InputStream tis = TikaInputStream.get(file);
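To illustrate why the raw bytes alone come back as application/zip: an xlsx file is a ZIP package, so a magic-byte check only sees the ZIP signature, and the subtype has to come from looking inside the container (or from the filename). The following is a rough JDK-only sketch of that idea, not Tika's detector code. The class name, the helper methods, and the shortcut of treating the presence of {{xl/workbook.xml}} as "this is xlsx" are all assumptions made for the example; a real detector would read the package's content types and distinguish xlsx from xlsm and friends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative sketch only (hypothetical class); this is not how Tika is implemented.
class ZipSubtypeSketch {

    static final String ZIP = "application/zip";
    static final String XLSX =
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Returns null when the bytes do not start with the ZIP local file header
    // signature "PK\003\004"; otherwise returns ZIP or, if the container holds
    // an xl/workbook.xml entry, XLSX (a simplifying assumption).
    static String sniff(byte[] data) {
        if (data.length < 4 || data[0] != 'P' || data[1] != 'K'
                || data[2] != 3 || data[3] != 4) {
            return null;
        }
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals("xl/workbook.xml")) {
                    return XLSX;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ZIP;
    }

    // Builds a small in-memory ZIP with the given entry names, for the demo only.
    static byte[] zipOf(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write('x'); // placeholder content
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(sniff(zipOf("[Content_Types].xml", "xl/workbook.xml")));
        System.out.println(sniff(zipOf("readme.txt")));
    }
}
```

With only the first four bytes to go on, both sample archives look identical; the difference shows up only once the entries are listed, which is the kind of work the container-aware detectors in tika-parsers take on.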
> xlsx inputstreams or byte arrays are detected as application/zip
> ----------------------------------------------------------------
>
> Key: TIKA-2103
> URL: https://issues.apache.org/jira/browse/TIKA-2103
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.13
> Reporter: özay duman
> Priority: Minor
>
> The detect method of org.apache.tika.Tika identifies an xlsx file passed as
> a byte[] or an InputStream as application/zip.
> Tika tika = new Tika();
> Path path = Paths.get("C:/abc.xlsx");
> byte[] data = Files.readAllBytes(path);
> String detectType = tika.detect(data);
> System.err.println("Detected type" + detectType);
> prints : Detected typeapplication/zip
> Tika tika = new Tika();
> InputStream targetStream = new FileInputStream(new File("C:/abc.xlsx"));
> String detectType = tika.detect(targetStream);
> System.err.println("Detected type" + detectType);
> prints : Detected typeapplication/zip