[jira] [Commented] (TIKA-1245) Incorrect MIME type detection

Nick Burch (JIRA) Sun, 23 Feb 2014 02:43:23 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909739#comment-13909739
 ]


Nick Burch commented on TIKA-1245:
----------------------------------

Tika isn't detecting this as a PDF because there are 176 bytes of other data 
before the PDF header: some high bytes, followed by some URLs. Any idea why 
the file might have that on the front?

Other programs seem able to skip over that leading data and open the PDF from 
a few hundred bytes in, so we may need to tweak the PDF magic to look further 
into the file.
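
To illustrate what "looking further into the file" could mean, here is a minimal 
sketch that scans a byte buffer for the "%PDF-" marker instead of requiring it 
at offset 0. It is not how Tika's magic detection is implemented; the class 
name PdfHeaderScan, the method findPdfHeader, and the idea of passing a search 
window such as 1024 bytes are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: finds a PDF header that is
// preceded by leading junk bytes.
final class PdfHeaderScan {
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" marker that starts at or
     * before maxOffset, or -1 if none is found in that window.
     */
    static int findPdfHeader(byte[] data, int maxOffset) {
        // Last start position at which the whole marker still fits.
        int lastStart = Math.min(maxOffset, data.length - PDF_MAGIC.length);
        for (int i = 0; i <= lastStart; i++) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    // Compares the marker against the buffer at the given offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < PDF_MAGIC.length; j++) {
            if (data[offset + j] != PDF_MAGIC[j]) {
                return false;
            }
        }
        return true;
    }
}
```

For a buffer like the attached file, with 176 bytes in front of the header, a 
call such as PdfHeaderScan.findPdfHeader(firstBytes, 1024) would return 176, 
while a window smaller than 176 would return -1. A wider window catches more 
files like this one, but also raises the chance of a false match on 
"%PDF-" text embedded in some other format.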

> Incorrect MIME type detection
> -----------------------------
>
>                 Key: TIKA-1245
>                 URL: https://issues.apache.org/jira/browse/TIKA-1245
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.4
>            Reporter: Mohamed Mustafa Khimani
>            Priority: Minor
>         Attachments: 00000001 pages 250-329.pdf
>
>
> I am using Tika to detect the MIME type of a PDF file. The MIME type 
> detection is incorrect. 
>     Tika tika = new Tika();
>     System.out.println(tika.detect(new File(args[0])));
> The output is: audio/mpeg
> I was looking to attach the PDF document for testing along with this, but 
> could not find how to do that.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
