If you decide to modify the PDF magic, please use the one from the
TIKA-1085 JIRA issue. Thank you.
On 23 Feb 2014 11:42, "Nick Burch (JIRA)" <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/TIKA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909739#comment-13909739]
>
> Nick Burch commented on TIKA-1245:
> ----------------------------------
>
> The reason that Tika isn't detecting this as a PDF is that there's 176
> bytes of stuff before the PDF header. Where "stuff" is some high bytes,
> then some urls. Any ideas why it might have that on the front?
>
> Other programs seem to be able to skip over that stuff, and then open the
> PDF from a few hundred bytes in, so it's possible we need to tweak the PDF
> magic to look further into the file
>
> > Incorrect MIME type detection
> > -----------------------------
> >
> >                 Key: TIKA-1245
> >                 URL: https://issues.apache.org/jira/browse/TIKA-1245
> >             Project: Tika
> >          Issue Type: Bug
> >          Components: detector, mime
> >    Affects Versions: 1.4
> >            Reporter: Mohamed Mustafa Khimani
> >            Priority: Minor
> >         Attachments: 00000001 pages 250-329.pdf
> >
> >
> > I am using Tika to detect the MIME type of a PDF file. The MIME type
> > detection is incorrect.
> > Tika tika = new Tika();
> > System.out.println(tika.detect(new File(args[0])));
> > The output is - audio/mpeg
> > I was looking to attach the pdf document for testing along with this,
> > but could not find how to do that.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
>
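
For anyone following the thread, the problem Nick describes (a PDF header
that starts 176 bytes into the file rather than at offset 0) can be
illustrated with a small sketch. This is not Tika's detector and not the
magic proposed in TIKA-1085; the class name PdfHeaderScan, the method
findPdfHeader and the maxOffset parameter are made up for illustration.
It only shows the idea of accepting the "%PDF-" marker anywhere within a
leading window instead of only at the very first byte.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a hypothetical helper, not part of Tika. */
final class PdfHeaderScan {
    // The ASCII marker that opens a PDF header line, e.g. "%PDF-1.4".
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" marker that starts at or
     * before maxOffset, or -1 if no such marker is found.
     */
    static int findPdfHeader(byte[] data, int maxOffset) {
        // Never start a comparison that would run past the end of the data.
        int lastStart = Math.min(maxOffset, data.length - PDF_MAGIC.length);
        for (int i = 0; i <= lastStart; i++) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    // True if the marker bytes appear in data starting at pos.
    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < PDF_MAGIC.length; j++) {
            if (data[pos + j] != PDF_MAGIC[j]) {
                return false;
            }
        }
        return true;
    }
}
```

With maxOffset set to 0 this behaves like a strict "header at the start
of the file" check and would miss the attached file; with a larger
window (how large is a judgment call, and the value is an assumption
here) the header at offset 176 would be found. A wider window also
raises the chance of false positives on files that merely contain the
string "%PDF-" near the start, which is the trade-off to weigh when
changing the magic.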
