If you decide to modify the PDF magic, please use the one from the TIKA-1085 JIRA issue.

Thank you

On 23 Feb 2014 at 11:42, "Nick Burch (JIRA)" <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909739#comment-13909739 ]
>
> Nick Burch commented on TIKA-1245:
> ----------------------------------
>
> The reason that Tika isn't detecting this as a PDF is that there's 176
> bytes of stuff before the PDF header. Where "stuff" is some high bytes,
> then some urls. Any ideas why it might have that on the front?
>
> Other programs seem to be able to skip over that stuff, and then open the
> PDF from a few hundred bytes in, so it's possible we need to tweak the PDF
> magic to look further into the file
>
> > Incorrect MIME type detection
> > -----------------------------
> >
> >                 Key: TIKA-1245
> >                 URL: https://issues.apache.org/jira/browse/TIKA-1245
> >             Project: Tika
> >          Issue Type: Bug
> >          Components: detector, mime
> >   Affects Versions: 1.4
> >            Reporter: Mohamed Mustafa Khimani
> >            Priority: Minor
> >         Attachments: 00000001 pages 250-329.pdf
> >
> >
> > I am using Tika to detect the MIME type of a PDF file. The MIME type
> > detection is incorrect.
> > Tika tika = new Tika();
> > System.out.println(tika.detect(new File(args[0])));
> > The output is - audio/mpeg
> > I was looking to attach the pdf document for testing along with this,
> > but could not find how to do that.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
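
P.S. To illustrate what "look further into the file" could mean in practice, here is a minimal sketch that scans the start of a byte buffer for the "%PDF-" marker instead of expecting it at offset 0. It is not Tika's detector code and not the magic from TIKA-1085; the class name PdfHeaderScan, the method findPdfHeader and the scan limit passed by the caller are all hypothetical, chosen only for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Tika.
public class PdfHeaderScan {

    // The ASCII marker that starts a PDF header, e.g. "%PDF-1.4".
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" marker that lies entirely
     * within the first 'limit' bytes of 'data', or -1 if there is none.
     * The limit is an assumption supplied by the caller.
     */
    public static int findPdfHeader(byte[] data, int limit) {
        // Last start position at which the whole marker still fits.
        int lastStart = Math.min(data.length, limit) - PDF_MAGIC.length;
        for (int i = 0; i <= lastStart; i++) {
            boolean match = true;
            for (int j = 0; j < PDF_MAGIC.length; j++) {
                if (data[i + j] != PDF_MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

With a file like the one attached to TIKA-1245 (176 bytes of other data, then the header), a call such as findPdfHeader(bytes, 1024) would report offset 176, whereas a check at offset 0 only would miss it. Widening the window also raises the chance of false positives on non-PDF files that happen to contain "%PDF-" early on, so the limit is a trade-off.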
