Gregory Lepore created TIKA-4044:
------------------------------------

             Summary: .pub files being misidentified as MS Publisher
                 Key: TIKA-4044
                 URL: https://issues.apache.org/jira/browse/TIKA-4044
             Project: Tika
          Issue Type: Improvement
            Reporter: Gregory Lepore
         Attachments: 910409.PUB, 910486.PUB, 911541.PUB, HUBBLE.PUB, 
KEYRING.PUB, PRZ.PUB

The current tika-mimetypes.xml appears to identify most .pub files as MS 
Publisher files, with no associated magic values. The magic values for the 
format have been researched in great detail and are documented here:

[http://fileformats.archiveteam.org/wiki/Microsoft_Publisher]

 

At the very least, it would be helpful to implement the magic values for 
Publisher 1.0 and to add OLE container matching in addition to the .pub 
extension.
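
As a rough illustration of the second suggestion, here is a minimal Python 
sketch (not Tika code) that combines the .pub extension with a check for the 
OLE2 compound file signature. The function names and the returned labels are 
hypothetical, chosen only for this example. The Publisher 1.0 magic values are 
deliberately not included; they should be taken from the page linked above.

```python
# First 8 bytes of any OLE2 (Compound File Binary) container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(header: bytes) -> bool:
    """Return True if the buffer starts with the OLE2 container signature."""
    return header.startswith(OLE2_MAGIC)


def classify_pub(name: str, header: bytes) -> str:
    """Hypothetical helper: refine an extension match with a container check.

    The labels are illustrative only and are not Tika media types.
    """
    if not name.lower().endswith(".pub"):
        return "not-pub"
    if looks_like_ole2(header):
        # OLE2 alone is shared with many formats (e.g. .doc), so this is
        # only a candidate when combined with the .pub extension.
        return "pub-ole2-candidate"
    # Not an OLE2 container: could be Publisher 1.0 or an unrelated format
    # that also uses .pub; it needs its own magic to decide.
    return "pub-unknown"
```

With this kind of check, a .PUB file that does not start with the OLE2 
signature would no longer be reported as MS Publisher on the extension alone.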

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
