Gregory Lepore created TIKA-4044:
------------------------------------
Summary: .pub files being misidentified as MS Publisher
Key: TIKA-4044
URL: https://issues.apache.org/jira/browse/TIKA-4044
Project: Tika
Issue Type: Improvement
Reporter: Gregory Lepore
Attachments: 910409.PUB, 910486.PUB, 911541.PUB, HUBBLE.PUB,
KEYRING.PUB, PRZ.PUB
The current tika-mimetypes.xml appears to identify most .pub files as MS
Publisher files, with no associated magic values. The magic values have been
researched in great detail and are documented here:
[http://fileformats.archiveteam.org/wiki/Microsoft_Publisher]
At the very least, it would be helpful to implement the magic values for
Publisher 1.0 and to add OLE container matching in addition to the .pub
extension.
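
To illustrate the kind of check being requested, here is a minimal sketch (not Tika code) that combines the .pub extension with a test for the standard 8-byte OLE2 compound file signature. The function names and the returned labels are hypothetical. The Publisher 1.0 magic values are deliberately not reproduced here; they would need to be taken from the page linked above. An OLE2 header alone only shows that the file is an OLE container, not that it is a Publisher document.

```python
from pathlib import PurePath

# Standard OLE2 / Compound File Binary signature (first 8 bytes of the file).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def has_ole2_header(header: bytes) -> bool:
    """Return True if the leading bytes carry the OLE2 container signature."""
    return header.startswith(OLE2_MAGIC)


def classify_pub(filename: str, header: bytes) -> str:
    """Hypothetical helper: combine the extension check with a magic check.

    The returned labels are illustrative only and are not Tika media types.
    """
    if PurePath(filename).suffix.lower() != ".pub":
        return "not-pub-extension"
    if has_ole2_header(header):
        # Candidate for an OLE-based Publisher file; the container contents
        # would still have to be inspected to confirm this.
        return "ole2-container"
    # A .pub file without an OLE2 header: could be Publisher 1.0 or an
    # unrelated format that happens to use the same extension.
    return "unknown-pub"
```

With a check like this, a .pub file that lacks the OLE2 header would no longer be reported as MS Publisher on the strength of its extension alone.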