Marco Quaranta created TIKA-1204:
------------------------------------
Summary: DWFX files detection
Key: TIKA-1204
URL: https://issues.apache.org/jira/browse/TIKA-1204
Project: Tika
Issue Type: Improvement
Components: detector, mime
Affects Versions: 1.4
Reporter: Marco Quaranta
Priority: Minor
DWFX files are AutoCAD [Design Web
Format|http://en.wikipedia.org/wiki/Design_Web_Format] files that follow the [Open
Packaging Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions].
Tika "correctly" detects these files as application/zip.
It would be better if Tika could recognize the true MIME type:
model/vnd.dwfx+xps. (y)
Please add logic to ZipContainerDetector so that DWFX files can be
detected. We need a method that behaves like detectOfficeOpenXML(OPCPackage
pkg):
{noformat}
PackageRelationshipCollection core =
    pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
if (core.size() != 1) {
    // Invalid DWFX Package received
    return null;
}
PackagePart corePart = pkg.getPart(core.getRelationship(0));
String coreType = corePart.getContentType();
return MediaType.parse(coreType);
{noformat}
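For anyone who wants to try the idea without Tika or POI on the classpath, here is a rough, self-contained sketch using only the JDK. It is not the proposed ZipContainerDetector change: the class name DwfxSniffer and its helper methods are hypothetical, it assumes the root relationships part is stored as "_rels/.rels" with the relationship type in a double-quoted attribute, and it uses a plain string match instead of parsing the XML. Unlike the snippet above, it returns the target type model/vnd.dwfx+xps directly rather than the content type of the related part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-alone class for illustration; not part of Tika or POI.
public class DwfxSniffer {
    // Relationship type taken from the snippet in this issue.
    static final String DWFX_REL_TYPE =
        "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";
    static final String DWFX_MIME = "model/vnd.dwfx+xps";

    // Returns DWFX_MIME if the root relationships part mentions the DWFX
    // document sequence relationship exactly once, otherwise null.
    static String detectDwfx(byte[] zipBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: exact, case-sensitive name of the root relationships part.
                if ("_rels/.rels".equals(entry.getName())) {
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    // Crude check: count the quoted type URI instead of parsing XML.
                    return count(xml, "\"" + DWFX_REL_TYPE + "\"") == 1 ? DWFX_MIME : null;
                }
            }
            return null; // no root relationships part: not an OPC package
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            n++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return n;
    }

    // Demo helper: builds an in-memory ZIP containing a single entry.
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The Target value is made up for the demo.
        String rels = "<Relationships><Relationship Id=\"r1\" Type=\""
            + DWFX_REL_TYPE + "\" Target=\"/demo.dwfseq\"/></Relationships>";
        System.out.println(detectDwfx(zipOf("_rels/.rels", rels)));       // model/vnd.dwfx+xps
        System.out.println(detectDwfx(zipOf("readme.txt", "plain zip"))); // null
    }
}
```

A real detector would go through OPCPackage as in the snippet above, since that handles part-name normalization and proper XML parsing; the string match here is only meant to show the detection rule.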
Thank you,
Marco
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)