[ https://issues.apache.org/jira/browse/TIKA-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Quaranta updated TIKA-1204:
---------------------------------
Attachment: General assembly filter.dwfx
DWFX test file
> DWFX files detection
> --------------------
>
> Key: TIKA-1204
> URL: https://issues.apache.org/jira/browse/TIKA-1204
> Project: Tika
> Issue Type: Improvement
> Components: detector, mime
> Affects Versions: 1.4
> Reporter: Marco Quaranta
> Priority: Minor
> Attachments: General assembly filter.dwfx
>
>
> DWFX files are Autodesk [Design Web Format|http://en.wikipedia.org/wiki/Design_Web_Format] files and follow the [Open Packaging Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions].
> Tika currently detects these files as application/zip, which is technically correct because a DWFX file is a ZIP container, but not specific.
> It would be better if Tika could recognize the specific MIME type: model/vnd.dwfx+xps. (y)
> Please add logic to ZipContainerDetector so that it can detect DWFX files. We need a method that behaves like detectOfficeOpenXML(OPCPackage pkg):
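> {noformat}
> import org.apache.poi.openxml4j.opc.OPCPackage;
> import org.apache.poi.openxml4j.opc.PackageRelationshipCollection;
> import org.apache.tika.mime.MediaType;
>
> // detectDWFX is a placeholder name; returns model/vnd.dwfx+xps or null
> private static MediaType detectDWFX(OPCPackage pkg) {
>     PackageRelationshipCollection docSeq = pkg.getRelationshipsByType(
>             "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
>     if (docSeq.size() != 1) {
>         // Not a DWFX package
>         return null;
>     }
>     // The target part's content type describes the document sequence part,
>     // not the whole package, so return the package-level type directly
>     return MediaType.parse("model/vnd.dwfx+xps");
> }
> {noformat}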
> Thank you,
> Marco
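To experiment with the proposed rule without Tika or POI on the classpath, here is a minimal sketch that uses only the JDK. It assumes the document-sequence relationship is a package-level relationship stored in the standard OPC `_rels/.rels` part; the class `DwfxSniffer` and its methods are hypothetical names, and it returns MIME type strings rather than Tika `MediaType` objects.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stdlib-only sketch (not Tika or POI API): decides whether a ZIP
// container is a DWFX package by reading the OPC root relationships part.
class DwfxSniffer {
    // Relationship type taken from the issue description
    static final String DWFX_DOCSEQ_REL =
            "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";

    // Collects the Type attribute of every relationship in _rels/.rels
    static List<String> rootRelationshipTypes(InputStream zip) throws Exception {
        List<String> types = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zip)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.getName().equals("_rels/.rels")) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the ZIP stream
                byte[] xml = zin.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Reject DOCTYPEs: the package may come from an untrusted source
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    types.add(((Element) rels.item(i)).getAttribute("Type"));
                }
                break;
            }
        }
        return types;
    }

    // Same rule as the proposal: exactly one document-sequence relationship
    // means DWFX; anything else falls back to the generic ZIP type
    static String detect(InputStream zip) throws Exception {
        long matches = rootRelationshipTypes(zip).stream()
                .filter(DWFX_DOCSEQ_REL::equals)
                .count();
        return matches == 1 ? "model/vnd.dwfx+xps" : "application/zip";
    }
}
```

Inside Tika the check would still belong in ZipContainerDetector on top of POI's OPCPackage; the sketch only makes explicit where in the package the relationship lives and how the "exactly one" rule maps to a MIME type.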
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)