Marco Quaranta created TIKA-1204:
------------------------------------

             Summary: DWFX files detection
                 Key: TIKA-1204
                 URL: https://issues.apache.org/jira/browse/TIKA-1204
             Project: Tika
          Issue Type: Improvement
          Components: detector, mime
    Affects Versions: 1.4
            Reporter: Marco Quaranta
            Priority: Minor


DWFX files are AutoCAD [Design Web Format|http://en.wikipedia.org/wiki/Design_Web_Format] files 
that follow the [Open Packaging Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions]. 
Tika "correctly" detects these files as application/zip. 
It would be better if Tika could recognize the true MIME type: 
model/vnd.dwfx+xps. (y)
Please add logic to ZipContainerDetector so that DWFX files can be detected. 
We need a method that behaves like detectOfficeOpenXML(OPCPackage pkg): 

{noformat}
PackageRelationshipCollection core =
    pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
if (core.size() != 1) {
 // Invalid DWFX Package received
 return null;
}
PackagePart corePart = pkg.getPart(core.getRelationship(0));
String coreType = corePart.getContentType();
return MediaType.parse(coreType);
{noformat}
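
For anyone who wants to try the idea without POI, here is a rough, standalone sketch of the same check using only the JDK. It is not Tika or POI code: the class name DwfxSniffer and its methods are hypothetical, it assumes Java 9+ (for readAllBytes), and it assumes the package's root relationships live in the ZIP entry _rels/.rels as the Open Packaging Conventions describe. Unlike the snippet above, it does not read the content type of the target part; it simply returns the fixed type model/vnd.dwfx+xps when exactly one document sequence relationship is found.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika or POI.
class DwfxSniffer {
    // Relationship type taken from the snippet in this issue.
    static final String DWFX_REL_TYPE =
        "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";
    // Media type requested in this issue.
    static final String DWFX_MEDIA_TYPE = "model/vnd.dwfx+xps";

    /** True if the relationships XML holds exactly one DWFX document sequence relationship. */
    static boolean hasDwfxRelationship(InputStream relsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input is untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(relsXml);

        // "*" matches Relationship elements in any namespace.
        NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
        int matches = 0;
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (DWFX_REL_TYPE.equals(rel.getAttribute("Type"))) {
                matches++;
            }
        }
        return matches == 1;
    }

    /** Returns model/vnd.dwfx+xps for a DWFX-looking ZIP stream, otherwise null. */
    static String detect(InputStream zipStream) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("_rels/.rels".equalsIgnoreCase(entry.getName())) {
                    // Copy the entry first so the XML parser cannot close the ZIP stream.
                    byte[] data = zip.readAllBytes();
                    return hasDwfxRelationship(new ByteArrayInputStream(data))
                        ? DWFX_MEDIA_TYPE
                        : null;
                }
            }
        }
        return null; // no root relationships part, so not an OPC package
    }
}
```

Calling DwfxSniffer.detect(...) on a stream opened over a .dwfx file should give model/vnd.dwfx+xps under these assumptions, and null for an ordinary ZIP. A real patch would presumably reuse the OPCPackage that ZipContainerDetector already opens rather than scanning the ZIP a second time.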
Thank you,
Marco




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
