Luca Moretti created TIKA-1823:
----------------------------------
Summary: Support detecting DWF format
Key: TIKA-1823
URL: https://issues.apache.org/jira/browse/TIKA-1823
Project: Tika
Issue Type: Improvement
Components: detector, mime
Affects Versions: 1.11
Reporter: Luca Moretti
Priority: Minor
Tika currently detects DWF files as application/octet-stream.
To make the Tika MIME magic detector recognize DWF files correctly, the
following fragment should be added to the _tika-mimetypes.xml_ registry:
{code:xml}
<mime-type type="model/vnd.dwf">
  <acronym>dwf</acronym>
  <_comment>Design Web Format</_comment>
  <magic priority="50">
    <!-- Nested matches must all succeed: "(DWF V" at 0, "." at 8, ")" at 11 -->
    <match type="string" offset="0" value="(DWF V">
      <match type="string" offset="8" value=".">
        <match type="string" offset="11" value=")" />
      </match>
    </match>
  </magic>
  <glob pattern="*.dwf" />
</mime-type>
{code}
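A rough Python sketch of what the nested rule checks, assuming (as in shared-mime-info) that nested {{<match>}} elements must all succeed. {{DWF_RULES}} and {{matches_dwf_magic}} are hypothetical names used only for illustration, not Tika API:
{code:python}
# Hypothetical helper mirroring the nested <match> rules in the fragment:
# "(DWF V" at offset 0 AND "." at offset 8 AND ")" at offset 11.
DWF_RULES = [
    (0, b"(DWF V"),
    (8, b"."),
    (11, b")"),
]


def matches_dwf_magic(header: bytes) -> bool:
    """Return True only if every (offset, value) rule matches the header bytes."""
    for offset, value in DWF_RULES:
        # Slicing past the end yields shorter bytes, so short input fails cleanly.
        if header[offset:offset + len(value)] != value:
            return False
    return True


print(matches_dwf_magic(b"(DWF V06.00)PK\x03\x04"))  # → True (DWF 6.0 package)
print(matches_dwf_magic(b"(DWF V00.55)"))            # → True (older, non-ZIP DWF)
print(matches_dwf_magic(b"PK\x03\x04"))              # → False (plain ZIP archive)
{code}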
\\
In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for
vector-based CAD drawings. It is essentially a ZIP archive with the _(DWF V06.00)_
signature prepended to the regular ZIP magic number (which starts with {{PK}}).
For this reason, the exact match value for DWF 6.0 files would be {{(DWF V06.00)PK}}.
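To illustrate the DWF 6.0 layout, here is a minimal Python sketch that builds a fake package (the 12-byte signature followed by a ZIP archive) and reads it back by skipping the signature. The helper names and the {{manifest.xml}} member are hypothetical; a real DWF 6.0 file would come from the DWF Toolkit:
{code:python}
import io
import zipfile

SIGNATURE = b"(DWF V06.00)"       # 12-byte DWF 6.0 header
STRICT_MAGIC = SIGNATURE + b"PK"  # the {{(DWF V06.00)PK}} match value


def make_fake_dwf6(files: dict) -> bytes:
    """Build a hypothetical DWF 6.0-like blob: signature + in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    # A ZIP written this way begins with a local file header ("PK\x03\x04").
    return SIGNATURE + buf.getvalue()


def open_dwf6_package(blob: bytes) -> zipfile.ZipFile:
    """Check the strict magic, then open everything after the signature as ZIP."""
    if not blob.startswith(STRICT_MAGIC):
        raise ValueError("not a DWF 6.0 package")
    return zipfile.ZipFile(io.BytesIO(blob[len(SIGNATURE):]))


blob = make_fake_dwf6({"manifest.xml": "<manifest/>"})
with open_dwf6_package(blob) as pkg:
    print(pkg.namelist())  # → ['manifest.xml']
{code}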
In earlier versions the DWF data stream is not a ZIP file, so the magic number
is only the _(DWF V00.55)_ signature in the file header. To make Tika detect
these older DWF files as well, I propose the more general rule in the fragment
above: it checks only the characters shared by every version, i.e. "(DWF V" at
offset 0, "." at offset 8 and ")" at offset 11.
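A small Python sketch comparing the exact DWF 6.0 value with the proposed general rule, assuming the version text always occupies offsets 6 to 10 (as in V06.00 and V00.55). {{STRICT_MAGIC}}, {{general_match}} and {{dwf_version}} are hypothetical names:
{code:python}
STRICT_MAGIC = b"(DWF V06.00)PK"  # exact value: matches DWF 6.0 packages only


def general_match(header: bytes) -> bool:
    """Proposed rule: fixed characters at offsets 0, 8 and 11 only."""
    return (header[0:6] == b"(DWF V"
            and header[8:9] == b"."
            and header[11:12] == b")")


def dwf_version(header: bytes):
    """Return the 'NN.NN' text between 'V' and ')' (assumed offsets 6-10), or None."""
    if not general_match(header):
        return None
    return header[6:11].decode("ascii", errors="replace")


for header in (b"(DWF V06.00)PK\x03\x04", b"(DWF V00.55)"):
    print(dwf_version(header), header.startswith(STRICT_MAGIC), general_match(header))
# → 06.00 True True
# → 00.55 False True
{code}
The exact value misses the V00.55 header, while the general rule accepts both.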
Thanks,
Luca
\\
P.S.: The DWF format specification is included in the DWF Toolkit. The DWF
Toolkit is available for free at [http://www.autodesk.com/dwftoolkit]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)