[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770245#comment-15770245
]
Tim Allison commented on TIKA-1946:
-----------------------------------
Thank you, Nick! I found that we can specify version info both in the magic
and in the parsers' supported types. The soon-to-be-committed version correctly
detects 5.0, 5.1 and 6.x; it sends 6.x to the WordPerfectParser and 5.0 and 5.1
to the EmptyParser. The WordPerfectParser also checks for 6.x and throws an
UnsupportedFormatException if it somehow winds up with the wrong file type.
{noformat}
<mime-type type="application/vnd.wordperfect">
  <acronym>WPD</acronym>
  <_comment>WordPerfect - Corel Word Processing</_comment>
  <tika:link>http://en.wikipedia.org/wiki/WordPerfect</tika:link>
  <tika:uti>com.corel.wordperfect.doc</tika:uti>
  <magic priority="50">
    <match value="application/vnd.wordperfect;" type="string" offset="0"/>
  </magic>
  <magic priority="40">
    <match value="0xFF575043" type="big32" offset="0"/> <!-- ÿWPC -->
  </magic>
  <!-- We have magic coverage for these, so we shouldn't need them
  <glob pattern="*.wpd"/>
  <glob pattern="*.wp"/>
  <glob pattern="*.wp5"/>
  <glob pattern="*.wp6"/>
  <glob pattern="*.w60"/>
  <glob pattern="*.wp61"/>
  <glob pattern="*.wpt"/>
  -->
</mime-type>
<mime-type type="application/vnd.wordperfect;version=5.0">
  <sub-class-of type="application/vnd.wordperfect"/>
  <magic priority="50">
    <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
      <match value="0x0000" type="big16" offset="10"/>
    </match>
  </magic>
</mime-type>
<mime-type type="application/vnd.wordperfect;version=5.1">
  <sub-class-of type="application/vnd.wordperfect"/>
  <magic priority="50">
    <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
      <match value="0x0001" type="big16" offset="10"/>
    </match>
  </magic>
</mime-type>
<mime-type type="application/vnd.wordperfect;version=6.x">
  <!--TODO: figure out how to distinguish 6.x versions -->
  <sub-class-of type="application/vnd.wordperfect"/>
  <magic priority="50">
    <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
      <match value="0x0201" type="big16" offset="10"/>
    </match>
  </magic>
</mime-type>
{noformat}
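To make the intent of the magic rules above easier to check by hand, here is a minimal sketch in plain Java of the same byte tests: the big-endian 32-bit value 0xFF575043 at offset 0, then the big-endian 16-bit value at offset 10. It does not use Tika's detector or the actual WordPerfectParser code. The class name, the method names and the stand-in exception are all hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, standalone sketch; not Tika code.
class WordPerfectVersionSketch {

    /** Local stand-in for an "unsupported format" exception, so the sketch compiles on its own. */
    static class UnsupportedFormatSketchException extends Exception {
        UnsupportedFormatSketchException(String message) {
            super(message);
        }
    }

    static final String BASE = "application/vnd.wordperfect";
    static final String V6X = BASE + ";version=6.x";

    /**
     * Mirrors the magic rules: returns null if the file prefix is missing,
     * the base type if the version bytes are absent or unrecognized,
     * and a versioned type otherwise.
     */
    static String detect(byte[] header) {
        if (header == null || header.length < 4) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.BIG_ENDIAN);
        if (buf.getInt(0) != 0xFF575043) { // bytes FF 57 50 43
            return null;
        }
        if (header.length < 12) {
            return BASE; // not enough bytes to read offset 10
        }
        int version = buf.getShort(10) & 0xFFFF; // treat as unsigned big16
        switch (version) {
            case 0x0000: return BASE + ";version=5.0";
            case 0x0001: return BASE + ";version=5.1";
            case 0x0201: return V6X;
            default:     return BASE;
        }
    }

    /** Sketch of the parser-side guard: anything other than 6.x is rejected. */
    static void requireSixX(byte[] header) throws UnsupportedFormatSketchException {
        String type = detect(header);
        if (!V6X.equals(type)) {
            throw new UnsupportedFormatSketchException("Not a 6.x file: " + type);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] header = new byte[12];
        header[0] = (byte) 0xFF;
        header[1] = 0x57;
        header[2] = 0x50;
        header[3] = 0x43;
        header[10] = 0x02;
        header[11] = 0x01;
        System.out.println(detect(header));
        requireSixX(header); // passes silently for a 6.x header
    }
}
```

In the sketch, a header with the prefix but an unrecognized value at offset 10 falls back to the base type, which is the assumed effect of the priority-40 magic on the parent type.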
Does this work?
> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
> Key: TIKA-1946
> URL: https://issues.apache.org/jira/browse/TIKA-1946
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Reporter: Nick C
> Fix For: 2.0, 1.15
>
> Attachments: TIKA-1946-pascal.essiembre-01.patch,
> wordperfect_mimes_fuller.zip, wordperfect_signatures_by_versions.xlsx
>
>
> I noticed some code on GitHub for parsing WordPerfect files
> (https://github.com/Norconex/importer). It also looks like the author,
> [~pascal.essiembre], has contributed to Tika before.