[ 
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770245#comment-15770245
 ] 

Tim Allison commented on TIKA-1946:
-----------------------------------

Thank you, Nick! I found that we can specify version info in the magic and in 
the supported types.  The soon-to-be-committed version correctly detects 5.0, 
5.1 and 6.x, and sends 6.x to the WordPerfectParser and 5.0 and 5.1 to the 
EmptyParser.  The WordPerfectParser has a check for 6.x and throws an 
UnsupportedFormatException if it somehow winds up with the wrong file type.

{noformat}
  <mime-type type="application/vnd.wordperfect">
    <acronym>WPD</acronym>
    <_comment>WordPerfect - Corel Word Processing</_comment>
    <tika:link>http://en.wikipedia.org/wiki/WordPerfect</tika:link>
    <tika:uti>com.corel.wordperfect.doc</tika:uti>
    <magic priority="50">
      <match value="application/vnd.wordperfect;" type="string" offset="0"/>
    </magic>
    <magic priority="40">
      <match value="0xFF575043" type="big32" offset="0"/> <!-- ÿWPC -->
    </magic>
    <!-- We have magic coverage for these, so we shouldn't need them
    <glob pattern="*.wpd"/>
    <glob pattern="*.wp"/>
    <glob pattern="*.wp5"/>
    <glob pattern="*.wp6"/>
    <glob pattern="*.w60"/>
    <glob pattern="*.wp61"/>
    <glob pattern="*.wpt"/>
    -->
  </mime-type>
  <mime-type type="application/vnd.wordperfect;version=5.0">
    <sub-class-of type="application/vnd.wordperfect"/>
    <magic priority="50">
      <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
        <match value="0x0000" type="big16" offset="10"/>
      </match>
    </magic>
  </mime-type>
  <mime-type type="application/vnd.wordperfect;version=5.1">
    <sub-class-of type="application/vnd.wordperfect"/>
    <magic priority="50">
      <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
        <match value="0x0001" type="big16" offset="10"/>
      </match>
    </magic>
  </mime-type>
  <mime-type type="application/vnd.wordperfect;version=6.x">
    <!--TODO: figure out how to distinguish 6.x versions -->
    <sub-class-of type="application/vnd.wordperfect"/>
    <magic priority="50">
      <match value="0xFF575043" type="big32" offset="0"> <!-- ÿWPC -->
        <match value="0x0201" type="big16" offset="10"/>
      </match>
    </magic>
  </mime-type>
{noformat}

Does this work?
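
For anyone who wants to sanity-check the byte offsets without running Tika, here is a minimal, dependency-free sketch of what the binary magic rules above test. It is not Tika's detector: the class name WpMagicSketch and the method detect are hypothetical, it ignores priorities and the string-based rule on the base type, and it assumes a header with the ÿWPC signature but an unlisted value at offset 10 falls back to the unversioned type.

```java
// Illustrative sketch only; names are hypothetical and this is not Tika code.
class WpMagicSketch {
    static final String BASE = "application/vnd.wordperfect";

    /** Returns the media type the binary magic rules would select, or null if none matches. */
    static String detect(byte[] head) {
        if (head == null || head.length < 4) {
            return null;
        }
        // type="big32" offset="0": four bytes read as one big-endian integer.
        int magic = ((head[0] & 0xFF) << 24) | ((head[1] & 0xFF) << 16)
                | ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
        if (magic != 0xFF575043) { // the bytes FF 57 50 43, i.e. "ÿWPC"
            return null;
        }
        if (head.length < 12) {
            return BASE; // too short to read the nested match at offset 10
        }
        // type="big16" offset="10": two bytes read as one big-endian integer.
        int version = ((head[10] & 0xFF) << 8) | (head[11] & 0xFF);
        switch (version) {
            case 0x0000: return BASE + ";version=5.0";
            case 0x0001: return BASE + ";version=5.1";
            case 0x0201: return BASE + ";version=6.x";
            default:     return BASE; // assumption: unlisted values keep the parent type
        }
    }

    public static void main(String[] args) {
        byte[] head = new byte[16];
        head[0] = (byte) 0xFF;
        head[1] = 0x57;
        head[2] = 0x50;
        head[3] = 0x43;
        head[10] = 0x02;
        head[11] = 0x01;
        System.out.println(detect(head));
    }
}
```

With the sample header built in main, the nested match at offset 10 sees 0x0201, so the sketch reports the 6.x type.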


> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
>                 Key: TIKA-1946
>                 URL: https://issues.apache.org/jira/browse/TIKA-1946
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>            Reporter: Nick C
>             Fix For: 2.0, 1.15
>
>         Attachments: TIKA-1946-pascal.essiembre-01.patch, 
> wordperfect_mimes_fuller.zip, wordperfect_signatures_by_versions.xlsx
>
>
> I noticed some code on github for parsing WordPerfect files 
> (https://github.com/Norconex/importer) Also looks like the author 
> [~pascal.essiembre] has contributed to Tika before



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
