[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
]
Pascal Essiembre commented on TIKA-1946:
----------------------------------------
WordPerfect extensions vary quite a bit. But the parser I wrote is based on WP
Version 6. I suspect it supports higher versions as well, but definitely not
lower. According to this document,
http://www.corel.com/content/pdf/wpx4/corel-wordperfect-office-X4-reviewers-guide.pdf
.wp extensions can be used for both WP5.x and WP6.x, so we can't rely on the
extension as an indicator of anything. Since the major version should
definitely be 2, I agree we should check it and throw an exception when it has
any other value. I could not find enough evidence of signatures for earlier
versions, other than
0xD0CF11E0A1B11AE1 for some, which conflicts with MS Word (maybe that is why it
can open those). I wonder if we should remove these mime types from
tika-mimetypes.xml, or give them separate entries?
{code:xml}
<alias type="application/wordperfect"/>
<alias type="application/wordperfect5.1"/>
{code}
That's assuming application/wordperfect is for older versions as well.
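To make the version check concrete, here is a rough sketch of what I have in mind. It is not the actual parser code: the class and method names are hypothetical, and it assumes the commonly documented WPC prefix layout (0xFF 'W' 'P' 'C' in the first four bytes, major version in the byte at offset 10). In the parser itself, the plain Java exception would be replaced by whichever Tika exception type fits best.

```java
final class WpHeaderCheck {
    // Assumption: WPC-prefixed files start with 0xFF 'W' 'P' 'C'.
    static final byte[] WPC_MAGIC = {(byte) 0xFF, 'W', 'P', 'C'};
    // Assumption: the major version byte sits at offset 10 of the prefix.
    static final int MAJOR_VERSION_OFFSET = 10;
    // The parser is based on WP6, whose major version should be 2.
    static final int SUPPORTED_MAJOR_VERSION = 2;

    // Returns true when the header starts with the WPC magic bytes.
    static boolean hasWpcMagic(byte[] header) {
        if (header.length < WPC_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < WPC_MAGIC.length; i++) {
            if (header[i] != WPC_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the major version byte as an unsigned value.
    static int majorVersion(byte[] header) {
        if (!hasWpcMagic(header) || header.length <= MAJOR_VERSION_OFFSET) {
            throw new IllegalArgumentException("Not a WPC-prefixed file");
        }
        return header[MAJOR_VERSION_OFFSET] & 0xFF;
    }

    // Throws when the major version is anything other than the supported one.
    static void requireSupported(byte[] header) {
        int major = majorVersion(header);
        if (major != SUPPORTED_MAJOR_VERSION) {
            throw new UnsupportedOperationException(
                    "Unsupported WordPerfect major version: " + major);
        }
    }
}
```

Files that start with 0xD0CF11E0A1B11AE1 fail the magic check here, so they would never reach the version test.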
> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
> Key: TIKA-1946
> URL: https://issues.apache.org/jira/browse/TIKA-1946
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Reporter: Nick C
> Fix For: 2.0, 1.15
>
>
> I noticed some code on GitHub for parsing WordPerfect files
> (https://github.com/Norconex/importer). Also, it looks like the author
> [~pascal.essiembre] has contributed to Tika before.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)