[ 
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146
 ] 

Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:02 PM:
------------------------------------------------------------------

WordPerfect file extensions vary quite a bit, but the parser I wrote is based 
on WP version 6.  I suspect it supports higher versions as well, but 
definitely not lower ones.  According to this document, 
http://www.corel.com/content/pdf/wpx4/corel-wordperfect-office-X4-reviewers-guide.pdf
 the .wp extension can be used for both WP5.x and WP6.x files, so we can't 
rely on the extension as an indicator of anything.  Since the major version 
should definitely be 2, I agree we should check it and throw an exception 
when it is anything else.  In my earlier research I could not find enough 
evidence of signatures for older versions, other than 0xD0CF11E0A1B11AE1 for 
some, which conflicts with MS Word (which may be why it can open those).  I 
wonder, then, whether we should remove these mime types from 
tika-mimetypes.xml, or give them separate entries?
{code:xml}
    <alias type="application/wordperfect"/>
    <alias type="application/wordperfect5.1"/>
{code}

That's assuming application/wordperfect is for older versions as well.
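
To make the version check concrete, here is a rough sketch of what I have in 
mind.  It is not the actual parser code: the class name, method names and 
exception type are made up for illustration, and it assumes the commonly 
documented WP prefix layout (magic bytes 0xFF 'W' 'P' 'C' at offset 0, major 
version byte at offset 10), which would need to be verified against real 
sample files.

```java
// Hypothetical helper, not part of Tika or of the contributed parser.
final class WpHeaderCheck {

    // Assumed WordPerfect prefix magic: 0xFF 'W' 'P' 'C' at offset 0.
    private static final byte[] MAGIC = {(byte) 0xFF, 'W', 'P', 'C'};

    // Assumed offset of the major version byte in the file prefix.
    private static final int MAJOR_VERSION_OFFSET = 10;

    // Major version the parser is written for (2, per the comment above).
    private static final int SUPPORTED_MAJOR_VERSION = 2;

    /** Hypothetical exception signalling a WP file the parser cannot handle. */
    static final class UnsupportedWpVersionException extends RuntimeException {
        UnsupportedWpVersionException(int major) {
            super("Unsupported WordPerfect major version: " + major);
        }
    }

    /** Returns the major version byte, or throws if the magic does not match. */
    static int majorVersion(byte[] header) {
        if (header == null || header.length <= MAJOR_VERSION_OFFSET) {
            throw new IllegalArgumentException("Header too short");
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                throw new IllegalArgumentException("Not a WPC-prefixed file");
            }
        }
        // Mask to read the byte as an unsigned value.
        return header[MAJOR_VERSION_OFFSET] & 0xFF;
    }

    /** Throws unless the header reports the supported major version. */
    static void requireSupported(byte[] header) {
        int major = majorVersion(header);
        if (major != SUPPORTED_MAJOR_VERSION) {
            throw new UnsupportedWpVersionException(major);
        }
    }

    private WpHeaderCheck() {
    }
}
```

A parser would call requireSupported() on the first bytes of the stream 
before doing any real work.  Files that start with the 0xD0CF11E0A1B11AE1 
signature fail the magic check here, so they would be rejected rather than 
misparsed.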




> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
>                 Key: TIKA-1946
>                 URL: https://issues.apache.org/jira/browse/TIKA-1946
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>            Reporter: Nick C
>             Fix For: 2.0, 1.15
>
>
> I noticed some code on github for parsing WordPerfect files 
> (https://github.com/Norconex/importer) Also looks like the author 
> [~pascal.essiembre] has contributed to Tika before



