[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768146#comment-15768146 ]
Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:02 PM:
------------------------------------------------------------------

WordPerfect file extensions vary quite a bit. The parser I wrote is based on WP version 6. I suspect it supports higher versions as well, but definitely not lower ones.

According to this document,
http://www.corel.com/content/pdf/wpx4/corel-wordperfect-office-X4-reviewers-guide.pdf
.wp extensions can be used for both WP5.x and WP6.x, so we can't rely on the extension as an indicator of anything. Since the major version should definitely be 2, I agree we should check that value and throw an exception when it is anything else.

In my earlier research I could not find enough evidence of older version signatures, other than 0xD0CF11E0A1B11AE1 for some, which conflicts with MS Word (maybe that is why it can open those).

I wonder if we should then remove these mime types, or have separate entries for them, in tika-mimetypes.xml?

{code:xml}
<alias type="application/wordperfect"/>
<alias type="application/wordperfect5.1"/>
{code}

That's assuming application/wordperfect is used for older versions as well.

> Add mime detection and parser for WordPerfect
> ---------------------------------------------
>
>                 Key: TIKA-1946
>                 URL: https://issues.apache.org/jira/browse/TIKA-1946
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>            Reporter: Nick C
>             Fix For: 2.0, 1.15
>
>
> I noticed some code on github for parsing WordPerfect files
> (https://github.com/Norconex/importer) Also looks like the author
> [~pascal.essiembre] has contributed to Tika before

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
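
For illustration, the major-version check proposed in the comment could be sketched roughly as follows. This is not the parser's actual code. The class and method names are hypothetical, the header offsets (magic bytes 0xFF 'W' 'P' 'C' at offset 0, major version at byte 10) are assumptions based on commonly described WordPerfect header layouts, and a plain IOException stands in for whatever exception type Tika would really throw. Only the expected value 2 comes from the comment itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class WpVersionCheck {
    // Assumed magic bytes at the start of a WordPerfect file.
    static final byte[] MAGIC = {(byte) 0xFF, 'W', 'P', 'C'};
    // Assumed offset of the major version byte in the file header.
    static final int MAJOR_VERSION_OFFSET = 10;
    // Value the comment says the parser should require.
    static final int SUPPORTED_MAJOR_VERSION = 2;

    // Reads the header prefix and returns the major version as an unsigned value.
    static int readMajorVersion(InputStream in) throws IOException {
        byte[] header = new byte[MAJOR_VERSION_OFFSET + 1];
        // readFully throws EOFException if the stream is shorter than the header.
        new DataInputStream(in).readFully(header);
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                throw new IOException("Not a WordPerfect file (magic mismatch)");
            }
        }
        return header[MAJOR_VERSION_OFFSET] & 0xFF;
    }

    // Throws when the major version is anything other than the supported one.
    static void requireSupportedVersion(InputStream in) throws IOException {
        int major = readMajorVersion(in);
        if (major != SUPPORTED_MAJOR_VERSION) {
            throw new IOException("Unsupported WordPerfect major version: " + major);
        }
    }
}
```

Checking the header rather than the file name matches the point made above that a .wp extension says nothing reliable about the version.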