[ https://issues.apache.org/jira/browse/TIKA-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158632#comment-16158632 ]
Nick Burch commented on TIKA-2461:
----------------------------------

Assuming you have the Tika App jar to hand, you can just run it with {{java -classpath tika-app-1.16.jar org.apache.poi.poifs.dev.POIFSLister FullFile.wpd}}

> Wordperfect file identified as Quattro Pro document
> ---------------------------------------------------
>
>                 Key: TIKA-2461
>                 URL: https://issues.apache.org/jira/browse/TIKA-2461
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.16
>         Environment: Linux Mint 17
>            Reporter: Johan van der Knijff
>            Priority: Minor
>
> While running Tika 1.16 in detect mode over some legacy files from our
> repository system, I came across one file with a .wpd extension for which
> Tika reported the following mimetype:
>
> {code}
> application/x-quattro-pro; version=7-8
> {code}
> Opening the file in LibreOffice reveals this is actually a WordPerfect
> document (not sure about which version; the .WPD extension suggests WP 6 or
> later). I had a look at the Quattro Pro entry in tika-mimetypes.xml:
> {code}
>   <mime-type type="application/x-quattro-pro">
>     <_comment>
>       Quattro Pro - Corel Spreadsheet (part of WordPerfect Office suite)
>     </_comment>
>     <!-- qp2 and wb3 are currently detected by POIFSContainerDetector
>          TODO: add detection for wb2 and wb1 -->
>     <glob pattern="*.qpw"/>
>     <glob pattern="*.wb1"/>
>     <glob pattern="*.wb2"/>
>     <glob pattern="*.wb3"/>
>   </mime-type>
> {code}
> This suggests that the problem originates from POIFSContainerDetector.
> For legal reasons I cannot share the original file. However I was able to
> create a derived file by truncating the original file after 18 kB, and this
> derived file shows the same behaviour. The file is available at this link:
> [tika-identified-as-quattro-pro-truncated.wpd|https://github.com/bitsgalore/shared/raw/master/tika-identified-as-quattro-pro-truncated.wpd]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
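
POIFSLister lists the entries of an OLE2 compound file, and POIFSContainerDetector likewise only looks inside OLE2 containers, so before running the command on the full file it may be worth confirming that the file really starts with the OLE2 signature. The following is a minimal sketch of that check using only the JDK (Java 11 or later for {{readNBytes}}). The class name {{Ole2Check}} and the method {{hasOle2Signature}} are hypothetical names made up for this illustration and are not part of Tika or POI; that the sample file is an OLE2 container is an assumption, not something confirmed in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or POI.
class Ole2Check {
    // First 8 bytes of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads only the first 8 bytes of the file and checks them.
    static boolean hasOle2Signature(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasOle2Signature(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2Check <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + ": OLE2 signature "
            + (hasOle2Signature(file) ? "present" : "absent"));
    }
}
```

If the signature is present, the POIFSLister output for the full file should show which container entries are there, which is presumably what the detector is keying on.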