[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728863#comment-14728863 ]
Nick Burch commented on TIKA-1728:
----------------------------------
The issue is that the v3 files (and earlier?) are in their own wrapper, while
the v5 (and later?) ones are stored within an OLE2 structure.
As of r1700986, the v3 files continue to be detected as {{application/x-hwp}},
while the v5 ones are now detected as {{application/x-hwp-v5}}
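To illustrate the difference between the two wrappers, here is a minimal sketch that only looks at the leading bytes of a file. It is not Tika's actual detector code. The OLE2 header signature is the standard one; the v3 text prefix {{HWP Document File V3}} is an assumption about how v3 files begin, and the class name {{HwpSniffer}} and the label {{ole2-container}} are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HwpSniffer {
    // Standard OLE2 (Compound File Binary) header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ASSUMPTION: v3 files start with this ASCII text in their own wrapper.
    private static final byte[] HWP_V3_PREFIX =
        "HWP Document File V3".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix bytes.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a file from its first bytes only.
    // "ole2-container" is a hypothetical label, not a real mimetype.
    static String sniff(byte[] header) {
        if (startsWith(header, HWP_V3_PREFIX)) {
            return "application/x-hwp";
        }
        if (startsWith(header, OLE2_MAGIC)) {
            return "ole2-container";
        }
        return "application/octet-stream";
    }
}
```

A check on the leading bytes alone cannot tell a v5 file apart from any other OLE2 document, which is consistent with 1.10 reporting {{application/x-tika-msoffice}} for it. Telling them apart means opening the container and inspecting its entries, which this sketch does not attempt.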
It'd be helpful if someone could confirm what the very latest version of the
file format is, so we can decide whether the v5 in the mimetype is a suitable
name, or whether we should make it more general.
> Detection is not working properly for detecting HWP 5.0 file
> ------------------------------------------------------------
>
> Key: TIKA-1728
> URL: https://issues.apache.org/jira/browse/TIKA-1728
> Project: Tika
> Issue Type: Bug
> Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
> Reporter: mungeol heo
> Attachments: HWP-document-file-formats-3.0-Korean.pdf,
> HWP-document-file-formats-5.0-Korean.pdf, test_3.0.hwp, test_5.0.hwp
>
>
> HWP files come in two formats: HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly,
> but not HWP 5.0 files.
> The commands used and the results returned are listed below.
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice