[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734081#comment-14734081 ]
mungeol heo commented on TIKA-1728:
-----------------------------------

I have tried r1701201, and it works properly.

As far as I know, there are only v3 and v5; there is no v4 or any other version. Therefore, I think x-hwp-v3 (the old one, currently x-hwp) and x-hwp-v5 (the new one) are better names.

FYI, I also tried java-hwp. It can detect both versions of HWP files and extract text from them. It works great.

> Detection is not working properly for detecting HWP 5.0 file
> ------------------------------------------------------------
>
> Key: TIKA-1728
> URL: https://issues.apache.org/jira/browse/TIKA-1728
> Project: Tika
> Issue Type: Bug
> Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
> Reporter: mungeol heo
> Attachments: HWP-document-file-formats-3.0-Korean.pdf,
> HWP-document-file-formats-5.0-Korean.pdf, error-message.png, test_3.0.hwp,
> test_5.0.hwp
>
>
> The HWP file format has two versions: HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly, but not HWP 5.0 files.
> The commands used and the results returned are shown below.
>
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
>
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
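The two generations can be told apart by their leading bytes. This explains why the HWP 5.0 sample comes back as application/x-tika-msoffice: HWP 5.0 is stored in an OLE2 (Compound File Binary) container, the same container family as legacy MS Office files. HWP 3.0, by contrast, begins with a plain ASCII signature. The sketch below is a hypothetical standalone sniffer (`HwpSniffer` is an illustrative name, not Tika's detector or its actual fix). It assumes, based on the attached format specifications, that HWP 3.0 files start with "HWP Document File" at offset 0. It uses the x-hwp-v3 name proposed in the comment. An OLE2 header alone does not prove a file is HWP 5.0. Confirming that would require opening the container and looking for HWP-specific streams such as FileHeader, which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical magic-byte sniffer for the two HWP generations (illustration only, not Tika's code). */
public class HwpSniffer {
    // Assumed HWP 3.0 signature at offset 0 (per the HWP 3.0 format document).
    static final byte[] HWP3_SIGNATURE = "HWP Document File".getBytes(StandardCharsets.US_ASCII);

    // Every OLE2 / Compound File Binary container (HWP 5.0, legacy .doc/.xls, ...) starts with these 8 bytes.
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Classifies a file from its first bytes; the MIME names are assumptions. */
    static String sniff(byte[] head) {
        if (startsWith(head, HWP3_SIGNATURE)) {
            return "application/x-hwp-v3";
        }
        if (startsWith(head, OLE2_SIGNATURE)) {
            // Generic OLE2 container: telling HWP 5.0 apart needs inspection of the
            // container's streams, which is out of scope for a prefix check.
            return "application/x-tika-msoffice";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] buf = new byte[32];
            int n = 0;
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                int r;
                // Read up to 32 bytes; read() may return fewer than requested.
                while (n < buf.length && (r = in.read(buf, n, buf.length - n)) != -1) {
                    n += r;
                }
            }
            System.out.println(arg + "\t" + sniff(Arrays.copyOf(buf, n)));
        }
    }
}
```

Run as `java HwpSniffer test_3.0.hwp test_5.0.hwp` to print one tab-separated result per file. A prefix check is enough to separate v3 from "some OLE2 file". Reporting x-hwp-v5 reliably depends on container-level inspection.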