[
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475319#comment-17475319
]
Tim Allison commented on TIKA-3647:
-----------------------------------
Are you missing all metadata or just some fields? We made breaking changes in
2.x to streamline the metadata keys. If you are missing only some fields, see
the Metadata section of the migration guide:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0
In Tika 1.28 with tika-app, I get this:
{noformat}
java -jar ~/tools/tika/tika-app-1.28.jar ~/Downloads/test.hwp
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2004-07-28T21:00:52Z"/>
<meta name="dc:creator" content=""/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="dcterms:modified" content="2004-07-28T21:00:52Z"/>
<meta name="Last-Modified" content="2004-07-28T21:00:52Z"/>
<meta name="Last-Save-Date" content="2004-07-28T21:00:52Z"/>
<meta name="meta:save-date" content="2004-07-28T21:00:52Z"/>
<meta name="dc:title" content=""/>
<meta name="modified" content="2004-07-28T21:00:52Z"/>
<meta name="cp:subject" content=""/>
<meta name="Content-Length" content="16896"/>
<meta name="Content-Type" content="application/x-hwp-v5"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="creator" content=""/>
<meta name="meta:author" content=""/>
<meta name="meta:creation-date" content="2004-07-28T20:54:47Z"/>
<meta name="Comments" content=""/>
<meta name="meta:last-author" content="Administrator"/>
<meta name="Creation-Date" content="2004-07-28T20:54:47Z"/>
<meta name="resourceName" content="test.hwp"/>
<meta name="w:comments" content=""/>
<meta name="Last-Author" content="Administrator"/>
<meta name="meta:keyword" content=""/>
<meta name="Author" content=""/>
<meta name="comment" content=""/>
<title/>
{noformat}
In 2.2.1, I get this:
{noformat}
java -jar ~/tools/tika/tika-app-2.2.1.jar ~/Downloads/test.hwp
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="w:Comments" content=""/>
<meta name="dc:subject" content=""/>
<meta name="meta:last-author" content="Administrator"/>
<meta name="dc:creator" content=""/>
<meta name="resourceName" content="test.hwp"/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="dcterms:modified" content="2004-07-28T21:00:52Z"/>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="dc:title" content=""/>
<meta name="meta:keyword" content=""/>
<meta name="cp:subject" content=""/>
<meta name="Content-Length" content="16896"/>
<meta name="Content-Type" content="application/x-hwp-v5"/>
<title/>
... {noformat}
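If existing code still looks up the 1.x key names, a small lookup table may help while migrating. The sketch below is only an illustration: the class name LegacyKeyLookup and the method toTika2Key are hypothetical, it does not use any Tika API, and the pairs were inferred solely by matching non-empty values between the two outputs above. The migration guide remains the authoritative list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika. The pairs below are an assumption
// inferred from the two tika-app outputs in this comment (same value under
// a different key name); they are not copied from the migration guide.
class LegacyKeyLookup {

    private static final Map<String, String> LEGACY_TO_2X = new LinkedHashMap<>();

    static {
        // Parser chain moved under the X-TIKA: prefix.
        LEGACY_TO_2X.put("X-Parsed-By", "X-TIKA:Parsed-By");
        // Duplicate creation-date keys; only dcterms:created remains in 2.2.1.
        LEGACY_TO_2X.put("Creation-Date", "dcterms:created");
        LEGACY_TO_2X.put("meta:creation-date", "dcterms:created");
        // Duplicate modification-date keys; only dcterms:modified remains.
        LEGACY_TO_2X.put("Last-Modified", "dcterms:modified");
        LEGACY_TO_2X.put("Last-Save-Date", "dcterms:modified");
        LEGACY_TO_2X.put("meta:save-date", "dcterms:modified");
        LEGACY_TO_2X.put("modified", "dcterms:modified");
        LEGACY_TO_2X.put("date", "dcterms:modified");
        // Last author is kept only under its meta: name.
        LEGACY_TO_2X.put("Last-Author", "meta:last-author");
    }

    // Returns the inferred 2.x key, or the input unchanged if no pair is known.
    public static String toTika2Key(String key) {
        return LEGACY_TO_2X.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Creation-Date", "Last-Author", "Content-Type"}) {
            System.out.println(key + " -> " + toTika2Key(key));
        }
    }
}
```

Keys that appear unchanged in both outputs, such as Content-Type and Content-Length, pass through as they are. Keys whose values are empty in this file (Author, creator, Comments and so on) are left out on purpose, because an empty value does not show which 2.x key replaced them.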
> Failed to get content and metadata for .hwp files
> -------------------------------------------------
>
> Key: TIKA-3647
> URL: https://issues.apache.org/jira/browse/TIKA-3647
> Project: Tika
> Issue Type: Bug
> Reporter: Tika User
> Priority: Blocker
> Attachments: test.hwp
>
>
> When trying to parse a .hwp file, no metadata is returned. This worked fine
> in Tika 1.27. We are currently using version 2.2.1.