[ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475319#comment-17475319
 ] 

Tim Allison commented on TIKA-3647:
-----------------------------------

Are you missing all metadata or just some fields?  We made breaking changes in 
2.x to streamline the metadata keys.  If you are only missing some fields, see 
the Metadata section of the migration guide: 
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

 

In Tika 1.28 with tika-app:

 
{noformat}
java -jar ~/tools/tika/tika-app-1.28.jar ~/Downloads/test.hwp 
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2004-07-28T21:00:52Z"/>
<meta name="dc:creator" content=""/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="dcterms:modified" content="2004-07-28T21:00:52Z"/>
<meta name="Last-Modified" content="2004-07-28T21:00:52Z"/>
<meta name="Last-Save-Date" content="2004-07-28T21:00:52Z"/>
<meta name="meta:save-date" content="2004-07-28T21:00:52Z"/>
<meta name="dc:title" content=""/>
<meta name="modified" content="2004-07-28T21:00:52Z"/>
<meta name="cp:subject" content=""/>
<meta name="Content-Length" content="16896"/>
<meta name="Content-Type" content="application/x-hwp-v5"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="creator" content=""/>
<meta name="meta:author" content=""/>
<meta name="meta:creation-date" content="2004-07-28T20:54:47Z"/>
<meta name="Comments" content=""/>
<meta name="meta:last-author" content="Administrator"/>
<meta name="Creation-Date" content="2004-07-28T20:54:47Z"/>
<meta name="resourceName" content="test.hwp"/>
<meta name="w:comments" content=""/>
<meta name="Last-Author" content="Administrator"/>
<meta name="meta:keyword" content=""/>
<meta name="Author" content=""/>
<meta name="comment" content=""/>
<title/>
 {noformat}
In 2.2.1, I get this:
{noformat}

java -jar ~/tools/tika/tika-app-2.2.1.jar ~/Downloads/test.hwp 
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="w:Comments" content=""/>
<meta name="dc:subject" content=""/>
<meta name="meta:last-author" content="Administrator"/>
<meta name="dc:creator" content=""/>
<meta name="resourceName" content="test.hwp"/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="dcterms:modified" content="2004-07-28T21:00:52Z"/>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="dc:title" content=""/>
<meta name="meta:keyword" content=""/>
<meta name="cp:subject" content=""/>
<meta name="Content-Length" content="16896"/>
<meta name="Content-Type" content="application/x-hwp-v5"/>
<title/>
... {noformat}
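
To see exactly which keys differ between the two runs above, one option is to diff the meta names in the two outputs. This is only a minimal sketch, assuming the tika-app XHTML output of each version has been captured as a string. `meta_keys` and `diff_keys` are hypothetical helper names, not part of Tika, and the two excerpts are shortened from the output shown above.

```python
import re

# Matches the <meta name="..." content="..."/> lines that tika-app prints.
META_RE = re.compile(r'<meta name="([^"]+)" content="([^"]*)"/>')

# Shortened excerpts of the 1.28 and 2.2.1 output shown above.
OLD_EXCERPT = """
<meta name="dc:creator" content=""/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="Last-Modified" content="2004-07-28T21:00:52Z"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="Author" content=""/>
"""

NEW_EXCERPT = """
<meta name="dc:creator" content=""/>
<meta name="dcterms:created" content="2004-07-28T20:54:47Z"/>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.hwp.HwpV5Parser"/>
<meta name="dc:subject" content=""/>
"""


def meta_keys(xhtml):
    """Return the set of metadata key names found in the XHTML text."""
    return {name for name, _content in META_RE.findall(xhtml)}


def diff_keys(old_xhtml, new_xhtml):
    """Return (keys only in old, keys only in new), each as a sorted list."""
    old_keys = meta_keys(old_xhtml)
    new_keys = meta_keys(new_xhtml)
    return sorted(old_keys - new_keys), sorted(new_keys - old_keys)


if __name__ == "__main__":
    only_old, only_new = diff_keys(OLD_EXCERPT, NEW_EXCERPT)
    print("only in 1.28 excerpt: ", only_old)
    print("only in 2.2.1 excerpt:", only_new)
```

Running the same comparison on the full output of both versions should show whether a key is truly gone or only renamed (for example X-Parsed-By versus X-TIKA:Parsed-By above).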

> Failed to get content and metadata for .hwp files
> -------------------------------------------------
>
>                 Key: TIKA-3647
>                 URL: https://issues.apache.org/jira/browse/TIKA-3647
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tika User
>            Priority: Blocker
>         Attachments: test.hwp
>
>
> When trying to parse a .hwp file, no metadata is returned. This worked fine 
> in Tika 1.27. We are currently using version 2.2.1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
