[ 
https://issues.apache.org/jira/browse/TIKA-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528292#comment-17528292
 ] 

Tim Allison commented on TIKA-3738:
-----------------------------------

This issue appears to go all the way back to the original ForkParser. To 
confirm, this isn't a new issue, right?

The metadata that comes through ForkParser is what is written into the XHTML.

For example, if we use a regular parser we get a lot more metadata in the 
Metadata object than we do in the XHTML.

XHTML head:
{noformat}
<head>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.mp4.MP4Parser" />
<meta name="dc:title" content="Test Title" />
<meta name="Content-Type" content="audio/mp4" />
<title>Test Title</title>
</head>
{noformat}

Metadata:
{noformat}
Content-Type : audio/mp4
X-TIKA:EXCEPTION:warn : End of data reached.
X-TIKA:Parsed-By : org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By : org.apache.tika.parser.mp4.MP4Parser
X-TIKA:Parsed-By-Full-Set : org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By-Full-Set : org.apache.tika.parser.mp4.MP4Parser
dc:creator : Test Artist
dc:title : Test Title
dcterms:created : 2012-01-28T18:39:18Z
dcterms:modified : 2012-01-28T18:40:25Z
xmp:CreatorTool : iTunes 10.5.3.3
xmpDM:album : Test Album
xmpDM:albumArtist : Test Album Artist
xmpDM:artist : Test Artist
xmpDM:audioChannelType : Stereo
xmpDM:audioCompressor : M4A
xmpDM:audioSampleRate : 44100
xmpDM:compilation : 0
xmpDM:composer : Test Composer
xmpDM:discNumber : 6
xmpDM:duration : 0.07
xmpDM:genre : Test Genre
xmpDM:logComment : Test Comments
xmpDM:releaseDate : 2008
xmpDM:trackNumber : 1
{noformat}
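
The gap between the two listings above can be checked mechanically. Here is a 
minimal sketch in plain Java that lists the metadata keys with no matching 
<meta> element in the XHTML head. It makes no Tika calls: the names 
HeadMetaDiff, headMetaNames and missingFromHead are hypothetical, and the 
inputs are a subset of the values shown above, pasted in as literals.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: compares metadata keys against
// the <meta name="..."> entries found in an XHTML <head> fragment.
class HeadMetaDiff {

    // Collect the name attribute of every <meta> element in the fragment.
    static Set<String> headMetaNames(String xhtmlHead) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtmlHead)));
        NodeList metas = doc.getElementsByTagName("meta");
        Set<String> names = new TreeSet<>();
        for (int i = 0; i < metas.getLength(); i++) {
            names.add(((Element) metas.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Return the metadata keys that have no <meta> element in the head.
    static Set<String> missingFromHead(Set<String> metadataKeys,
                                       String xhtmlHead) throws Exception {
        Set<String> missing = new TreeSet<>(metadataKeys);
        missing.removeAll(headMetaNames(xhtmlHead));
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // XHTML head copied from the listing above.
        String head = "<head>"
                + "<meta name=\"X-TIKA:Parsed-By\" content=\"org.apache.tika.parser.DefaultParser\" />"
                + "<meta name=\"X-TIKA:Parsed-By\" content=\"org.apache.tika.parser.mp4.MP4Parser\" />"
                + "<meta name=\"dc:title\" content=\"Test Title\" />"
                + "<meta name=\"Content-Type\" content=\"audio/mp4\" />"
                + "<title>Test Title</title>"
                + "</head>";

        // A subset of the keys from the Metadata listing above.
        Set<String> keys = new TreeSet<>(Arrays.asList(
                "Content-Type", "X-TIKA:Parsed-By", "dc:title",
                "dc:creator", "xmpDM:album", "xmpDM:duration"));

        // Prints the keys present in the Metadata object but absent
        // from the XHTML head.
        System.out.println(missingFromHead(keys, head));
    }
}
```

In a real test the key set would presumably come from Metadata.names() and 
the head from the handler output, but that wiring is left out here.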

I haven't looked at this part of the codebase in a while, and I'm frankly 
trying to figure out how any metadata comes back.

Will update when I figure that out. :D

> ForkParser missing metadata for some document formats
> -----------------------------------------------------
>
>                 Key: TIKA-3738
>                 URL: https://issues.apache.org/jira/browse/TIKA-3738
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3.0
>         Environment: Java 11.0.14.
>            Reporter: Stephen H
>            Priority: Major
>         Attachments: ForkParserIntegrationTest.java.diff, 
> testVideoMetadataMp4.mp4
>
>
> When using ForkParser, metadata from some parsers is not being returned in 
> the Metadata object or in the head of the returned XML. These include 
> OpenDocument Presentation (ODP), OpenDocument Spreadsheet (ODS), Microsoft 
> Word 2006 XML, MP4 Audio (M4A) and MP4 Video (MP4).
> Patch for ForkParserIntegrationTest showing the issue for these file types is 
> attached, along with an MP4 video file containing metadata as there doesn't 
> appear to be one currently in the test set.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
