[
https://issues.apache.org/jira/browse/TIKA-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364890#comment-17364890
]
Tim Allison edited comment on TIKA-3449 at 6/17/21, 4:30 PM:
-------------------------------------------------------------
Diffs in dates (created and modified): the legacy parser prefers the "track
create date" and the "track modify date" to the "media create date" and the
"media modified date".
When the audio sample rate differs, the newer parser agrees with exiftool, and
I can't find the older parser's value anywhere in exiftool's output. In
short, I think the new parser fixes a bug in the legacy parser.
The legacy parser included length and width metadata for audio, with a value
of zero.
I added a number of fields in the newer parser including subject, description,
copyright and a few others. So there's more metadata than in the legacy
parser. Still on the todo list is to parse the embedded album cover image
file. :D
The only other difference I'm now seeing is that if the Apple user data box is
truncated, the new parser extracts no information, whereas the legacy
parser tried to extract as much as it could. If we need to, we can write our
own box iterator to try to scrape as much as we can, but I don't think most
users will see this often. Please open a ticket if there's a need for this.
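For anyone wondering what a fallback box iterator could look like, here is a
minimal sketch, not anything that exists in Tika or metadata-extractor. The
class name LenientBoxIterator, its Result type and the warning strings are all
hypothetical. It assumes plain top-level boxes with a 32-bit big-endian size
followed by a four-character type, and it does not handle the special sizes
0 (box runs to end of file) or 1 (64-bit size). The idea is simply to keep
every complete box and record a warning at the first truncated one instead of
throwing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lenient walker over top-level MP4 boxes (illustration only). */
public class LenientBoxIterator {

    /** Types of the complete boxes seen, plus a warning if the scan stopped early. */
    public static final class Result {
        public final List<String> boxTypes = new ArrayList<>();
        public String warning; // stays null when the data ends on a box boundary
    }

    /** Walks boxes in order and stops quietly at the first one that does not fit. */
    public static Result scan(byte[] data) {
        Result result = new Result();
        int pos = 0;
        while (pos < data.length) {
            int remaining = data.length - pos;
            if (remaining < 8) {
                // Not even a full size + type header is left.
                result.warning = "truncated box header at offset " + pos;
                break;
            }
            // Box header: 4-byte big-endian size, then 4-byte type.
            long size = ((data[pos] & 0xFFL) << 24)
                    | ((data[pos + 1] & 0xFFL) << 16)
                    | ((data[pos + 2] & 0xFFL) << 8)
                    | (data[pos + 3] & 0xFFL);
            String type = new String(data, pos + 4, 4, StandardCharsets.ISO_8859_1);
            if (size < 8) {
                // Sizes 0 and 1 have special meanings that this sketch skips.
                result.warning = "unsupported size " + size + " for box '" + type
                        + "' at offset " + pos;
                break;
            }
            if (size > remaining) {
                // The box claims more bytes than the file has: keep what we have so far.
                result.warning = "box '" + type + "' at offset " + pos + " declares "
                        + size + " bytes but only " + remaining + " remain";
                break;
            }
            result.boxTypes.add(type);
            pos += (int) size; // safe: size <= remaining, which fits in an int
        }
        return result;
    }

    public static void main(String[] args) {
        // A complete 12-byte 'ftyp' box followed by a 'moov' box cut off early.
        byte[] sample = {
            0, 0, 0, 12, 'f', 't', 'y', 'p', 1, 2, 3, 4,
            0, 0, 0, 16, 'm', 'o', 'o', 'v', 9, 9, 9
        };
        Result r = scan(sample);
        System.out.println("boxes: " + r.boxTypes);
        System.out.println("warning: " + r.warning);
    }
}
```

On the sample in main, the scan keeps the complete ftyp box and reports the
cut-off moov box as a warning rather than failing. A real version would also
need to descend into container boxes to reach the user data.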
There were a few cases where the new parser was able to extract information
that the legacy parser wasn't... I think it depends on where the file was
truncated.
Most of the mp4s in our corpora are truncated. In the new parser, I added a
parse-warn key in the metadata, and the truncation info is now stored there.
We're no longer throwing EOF if the mp4 is truncated, which is great because
mp4s can stop short and this was an ongoing annoyance with the legacy parser. :D
was (Author: [email protected]):
xmpDM:audioSampleRate differ sometimes. The newer parser and ExifTool agree on
the handful that I've manually checked.
> Remove sannies mp4 isoparser from Tika 2.x
> ------------------------------------------
>
> Key: TIKA-3449
> URL: https://issues.apache.org/jira/browse/TIKA-3449
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> If we can prove equality or improvement in Drew Noakes' metadata-extractor's
> MP4Parser over the no longer supported sannies' Mp4Parser, we should remove
> sannies in 2.x.