Sameer Apte created TIKA-3128:
---------------------------------
Summary: MOV file produces RuntimeException with 1.24.1, used to
work with earlier version
Key: TIKA-3128
URL: https://issues.apache.org/jira/browse/TIKA-3128
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.24.1
Reporter: Sameer Apte
Attachments: HDSIT_157516.mov
The attached _mov_ file produces a _RuntimeException_ when parsed with *tika v1.24.1*.
The same _mov_ file parses without any issues with *tika v1.19.1*.
*Tika 1.19.1 standalone app _SUCCESSFUL_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Content-Length: 51066400
Content-Type: application/mp4
Creation-Date: 2015-05-18T16:23:25Z
Last-Modified: 2015-05-18T16:31:09Z
Last-Save-Date: 2015-05-18T16:31:09Z
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
date: 2015-05-18T16:31:09Z
dcterms:created: 2015-05-18T16:23:25Z
dcterms:modified: 2015-05-18T16:31:09Z
meta:creation-date: 2015-05-18T16:23:25Z
meta:save-date: 2015-05-18T16:31:09Z
modified: 2015-05-18T16:31:09Z
resourceName: HDSIT_157516.mov
tiff:ImageLength: 1080
tiff:ImageWidth: 1920
xmpDM:audioSampleRate: 30000
xmpDM:duration: 125.99
{code}
*Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 5 more
{code}
Commit _8e2eb05292bc35503a3d82a908c426854e23ac83_ in v1.24.1, which switched the
mp4 parser from _googlecode_ to _tallison_, appears to be directly responsible
for the change in behavior.
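For context on the exception message: in the ISO base media file format, a box header starts with a 32-bit big-endian size followed by a four-character type. A size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. The sketch below is not the mp4parser or Tika code; it is a minimal, self-contained illustration of how a reader could handle the size-0 case instead of throwing. The class name `BoxHeaderSketch` and the method `listBoxes` are hypothetical, and it only walks top-level boxes, whereas the trace above hits the zero size inside a nested _VisualSampleEntry_.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of mp4parser or Tika.
public class BoxHeaderSketch {

    /** Returns "type:start-end" (byte offsets) for each top-level box in data. */
    public static List<String> listBoxes(byte[] data) {
        List<String> boxes = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        while (buf.remaining() >= 8) {
            int start = buf.position();
            long size = buf.getInt() & 0xFFFFFFFFL; // treat the size as unsigned
            byte[] typeBytes = new byte[4];
            buf.get(typeBytes);
            String type = new String(typeBytes, StandardCharsets.ISO_8859_1);

            long end;
            if (size == 0) {
                // Size 0: the box extends to the end of the data.
                end = data.length;
            } else if (size == 1) {
                // Size 1: a 64-bit "largesize" follows the type field.
                if (buf.remaining() < 8) {
                    break;
                }
                end = start + buf.getLong();
            } else {
                end = start + size;
            }

            // Stop on a malformed box (ends before its own header or past the data).
            if (end < buf.position() || end > data.length) {
                break;
            }
            boxes.add(type + ":" + start + "-" + end);
            buf.position((int) end);
        }
        return boxes;
    }

    public static void main(String[] args) {
        // A 12-byte "free" box followed by an "mdat" box with size 0 and 5 payload bytes.
        byte[] data = ByteBuffer.allocate(25)
                .putInt(12).put("free".getBytes(StandardCharsets.ISO_8859_1)).put(new byte[4])
                .putInt(0).put("mdat".getBytes(StandardCharsets.ISO_8859_1)).put(new byte[5])
                .array();
        System.out.println(listBoxes(data));
    }
}
```

Whether treating the zero size this way is the right fix for this particular file is an open question; the zero may also be padding or a terminator inside the sample description rather than a genuine "to end of file" box.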