Sameer Apte created TIKA-3128:
---------------------------------

             Summary: MOV file produces RuntimeException with 1.24.1, used to work with earlier version
                 Key: TIKA-3128
                 URL: https://issues.apache.org/jira/browse/TIKA-3128
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.24.1
            Reporter: Sameer Apte
         Attachments: HDSIT_157516.mov

The attached _mov_ file produces a _RuntimeException_ when parsed with *tika v1.24.1*

The same _mov_ file can be parsed without any issues with *tika v1.19.1*

 *Tika 1.19.1 standalone app _SUCCESSFUL_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Content-Length: 51066400
Content-Type: application/mp4
Creation-Date: 2015-05-18T16:23:25Z
Last-Modified: 2015-05-18T16:31:09Z
Last-Save-Date: 2015-05-18T16:31:09Z
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
date: 2015-05-18T16:31:09Z
dcterms:created: 2015-05-18T16:23:25Z
dcterms:modified: 2015-05-18T16:31:09Z
meta:creation-date: 2015-05-18T16:23:25Z
meta:save-date: 2015-05-18T16:31:09Z
modified: 2015-05-18T16:31:09Z
resourceName: HDSIT_157516.mov
tiff:ImageLength: 1080
tiff:ImageWidth: 1920
xmpDM:audioSampleRate: 30000
xmpDM:duration: 125.99
 {code}
*Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
        at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
        at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
        at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
        at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
        at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 5 more
{code}
Commit _8e2eb05292bc35503a3d82a908c426854e23ac83_, included in v1.24.1, switched the mp4 parser from _googlecode_ to _tallison_ and appears to be directly responsible for the change in behavior.
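
For context on the error message: in the ISO base media file format, each box starts with a 32-bit big-endian size followed by a four-character type. A size of 0 means the box runs to the end of the file, and a size of 1 means a 64-bit size follows the type. The stack trace suggests the parser met a zero size while reading the children of a visual sample entry. The following is a minimal, standalone sketch of that header rule, not the mp4parser implementation; the class _BoxHeaderSketch_ and the method _describe_ are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika or the mp4parser library.
final class BoxHeaderSketch {

    /** Describes how the size field of a single box header would be interpreted. */
    static String describe(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("a box header needs at least 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header);   // ByteBuffer is big-endian by default
        long size = buf.getInt() & 0xFFFFFFFFL;     // read the size as an unsigned 32-bit value
        byte[] typeBytes = new byte[4];
        buf.get(typeBytes);
        String type = new String(typeBytes, StandardCharsets.ISO_8859_1);

        if (size == 0) {
            // The case reported by the exception in this issue.
            return type + ": size 0, box extends to end of file";
        }
        if (size == 1) {
            if (buf.remaining() < 8) {
                throw new IllegalArgumentException("missing 64-bit size after the type");
            }
            return type + ": 64-bit size " + buf.getLong();
        }
        return type + ": size " + size;
    }

    public static void main(String[] args) {
        byte[] zeroSized = {0, 0, 0, 0, 'm', 'd', 'a', 't'};
        byte[] normal = {0, 0, 0, 24, 'f', 't', 'y', 'p'};
        System.out.println(describe(zeroSized)); // mdat: size 0, box extends to end of file
        System.out.println(describe(normal));    // ftyp: size 24
    }
}
```

A parser that wanted to tolerate the zero-size case could treat such a box as covering the remaining bytes of its parent container instead of throwing; whether that is the right fix for this particular file is an open question.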



