Sameer Apte created TIKA-3128:
---------------------------------

             Summary: MOV file produces RuntimeException with 1.24.1, used to work with earlier version
                 Key: TIKA-3128
                 URL: https://issues.apache.org/jira/browse/TIKA-3128
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.24.1
            Reporter: Sameer Apte
         Attachments: HDSIT_157516.mov

The attached _mov_ file produces a _RuntimeException_ when parsed with *Tika v1.24.1*.

The same _mov_ file parses without any issues with *Tika v1.19.1*.

*Tika 1.19.1 standalone app _SUCCESSFUL_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Content-Length: 51066400
Content-Type: application/mp4
Creation-Date: 2015-05-18T16:23:25Z
Last-Modified: 2015-05-18T16:31:09Z
Last-Save-Date: 2015-05-18T16:31:09Z
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
date: 2015-05-18T16:31:09Z
dcterms:created: 2015-05-18T16:23:25Z
dcterms:modified: 2015-05-18T16:31:09Z
meta:creation-date: 2015-05-18T16:23:25Z
meta:save-date: 2015-05-18T16:31:09Z
modified: 2015-05-18T16:31:09Z
resourceName: HDSIT_157516.mov
tiff:ImageLength: 1080
tiff:ImageWidth: 1920
xmpDM:audioSampleRate: 30000
xmpDM:duration: 125.99
{code}

*Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
{code:java}
[sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
	at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
	at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
	at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
	at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
	at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 5 more
{code}

Commit _8e2eb05292bc35503a3d82a908c426854e23ac83_ in v1.24.1, which switched the mp4 parser from _googlecode_ to _tallison_, appears to be directly responsible for the change in behavior.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)