The mp3 regression is bad. In hindsight, the tika-eval reports were fairly clear on this, but I did some self-hand-waving to excuse away the numbers... I shouldn’t have.
I want to add some new reports to tika-eval so that this never happens again.

How long should we wait for 1.19.1 or 1.20?

Best,

Tim

On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA) <j...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621008#comment-16621008 ]
>
> Hudson commented on TIKA-2730:
> ------------------------------
>
> SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [https://builds.apache.org/job/tika-branch-1x/94/])
> TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF (tallison: [https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a])
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
> * (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
>
>
> > parseToString fails for a simple mp3
> > ------------------------------------
> >
> > Key: TIKA-2730
> > URL: https://issues.apache.org/jira/browse/TIKA-2730
> > Project: Tika
> > Issue Type: Bug
> > Affects Versions: 1.19
> > Reporter: Boris Petrov
> > Assignee: Tim Allison
> > Priority: Major
> > Fix For: 2.0.0, 1.20
> >
> > Attachments: demo.mp3
> >
> >
> > This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> > {noformat}
> > org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> >     at org.apache.tika.Tika.parseToString(Tika.java:527)
> >     at com.company.TextExtractor.getText(TextExtractor.java:39)
> > Caused by: java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> >     at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> >     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> >     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     ... 5 more
> > {noformat}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>