Hi André,

Yes, please file an issue in JIRA and point to the mp3 file and the test case that failed. Thanks so much!

Cheers,
Chris

On 8/5/10 8:52 AM, "André Ricardo" <[email protected]> wrote:

Hello,

I was trying some mp3s in Tika coming from the Nutch 0.9/1.0 samples, and with "A corrupt MP3 file that has been truncated half way through the ID3v2 frames" it returned this:

$ java -jar tika-app-0.7.jar -v -m ~/nutch-0.9/src/plugin/parse-mp3/sample/test.mp3
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.mp3par...@1bf3d87
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:138)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:169)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: java.io.IOException: Tried to read 259186 bytes, but only 65526 bytes present
        at org.apache.tika.parser.mp3.ID3v2Frame.readFully(ID3v2Frame.java:160)
        at org.apache.tika.parser.mp3.ID3v2Frame.<init>(ID3v2Frame.java:110)
        at org.apache.tika.parser.mp3.ID3v2Frame.createFrameIfPresent(ID3v2Frame.java:81)
        at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:128)
        at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:64)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
        ... 3 more

I also tried the latest trunk from GitHub, which reproduces the problem:

$ java -jar tika-app-0.8-SNAPSHOT.jar -v -m ~/nutch-0.9/src/plugin/parse-mp3/sample/test.mp3
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.mp3par...@e79839
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:169)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:110)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:193)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:72)
Caused by: java.io.IOException: Tried to read 259186 bytes, but only 65526 bytes present
        at org.apache.tika.parser.mp3.ID3v2Frame.readFully(ID3v2Frame.java:160)
        at org.apache.tika.parser.mp3.ID3v2Frame.<init>(ID3v2Frame.java:110)
        at org.apache.tika.parser.mp3.ID3v2Frame.createFrameIfPresent(ID3v2Frame.java:81)
        at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:133)
        at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:64)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
        ... 3 more

The mp3 is here:
http://github.com/apache/nutch/raw/tags/release-1.0/src/plugin/parse-mp3/sample/test.mp3

All the other mp3 samples were parsed fine by Tika.

Should I open an issue in JIRA? And if so, would you consider this a bug or an improvement?

André Ricardo

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
