In Java 11, the AIFF reader (com.sun.media.sound.AiffFileReader) swallows the EOFException and throws an UnsupportedAudioFileException instead.
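You can see the difference outside Tika with a minimal sketch like the one below. It uses only the JDK's javax.sound.sampled API. AiffProbe, its Outcome enum, and the plain Map standing in for Tika's Metadata are hypothetical and are not Tika code. The map's made-up "warning" key stands in for TikaCoreProperties.TIKA_META_EXCEPTION_WARNING. The sketch writes a tiny valid AIFF with AudioSystem.write, cuts it off inside the COMM chunk, and reports which exception the running JDK surfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical probe, not Tika's AudioParser. Uses only the JDK's javax.sound.sampled.
final class AiffProbe {

    // Hypothetical key; a stand-in for TikaCoreProperties.TIKA_META_EXCEPTION_WARNING.
    static final String WARNING_KEY = "warning";

    enum Outcome { OK, EOF_IN_HEADER, UNSUPPORTED_OR_CORRUPT }

    // Asks the JDK for the file format and records a warning instead of failing.
    // The stream must support mark/reset (ByteArrayInputStream does).
    // Any other IOException propagates to the caller unchanged.
    static Outcome probe(InputStream in, Map<String, List<String>> metadata) throws IOException {
        try {
            AudioSystem.getAudioFileFormat(in);
            return Outcome.OK;
        } catch (EOFException e) {
            // Java 8 path per the trace: the reader's EOF reaches the caller.
            addWarning(metadata, "EOF while reading audio header: " + e);
            return Outcome.EOF_IN_HEADER;
        } catch (UnsupportedAudioFileException e) {
            // Java 11 path per this thread: corrupt and merely unsupported look the same.
            addWarning(metadata, "unsupported or corrupt audio: " + e);
            return Outcome.UNSUPPORTED_OR_CORRUPT;
        }
    }

    private static void addWarning(Map<String, List<String>> metadata, String message) {
        metadata.computeIfAbsent(WARNING_KEY, k -> new ArrayList<>()).add(message);
    }

    // Writes 100 frames of silent 8 kHz, 16-bit, mono, signed big-endian PCM as AIFF.
    static byte[] tinyAiff() throws IOException {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, true);
        byte[] pcm = new byte[100 * format.getFrameSize()];
        AudioInputStream audio = new AudioInputStream(new ByteArrayInputStream(pcm), format, 100);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(audio, AudioFileFormat.Type.AIFF, out);
        return out.toByteArray();
    }

    // Keeps 22 bytes: the 12-byte FORM header, the 8-byte COMM chunk header and
    // 2 bytes of COMM data (assuming the JDK writer emits COMM as the first chunk).
    static byte[] truncatedAiff() throws IOException {
        return Arrays.copyOf(tinyAiff(), 22);
    }
}
```

Try calling AiffProbe.probe(new ByteArrayInputStream(AiffProbe.truncatedAiff()), new HashMap<>()) under Java 8 and then under Java 11. If the JDKs behave as described here, you should get EOF_IN_HEADER on 8 and UNSUPPORTED_OR_CORRUPT on 11. In both cases the problem ends up as a warning rather than a failed parse.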
We have this:

    } catch (UnsupportedAudioFileException e) {
        // There is no way to know whether this exception was
        // caused by the document being corrupted or by the format
        // just being unsupported. So we do nothing.
    }

In main, I've added a warning message to the metadata using the
TikaCoreProperties.TIKA_META_EXCEPTION_WARNING key.

This is not a Tika problem, 1.27 or otherwise. :D

Onwards!

On Thu, Jul 1, 2021 at 11:30 AM Tim Allison <talli...@apache.org> wrote:
>
> >I'll dig into it, but I'm concerned
> but I'm _not_ concerned...famous last words. See above about
> day/week/month/year...
>
> This is caused by a difference in Java versions. This is not a problem at
> the Tika level. With Java 8, there's an EOF[0]. With Java 11,
> there's no EOF.[1] Not sure if this is a feature of Java 11 or worthy
> of a bug report.
>
> [0] openjdk version "1.8.0_292" OpenJDK Runtime Environment
> (AdoptOpenJDK)(build 1.8.0_292-b10)
> [1] openjdk version "11.0.11" 2021-04-20 OpenJDK Runtime Environment
> AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
>
> On Wed, Jun 30, 2021 at 5:01 PM Tim Allison <talli...@apache.org> wrote:
> >
> > Just one of those days, weeks, months, years.... Sorry... and thank you,
> > Ken!
> >
> > AIFF, we're now getting more EOFs than we were. This might be a Java
> > issue, but I don't think there's anything to do at the Tika level. I don't
> > remember any changes in the AudioParser in 1.27. I'll dig into it, but I'm
> > concerned...famous last words...
> >
> > o.a.t.exception.TikaException
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:287)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> >   at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> >   at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> >   at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> >   at o.a.t.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:376)
> >   at o.a.t.parser.DelegatingParser.parse(DelegatingParser.java:72)
> >   at o.a.t.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104)
> >   at o.a.t.parser.pkg.RarParser.parse(RarParser.java:95)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> >   at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> >   at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> >   at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:239)
> >   at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
> >   at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:105)
> >   at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
> >   at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
> >   at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.EOFException
> >   at java.io.DataInputStream.readInt(DataInputStream.java
> >   at com.sun.media.sound.AiffFileReader.getCOMM(AiffFileReader.java:267)
> >   at com.sun.media.sound.AiffFileReader.getAudioFileFormat(AiffFileReader.java:76)
> >   at javax.sound.sampled.AudioSystem.getAudioFileFormat(AudioSystem.java:1004)
> >   at o.a.t.parser.audio.AudioParser.parse(AudioParser.java:73)
> >   at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> >   ... 28 more
> >
> > On Wed, Jun 30, 2021 at 4:37 PM Ken Krugler <kkrugler_li...@transpac.com>
> > wrote:
> >>
> >> Hi Tim,
> >>
> >> Don’t leave us hanging… :)
> >>
> >> — Ken
> >>
> >> On Jun 30, 2021, at 12:47 PM, Tim Allison <talli...@apache.org> wrote:
> >>
> >> There's an apparent change in mime detection: application/msword ->
> >> application/pkcs7-signature, and a few other file formats are now
> >> apparently being detected as pkcs7-signature...
> >>
> >> This is an artifact of tika-eval and not a problem. The issue is that
> >> we used to parse files wrapped in pkcs7 sigs twice, and tika-eval
> >> failed to match up the different numbers of attachments.
> >>
> >> There may be a genuine new issue with
> >>
> >> On Wed, Jun 30, 2021 at 3:06 PM Tim Allison <talli...@apache.org> wrote:
> >>
> >> Reports are here:
> >> https://corpora.tika.apache.org/base/reports/tika-1.27-pre-rc1-reports.tgz
> >>
> >> I've since fixed the MP4 issue.
> >>
> >> I'm prepping 1.27-rc1 now.
> >>
> >> On Mon, Jun 28, 2021 at 3:56 PM Tim Allison <talli...@apache.org> wrote:
> >>
> >> Updated the dependencies that I could. Kicking off regression tests now.
> >> Onwards to 1.27!
> >>
> >> Cheers,
> >>
> >> Tim
> >>
> >> On Mon, Jun 28, 2021 at 1:11 PM Nicholas DiPiazza
> >> <nicholas.dipia...@gmail.com> wrote:
> >>
> >> +1 on 1.27 release.
> >>
> >> On Mon, Jun 28, 2021, 10:57 AM Tim Allison <talli...@apache.org> wrote:
> >>
> >> All,
> >> The recent release of PDFBox fixed 2 DoS CVEs. Let's update our
> >> dependencies and go for a 1.27 release soon? Any blockers? Any
> >> strong prefs to go for a 2.0.0 or 2.0.0-BETA2 first?
> >>
> >> Cheers,
> >>
> >> Tim
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> Custom big data solutions
> >> Flink, Pinot, Solr, Elasticsearch