> I'll dig into it, but I'm concerned

but I'm _not_ concerned...famous last words. See above about day/week/month/year...
This is caused by a difference in Java versions. This is not a problem at the Tika level.

With Java 8, there's an EOF[0]. With Java 11, there's no EOF.[1] Not sure if this is a feature of Java 11 or worthy of a bug report.

[0] openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)

[1] openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)

On Wed, Jun 30, 2021 at 5:01 PM Tim Allison <talli...@apache.org> wrote:
>
> Just one of those days, weeks, months, years.... Sorry... and thank you, Ken!
>
> AIFF, we're now getting more eofs than we were. This might be a Java issue,
> but I don't think there's anything to do at the Tika level. I don't remember
> any changes in the AudioParser in 1.27. I'll dig into it, but I'm
> concerned...famous last words...
>
> o.a.t.exception.TikaException
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:287)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
>     at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at o.a.t.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:376)
>     at o.a.t.parser.DelegatingParser.parse(DelegatingParser.java:72)
>     at o.a.t.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104)
>     at o.a.t.parser.pkg.RarParser.parse(RarParser.java:95)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
>     at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:239)
>     at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
>     at o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:105)
>     at o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
>     at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
>     at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java
>     at com.sun.media.sound.AiffFileReader.getCOMM(AiffFileReader.java:267)
>     at com.sun.media.sound.AiffFileReader.getAudioFileFormat(AiffFileReader.java:76)
>     at javax.sound.sampled.AudioSystem.getAudioFileFormat(AudioSystem.java:1004)
>     at o.a.t.parser.audio.AudioParser.parse(AudioParser.java:73)
>     at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
>     ... 28 more
>
>
> On Wed, Jun 30, 2021 at 4:37 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>
>> Hi Tim,
>>
>> Don’t leave us hanging… :)
>>
>> — Ken
>>
>> On Jun 30, 2021, at 12:47 PM, Tim Allison <talli...@apache.org> wrote:
>>
>> There's an apparent change in mime detection: application/msword ->
>> application/pkcs7-signature and a few other file formats are now
>> apparently being detected as pkcs2-signature...
>>
>> This is an artifact of tika-eval and not a problem. The issue is that
>> we used to parse files wrapped in pkcs7 sigs twice, and tika-eval
>> failed to match up different numbers of attachments.
>>
>> There may be a genuine new issue with
>>
>>
>> On Wed, Jun 30, 2021 at 3:06 PM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> Reports are here:
>> https://corpora.tika.apache.org/base/reports/tika-1.27-pre-rc1-reports.tgz
>>
>> I've since fixed the MP4 issue.
>>
>> I'm prepping 1.27-rc1 now.
>>
>> On Mon, Jun 28, 2021 at 3:56 PM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> Updated dependencies that I could. Kicking off regression tests now.
>> Onwards to 1.27!
>>
>> Cheers,
>>
>> Tim
>>
>> On Mon, Jun 28, 2021 at 1:11 PM Nicholas DiPiazza <nicholas.dipia...@gmail.com> wrote:
>>
>>
>> +1 on 1.27 release.
>>
>> On Mon, Jun 28, 2021, 10:57 AM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> All,
>> The recent release of PDFBox fixed 2 DoS CVEs. Let's update our
>> dependencies and go for a 1.27 release soon? Any blockers? Any
>> strong prefs to go for a 2.0.0 or 2.0.0-BETA2 first?
>>
>> Cheers,
>>
>> Tim
>>
>>
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink, Pinot, Solr, Elasticsearch
>>
>>
>>
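
For anyone who wants to compare the two JVMs outside Tika, here is a minimal sketch. It assumes that calling `AudioSystem.getAudioFileFormat` directly (the call that appears in the stack trace above) on a truncated AIFF header is enough to exercise the same reader. The class name `AiffEofCheck`, the `probe` helper, and the synthetic bytes are hypothetical; this is not the corpus file that produced the report. Run it under each JDK and compare the printed line.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

class AiffEofCheck {

    // Hypothetical input: an AIFF container header that stops right after the
    // "COMM" chunk id, so any attempt to read the chunk body runs out of bytes.
    // ByteBuffer is big-endian by default, which matches AIFF.
    static byte[] truncatedAiff() {
        return ByteBuffer.allocate(16)
                .put("FORM".getBytes(StandardCharsets.US_ASCII))
                .putInt(8) // declared size of what follows: "AIFF" + "COMM"
                .put("AIFF".getBytes(StandardCharsets.US_ASCII))
                .put("COMM".getBytes(StandardCharsets.US_ASCII))
                .array();
    }

    // Asks the JDK's audio file readers to identify the bytes and reports
    // which outcome this JVM produced, without asserting which one is right.
    static String probe(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            AudioFileFormat format = AudioSystem.getAudioFileFormat(in);
            return "parsed as " + format.getType();
        } catch (EOFException e) {
            return "EOFException";
        } catch (UnsupportedAudioFileException e) {
            return "UnsupportedAudioFileException";
        } catch (IOException | RuntimeException e) {
            return "other: " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version")
                + " -> " + probe(truncatedAiff()));
    }
}
```

Whether this synthetic input follows the same code path as the corpus file is an assumption to verify; if the two JDKs print different outcomes, that would be consistent with the Java 8 vs. Java 11 difference described at the top of this message.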