>I'll dig into it, but I'm concerned
but I'm _not_ concerned...famous last words.  See above about
day/week/month/year...

This is caused by a difference in Java versions, not by anything at
the Tika level.  With Java 8, there's an EOF[0].  With Java 11,
there's no EOF.[1]  Not sure if this is a feature of Java 11 or worthy
of a bug report.

[0] openjdk version "1.8.0_292" OpenJDK Runtime Environment
(AdoptOpenJDK)(build 1.8.0_292-b10)
[1] openjdk version "11.0.11" 2021-04-20 OpenJDK Runtime Environment
AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
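
For anyone who wants to compare runtimes themselves, here is a minimal
sketch of a probe.  It is not taken from Tika or from the regression
corpus: the class name AiffEofProbe and the hand-built, deliberately
truncated AIFF header are assumptions made for illustration, and I have
not verified that this particular input reproduces the Java 8 vs. Java 11
difference.  It calls the same AudioSystem.getAudioFileFormat(InputStream)
entry point that appears in the stack trace and reports which exception,
if any, comes back.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class AiffEofProbe {

    // Hypothetical test input: a FORM/AIFF container whose COMM chunk is
    // cut off right after the channel count, so the header is incomplete.
    public static byte[] truncatedAiff() {
        ByteBuffer buf = ByteBuffer.allocate(22); // big-endian by default
        buf.put("FORM".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(30);          // declared length, more than actually follows
        buf.put("AIFF".getBytes(StandardCharsets.US_ASCII));
        buf.put("COMM".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(18);          // standard COMM chunk length
        buf.putShort((short) 2); // channel count; the rest of COMM is missing
        return buf.array();
    }

    // Runs the JDK's audio file detection and returns a short label for
    // the outcome instead of letting the exception propagate.
    public static String probe(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            AudioFileFormat format = AudioSystem.getAudioFileFormat(in);
            return "parsed: " + format.getType();
        } catch (EOFException e) {
            return "EOFException";
        } catch (UnsupportedAudioFileException e) {
            return "UnsupportedAudioFileException";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version")
                + " -> " + probe(truncatedAiff()));
    }
}
```

Running the same class under each JDK and comparing the printed label
should show whether the runtime surfaces the truncated header as an
EOFException or handles it some other way.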

On Wed, Jun 30, 2021 at 5:01 PM Tim Allison <talli...@apache.org> wrote:
>
>
> Just one of those days, weeks, months, years.... Sorry... and thank you, Ken!
>
> For AIFF, we're now getting more EOFs than we were.  This might be a Java issue, 
> but I don't think there's anything to do at the Tika level.  I don't remember 
> any changes in the AudioParser in 1.27.  I'll dig into it, but I'm 
> concerned...famous last words...
>
> o.a.t.exception.TikaException
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:287)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at 
> o.a.t.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:376)
> at o.a.t.parser.DelegatingParser.parse(DelegatingParser.java:72)
> at 
> o.a.t.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104)
> at o.a.t.parser.pkg.RarParser.parse(RarParser.java:95)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> at o.a.t.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at o.a.t.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at o.a.t.parser.DigestingParser.parse(DigestingParser.java:84)
> at o.a.t.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:239)
> at o.a.t.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
> at 
> o.a.t.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:105)
> at 
> o.a.t.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
> at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
> at o.a.t.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java
> at com.sun.media.sound.AiffFileReader.getCOMM(AiffFileReader.java:267)
> at 
> com.sun.media.sound.AiffFileReader.getAudioFileFormat(AiffFileReader.java:76)
> at javax.sound.sampled.AudioSystem.getAudioFileFormat(AudioSystem.java:1004)
> at o.a.t.parser.audio.AudioParser.parse(AudioParser.java:73)
> at o.a.t.parser.CompositeParser.parse(CompositeParser.java:281)
> ... 28 more
>
>
> On Wed, Jun 30, 2021 at 4:37 PM Ken Krugler <kkrugler_li...@transpac.com> 
> wrote:
>>
>> Hi Tim,
>>
>> Don’t leave us hanging… :)
>>
>> — Ken
>>
>> On Jun 30, 2021, at 12:47 PM, Tim Allison <talli...@apache.org> wrote:
>>
>> There's an apparent change in mime detection: application/msword ->
>> application/pkcs7-signature and a few other file formats are now
>> apparently being detected as pkcs7-signature...
>>
>> This is an artifact of tika-eval and not a problem.  The issue is that
>> we used to parse files wrapped in pkcs7 sigs twice, and tika-eval
>> failed to match up the differing numbers of attachments.
>>
>> There may be a genuine new issue with
>>
>>
>> On Wed, Jun 30, 2021 at 3:06 PM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> Reports are here:
>> https://corpora.tika.apache.org/base/reports/tika-1.27-pre-rc1-reports.tgz
>>
>> I've since fixed the MP4 issue.
>>
>> I'm prepping 1.27-rc1 now.
>>
>> On Mon, Jun 28, 2021 at 3:56 PM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> Updated dependencies that I could.  Kicking off regression tests now.
>> Onwards to 1.27!
>>
>> Cheers,
>>
>>         Tim
>>
>> On Mon, Jun 28, 2021 at 1:11 PM Nicholas DiPiazza
>> <nicholas.dipia...@gmail.com> wrote:
>>
>>
>> +1 on 1.27 release.
>>
>> On Mon, Jun 28, 2021, 10:57 AM Tim Allison <talli...@apache.org> wrote:
>>
>>
>> All,
>>  The recent release of PDFBox fixed 2 DoS CVEs.  Let's update our
>> dependencies and go for a 1.27 release soon?  Any blockers?  Any
>> strong prefs to go for a 2.0.0 or 2.0.0-BETA2 first?
>>
>>  Cheers,
>>
>>              Tim
>>
>>
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink, Pinot, Solr, Elasticsearch
>>
>>
>>
