My belated +1. Reports are here: https://corpora.tika.apache.org/base/reports/tika-2.9.1-rc1-reports.tgz
The major diff is in rfc822 detection. I can see a few regressions, but most of the changes are improvements. The "common tokens" are going down in a bunch of files that are now correctly identified as html...the sign is that "unique to A" looks like:

class: 416 | div: 380 | data: 200 | message: 184 | span: 134 | href: 109 | js: 90 | p: 80 | time: 60 | u: 60

On Wed, Oct 18, 2023 at 9:43 AM Oleg Tikhonov <[email protected]> wrote:

> +1
> Jdk 8 and 11, ubuntu 20
>
>
> On Tue, 17 Oct 2023 at 21:05 Tilman Hausherr <[email protected]>
> wrote:
>
> > +1
> >
> > successful build on german windows on jdk 11.0.20
> >
> > Tilman
> >
> > On 17.10.2023 13:13, Tim Allison wrote:
> > > A candidate for the Tika 2.9.1 release is available at:
> > > https://dist.apache.org/repos/dist/dev/tika/2.9.1
> > >
> > > The release candidate is a zip archive of the sources in:
> > > https://github.com/apache/tika/tree/2.9.1-rc1
> > >
> > > The SHA-512 checksum of the archive is
> > >
> > > ba13a0d22994ca84cccd9ad2931e099051870d46a5a3440258f93bd63f6e3b03de51709c51cf0e4029e57ba9c44cdb243ac440d76e695dfc081dfd9d956d8777.
> > >
> > > In addition, a staged maven repository is available here:
> > >
> > > https://repository.apache.org/content/repositories/orgapachetika-1096/org/apache/tika
> > >
> > > Please vote on releasing this package as Apache Tika 2.9.1.
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 Tika PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Tika 2.9.1
> > > [ ] -1 Do not release this package because...
> > >
> > > Best,
> > > Tim
> > >
> >
> >
>
