My belated +1. Reports are here:
https://corpora.tika.apache.org/base/reports/tika-2.9.1-rc1-reports.tgz

The major difference is in rfc822 detection. I see a few regressions, but
most of the changes are improvements. The "common tokens" count goes down in
a number of files that are now correctly identified as HTML. The telltale sign
is that their "unique to A" tokens look like:
class: 416 | div: 380 | data: 200 | message: 184 | span: 134 | href: 109 | js: 90 | p: 80 | time: 60 | u: 60

On Wed, Oct 18, 2023 at 9:43 AM Oleg Tikhonov <[email protected]> wrote:

> +1
> JDK 8 and 11, Ubuntu 20
>
>
> On Tue, 17 Oct 2023 at 21:05 Tilman Hausherr <[email protected]> wrote:
>
> > +1
> >
> > Successful build on German Windows with JDK 11.0.20.
> >
> > Tilman
> >
> > On 17.10.2023 13:13, Tim Allison wrote:
> > > A candidate for the Tika 2.9.1 release is available at:
> > > https://dist.apache.org/repos/dist/dev/tika/2.9.1
> > >
> > > The release candidate is a zip archive of the sources in:
> > > https://github.com/apache/tika/tree/2.9.1-rc1
> > >
> > > The SHA-512 checksum of the archive is
> > >
> > > ba13a0d22994ca84cccd9ad2931e099051870d46a5a3440258f93bd63f6e3b03de51709c51cf0e4029e57ba9c44cdb243ac440d76e695dfc081dfd9d956d8777.
> > >
> > > In addition, a staged maven repository is available here:
> > > https://repository.apache.org/content/repositories/orgapachetika-1096/org/apache/tika
> > >
> > > Please vote on releasing this package as Apache Tika 2.9.1.
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 Tika PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Tika 2.9.1
> > > [ ] -1 Do not release this package because...
> > >
> > > Best,
> > >           Tim
> > >
> >
> >
>
