Y. Something is possibly more broken with the eval embedded file matcher. :( Will confirm I ran the latest version of tika-eval.
I’ll also check these files. I want to spot check new “missing attachment” files and then off we go?

On Sun, Aug 20, 2023 at 7:21 AM Tilman Hausherr <[email protected]> wrote:

> The content diffs are impressive, the "more in A" column is almost fully
> empty. There are only 3 files that might be relevant:
>
> govdocs1/245/245359.doc    _842859044.xls    _842791279.doc
> govdocs1/491/491561.ppt    UNKNOWN-0.xls     UNKNOWN-1.doc
> govdocs1/752/752792.ppt    UNKNOWN-0.xls     UNKNOWN-1.doc
>
> but I notice that the 2nd and 3rd column have different names.
>
> Tilman
>
> On 18.08.2023 00:21, Tim Allison wrote:
> > Current reports are here:
> > https://corpora.tika.apache.org/base/reports/tika-2.8.1-rand1m-xyz.tgz
> >
> > I expect a bunch of ole2 files will have fewer attachments because we're
> > no longer duplicating/triplicating macros. I haven't had a chance to look,
> > but will look tomorrow.
> >
> > On Tue, Aug 15, 2023 at 11:29 AM Tim Allison <[email protected]> wrote:
> >
> >> All,
> >>
> >> I'm back from vacation. I had really hoped to run this release before I
> >> left, but TIKA-4091 and TIKA-4048 left some surprises without quick fixes
> >> available.
> >>
> >> I'd like to fix small regressions left behind in TIKA-4091 (case
> >> insensitive object names in OLE2), the new TIKA-4116 (duplicate macros in
> >> some OLE2) and TIKA-4048 (the regression caused by setting extract all in
> >> compressor parsers).
> >>
> >> With those changes, I think we should increment the minor version -> 2.9.0.
> >>
> >> Any blockers left for the next release? Any objections to the version
> >> choice?
> >>
> >> Best,
> >>
> >> Tim
