Reports are here:
https://corpora.tika.apache.org/base/reports/tika-2.3-vs-2.4-pdfs.tgz

It looks like there are no significant changes.  There are diffs on a few
files, but this was run on ~800k PDFs.

There are a couple of cases where a file is now being detected as
rfc822 instead of PDF.  We have to fix that on the Tika side.

On Mon, Mar 21, 2022 at 12:53 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:
>
>
> On 21.03.22 at 12:21, Tim Allison wrote:
> > I'm happy to run the tests today if that would be of any interest.
> Yes, please.
>
> TIA
> Andreas
>
>
> >
> > On Sun, Mar 20, 2022 at 5:01 PM Andreas Lehmkuehler <andr...@lehmi.de> wrote:
> >>
> >> On 13.03.22 at 14:20, Tim Allison wrote:
> >>> From Tika's perspective, there's no rush. We're waiting for a bug fix
> >>> in POI (TIKA-3699).
> >>>
> >>> Please let me know if/when I should run the regression tests.
> >> Thanks for the offer. Do we need to run the tests before cutting the release?
> >>
> >> Most of the tickets aren't related to text extraction. Those which are related
> >> should decrease the number of exceptions and increase the accuracy.
> >>
> >> WDYT?
> >>
> >>
> >>>
> >>> Thank you, all!
> >>>
> >>> Cheers,
> >>>
> >>>               Tim
> >>>
> >>> On Sat, Mar 12, 2022 at 5:29 AM Andreas Lehmkuehler <andr...@lehmi.de> wrote:
> >>>>
> >>>> On 11.03.22 at 08:30, Tilman Hausherr wrote:
> >>>>> On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:
> >>>>>> On 10.03.22 at 20:16, Tilman Hausherr wrote:
> >>>>>>> I'd agree but that might mean PDFBOX-5384 wouldn't be fixed.
> >>>>>> It has been there for quite some time and it seems to be a rare corner case.
> >>>>>> IMHO it can wait if we don't find a solution before Monday.
> >>>>>
> >>>>> No, that one was created on March 2nd. Oliver has just posted a suggestion,
> >>>>> so maybe that is a solution.
> >>>> The ticket is quite new, but the issue itself was introduced in 2018 with
> >>>> 2.0.12. ;-)
> >>>>
> >>>> However, I'll have a look at the proposed solution.
> >>>>
> >>>> Andreas
> >>>>>
> >>>>> Tilman
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> WDYT?
> >>>>>>
> >>>>>> Andreas
> >>>>>>
> >>>>>>>
> >>>>>>> Tilman
> >>>>>>>
> >>>>>>> On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:
> >>>>>>>> On 09.03.22 at 17:07, Tim Allison wrote:
> >>>>>>>>> All,
> >>>>>>>>>
> >>>>>>>>> I've been out of the office for a bit and haven't caught up yet.
> >>>>>>>>> Apologies if I've missed the discussion.
> >>>>>>>>>
> >>>>>>>>> Are there plans for a 2.0.26 release?  We're probably a few weeks out
> >>>>>>>> How about cutting the release next Monday?
> >>>>>>>>
> >>>>>>>> Andreas
> >>>>>>>>
> >>>>>>>>> from starting our next 1.x and 2.x releases on Tika, and it would be
> >>>>>>>>> great to incorporate 2.0.26.  No problem at all if 2.0.26 is slated
> >>>>>>>>> for later.
> >>>>>>>>>
> >>>>>>>>> Thank you!
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>>            Tim
> >>>>>>>>>
> >>>>>>>>> On Fri, Mar 4, 2022 at 10:46 PM Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:
> >>>>>>>>>>> On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm planning to cut a new JBIG2 release next week. There aren't that
> >>>>>>>>>>>> many changes, but I think the fixes are worth releasing. [1]
> >>>>>>>>>>> I'm going to cut the release next weekend, if nobody objects.
> >>>>>>>>>>>
> >>>>>>>>>>> Once it is done we should think about a 2.0.26 release of PDFBox
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes please!
> >>>>>>>>>>
> >>>>>>>>>> Tilman
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> >>>>>>>>>> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> >>>>>>>>>>
