Sorry. I was afk. I’ll kick this off shortly.

On Wed, Jun 19, 2019 at 2:54 AM Tilman Hausherr <[email protected]> wrote:

> Hi Tim,
>
> Please do another one.
>
> Thanks
> Tilman
>
> On 15.06.2019 at 02:13, Tim Allison wrote:
> > http://162.242.228.174/reports/pdfbox_2_0_16_1861286.tgz
> >
> > Sharing before reviewing...sorry...
> >
> > On Fri, Jun 14, 2019 at 7:56 AM Tim Allison <[email protected]> wrote:
> >> Y. Will rerun today.
> >>
> >> On Fri, Jun 14, 2019 at 12:09 AM Tilman Hausherr <[email protected]> wrote:
> >>> Hi, can you run these again? The recently fixed regression in PDFBOX-4550
> >>> resulted in a large number of files without extracted text
> >>> (NUM_COMMON_TOKENS_A much larger than NUM_COMMON_TOKENS_B).
> >>>
> >>> Tilman
> >>>
> >>> On 13.06.2019 at 14:36, Tim Allison wrote:
> >>>> All,
> >>>>
> >>>>     On a dev branch, I replaced Optimaize with a dev version of
> >>>> OpenNLP's language detector, and I updated the common tokens list to
> >>>> cover the 120 langs covered by a dev version of OpenNLP's language
> >>>> model.  I changed the min token length for common words to 3 (from 4),
> >>>> and I'm now using 30k common tokens per lang rather than 20k.
> >>>>
> >>>>     I reran this dev version of tika-eval on PDFBox 2.0.15 vs
> >>>> 2.0.16-SNAPSHOT, and the results are here:
> >>>>
> >>>> http://162.242.228.174/reports/tika_eval_opennlp_reports.tgz
> >>>>
> >>>>     Are there any critical problems with the updates in the contents
> >>>> comparison files?  Any improvements?
> >>>>
> >>>>     I notice that 'cmn' is the most common category for 'not much actual
> >>>> text'...we may want to require a higher confidence in language
> >>>> detection before reporting a detected language...
> >>>>
> >>>>     Any and all recommendations are welcomed!  Thank you!
> >>>>
> >>>>              Cheers,
> >>>>
> >>>>                          Tim
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jun 13, 2019 at 12:54 AM Andreas Lehmkuehler <[email protected]> wrote:
> >>>>> On 12.06.19 at 21:08, Tilman Hausherr wrote:
> >>>>>> On 12.06.2019 at 03:56, Tim Allison wrote:
> >>>>>>> Reports are available here for 2.0.16-SNAPSHOT:
> >>>>>>>
> >>>>>>> http://162.242.228.174/reports/pdfbox_2_0_16-SNAPSHOT_reports.tgz
> >>>>>>>
> >>>>>>> I haven't had a chance to look yet...
> >>>>>> I did... It's not looking good. It's probably the change in the ToUnicode
> >>>>>> stream parsing, I'll investigate this.
> >>>>> I'm going to have a look
> >>>>>
> >>>>> Andreas
> >>>>>> Tilman
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Sat, Jun 8, 2019 at 9:15 AM Tim Allison <[email protected]> wrote:
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> On Sat, Jun 8, 2019 at 6:33 AM Andreas Lehmkuehler <[email protected]> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> looks like it's time for the next release. How about cutting 2.0.16
> >>>>>>>>> in about 2 weeks from now?
> >>>>>>>>>
> >>>>>>>>> WDYT?
> >>>>>>>>>
> >>>>>>>>> Andreas
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>>>>> For additional commands, e-mail: [email protected]
> >>>>>>>>>
