Again, apologies for my delay:

http://162.242.228.174/reports/pdfbox_2_0_16_1861801.tgz

On Mon, Jun 24, 2019 at 6:03 AM Tim Allison <[email protected]> wrote:
>
> Sorry. Never made it back to my keyboard on Friday. I just started the 
> comparison code. Should have reports in a few hours.
>
> On Mon, Jun 24, 2019 at 12:36 AM Andreas Lehmkuehler <[email protected]> wrote:
>>
>> @Tim, just a friendly reminder: are any results available yet?
>>
>> Thanks
>> Andreas
>>
>> On 21.06.19 at 17:27, Tim Allison wrote:
>> > Sorry. I was afk. I’ll kick this off shortly.
>> >
>> > On Wed, Jun 19, 2019 at 2:54 AM Tilman Hausherr <[email protected]> wrote:
>> >
>> >> Hi Tim,
>> >>
>> >> Please do another one.
>> >>
>> >> Thanks
>> >> Tilman
>> >>
>> >> On 15.06.2019 at 02:13, Tim Allison wrote:
>> >>> http://162.242.228.174/reports/pdfbox_2_0_16_1861286.tgz
>> >>>
>> >>> Sharing before reviewing...sorry...
>> >>>
>> >>> On Fri, Jun 14, 2019 at 7:56 AM Tim Allison <[email protected]> wrote:
>> >>>> Y. Will rerun today.
>> >>>>
>> >>>> On Fri, Jun 14, 2019 at 12:09 AM Tilman Hausherr <[email protected]> wrote:
>> >>>>> Hi, can you run these again? The recently fixed regression in PDFBOX-4550
>> >>>>> resulted in a large number of files without extraction
>> >>>>> (NUM_COMMON_TOKENS_A much larger than NUM_COMMON_TOKENS_B).
>> >>>>>
>> >>>>> Tilman
>> >>>>>
>> >>>>> On 13.06.2019 at 14:36, Tim Allison wrote:
>> >>>>>> All,
>> >>>>>>
>> >>>>>>      On a dev branch, I replaced Optimaize with a dev version of
>> >>>>>> OpenNLP's language detector, and I updated the common tokens list to
>> >>>>>> cover the 120 langs covered by a dev version of OpenNLP's language
>> >>>>>> model.  I changed the min token length for common words to 3 (from 4),
>> >>>>>> and I'm now using 30k common tokens per lang rather than 20k.
>> >>>>>>
>> >>>>>>      I reran this dev version of tika-eval on PDFBox 2.0.15 vs
>> >>>>>> 2.0.16-SNAPSHOT, and the results are here:
>> >>>>>>
>> >>>>>> http://162.242.228.174/reports/tika_eval_opennlp_reports.tgz
>> >>>>>>
>> >>>>>>      Are there any critical problems with the updates in the contents
>> >>>>>> comparison files?  Any improvements?
>> >>>>>>
>> >>>>>>      I notice that 'cmn' is the most common category for 'not much
>> >>>>>> actual text'...we may want to require a higher confidence in language
>> >>>>>> detection before reporting a detected language...
>> >>>>>>
>> >>>>>>      Any and all recommendations are welcomed!  Thank you!
>> >>>>>>
>> >>>>>>               Cheers,
>> >>>>>>
>> >>>>>>                           Tim
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Jun 13, 2019 at 12:54 AM Andreas Lehmkuehler <[email protected]> wrote:
>> >>>>>>> On 12.06.19 at 21:08, Tilman Hausherr wrote:
>> >>>>>>>> On 12.06.2019 at 03:56, Tim Allison wrote:
>> >>>>>>>>> Reports are available here for 2.0.16-SNAPSHOT:
>> >>>>>>>>>
>> >>>>>>>>> http://162.242.228.174/reports/pdfbox_2_0_16-SNAPSHOT_reports.tgz
>> >>>>>>>>>
>> >>>>>>>>> I haven't had a chance to look yet...
>> >>>>>>>> I did... It's not looking good. It's probably the change in the
>> >>>>>>>> ToUnicode stream parsing, I'll investigate this.
>> >>>>>>> I'm going to have a look
>> >>>>>>>
>> >>>>>>> Andreas
>> >>>>>>>> Tilman
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Sat, Jun 8, 2019 at 9:15 AM Tim Allison <[email protected]> wrote:
>> >>>>>>>>>> +1
>> >>>>>>>>>>
>> >>>>>>>>>> On Sat, Jun 8, 2019 at 6:33 AM Andreas Lehmkuehler <[email protected]> wrote:
>> >>>>>>>>>>> Hi,
>> >>>>>>>>>>>
>> >>>>>>>>>>> looks like it's time for the next release. How about cutting
>> >>>>>>>>>>> 2.0.16 in about 2 weeks from now?
>> >>>>>>>>>>>
>> >>>>>>>>>>> WDYT?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Andreas
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >> ---------------------------------------------------------------------
>> >>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>> >>>>>>>>>>> For additional commands, e-mail: [email protected]
>> >>>>>>>>>>>
