http://162.242.228.174/reports/reports_pdfbox_2.0.15-SNAPSHOT.tgz
This compares 2.0.15-SNAPSHOT with 2.0.13 (I think). IIRC, though,
there were no content differences between 2.0.13 and 2.0.14. I did not
apply angle detection.
No new exceptions; 2 fixed exceptions. We're getting higher page
counts in a few documents, because we overrode processPages() to
process. Some changes in content, but overall, better, I think, based
on contents/common_token_comparisons_by_mime.xlsx.
To see where content appears to degrade, open
contents/content_diffs_(no|with)_exceptions, and sort column M
('NUM_COMMON_TOKENS_DIFF_IN_B') in ascending order. Also, look at
columns R (TOP_10_UNIQUE_TOKEN_DIFFS_A) and S
(TOP_10_UNIQUE_TOKEN_DIFFS_B). These columns show the top 10 most
frequent tokens that are unique to A or unique to B. From this, it
looks like there is a regression in, e.g., govdocs1/038/038519.pdf,
but, generally (hand waving), it appears that there were word
segmentation problems in both A and B as I look at the results.
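For anyone who would rather script the check than sort the spreadsheet
by hand, here is a rough sketch of the two steps above. It is only an
illustration, not how the report tool computes its columns: the helper
names (worst_first, top_unique_tokens) are made up, rows are assumed to
be dicts keyed by the column headers, and "unique to A" is assumed to
mean "occurs in A and never in B".

```python
from collections import Counter

# Column header as it appears in the report (column M).
DIFF_COLUMN = "NUM_COMMON_TOKENS_DIFF_IN_B"


def worst_first(rows, column=DIFF_COLUMN):
    """Sort report rows ascending, so the largest drops in B come first.

    Hypothetical helper: each row is assumed to be a dict keyed by
    column header, with the diff stored as an integer or numeric string.
    """
    return sorted(rows, key=lambda row: int(row[column]))


def top_unique_tokens(tokens_a, tokens_b, n=10):
    """Return the n most frequent tokens seen only in A, and only in B.

    Assumption: a token is "unique" to one side if it never occurs on
    the other side at all.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    only_a = Counter({t: c for t, c in counts_a.items() if t not in counts_b})
    only_b = Counter({t: c for t, c in counts_b.items() if t not in counts_a})
    return only_a.most_common(n), only_b.most_common(n)


if __name__ == "__main__":
    # Made-up rows, only to show the sort order.
    rows = [
        {"FILE": "a.pdf", DIFF_COLUMN: "5"},
        {"FILE": "b.pdf", DIFF_COLUMN: "-12"},
    ]
    for row in worst_first(rows):
        print(row["FILE"], row[DIFF_COLUMN])

    # A word segmentation problem shows up as a merged token on one side.
    unique_a, unique_b = top_unique_tokens(
        ["the", "the", "quick", "fox"], ["thequick", "fox"]
    )
    print(unique_a, unique_b)
```

A merged token such as "thequick" turning up as unique to one side,
with its parts unique to the other, is the kind of word segmentation
difference mentioned above.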
Cheers,
Tim
On Fri, Apr 5, 2019 at 10:53 AM Tim Allison <[email protected]> wrote:
>
> +1 I should have regression results by tomorrow
>
> On Fri, Apr 5, 2019 at 2:15 AM Maruan Sahyoun <[email protected]> wrote:
>>
>> +1
>>
>> > On 05.04.2019 at 06:31, Andreas Lehmkuehler <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > looks like it's time for the next release. How about cutting 2.0.15 next
>> > Monday?
>> >
>> > WDYT?
>> >
>> > Andreas
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>>