On 06.04.19 at 15:50, Tim Allison wrote:
http://162.242.228.174/reports/reports_pdfbox_2.0.15-SNAPSHOT.tgz
This compares 2.0.15-SNAPSHOT with 2.0.13 (I think). IIRC, though,
there were no content differences between 2.0.13 and 2.0.14. I did not
apply angle detection.
Thanks again for running the tests
No new exceptions; 2 fixed exceptions. We're getting higher page
counts in a few documents, because we overrode processPages() to
process. Some changes in content, but overall, better, I think, based
on contents/common_token_comparisons_by_mime.xlsx.
To see where content appears to degrade, open
contents/content_diffs_(no|with)_exceptions and sort column M
('NUM_COMMON_TOKENS_DIFF_IN_B') in ascending order. Also look at
columns R (TOP_10_UNIQUE_TOKEN_DIFFS_A) and S
(TOP_10_UNIQUE_TOKEN_DIFFS_B). These columns show the 10 most
frequent tokens that are unique to A or unique to B. From this, it
looks like there is a regression in, e.g., govdocs1/038/038519.pdf,
but generally (hand waving) it appears that there were word
segmentation problems in both A and B as I look at the results.
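For anyone who prefers a script to a spreadsheet, the sort described above could be sketched roughly as follows. This assumes the content_diffs sheet has been exported to CSV first; the FILE column name and the sample rows are made up for illustration, only the three column names quoted above come from the report.

```python
import csv
import io


def worst_regressions(csv_text, limit=10):
    """Return up to `limit` rows, sorted ascending by NUM_COMMON_TOKENS_DIFF_IN_B.

    The most negative values (B lost the most common tokens) come first.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: int(r["NUM_COMMON_TOKENS_DIFF_IN_B"]))
    return rows[:limit]


# Hypothetical sample data; the FILE column and all values are invented.
SAMPLE = """FILE,NUM_COMMON_TOKENS_DIFF_IN_B,TOP_10_UNIQUE_TOKEN_DIFFS_A,TOP_10_UNIQUE_TOKEN_DIFFS_B
a.pdf,12,foo: 3,bar: 1
b.pdf,-40,report: 9,rep: 5 | ort: 5
c.pdf,0,,
"""

for row in worst_regressions(SAMPLE):
    print(
        row["FILE"],
        row["NUM_COMMON_TOKENS_DIFF_IN_B"],
        "| unique to A:", row["TOP_10_UNIQUE_TOKEN_DIFFS_A"],
        "| unique to B:", row["TOP_10_UNIQUE_TOKEN_DIFFS_B"],
    )
```

Reading the two unique-token columns side by side for the first few rows is what makes split or merged words visible, which is how word segmentation problems tend to show up.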
I had a first look and there are differences, but I'm not sure if it is a
regression.
The sorted text extraction results from 2.0.13/14 and 2.0.15-SNAPSHOT are
equal. The unsorted results from 2.0.13 and 2.0.14 are equal to each other,
but those from 2.0.15-SNAPSHOT differ.
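One rough way to tell whether such a difference is only a change in text order rather than lost or altered content is to compare the two extractions as token sequences and as token multisets. This is a minimal sketch, not part of PDFBox or of the regression reports; classify_diff is a hypothetical helper and plain whitespace splitting is an assumption, not the tokenizer used for the reports.

```python
from collections import Counter


def classify_diff(text_a, text_b):
    """Classify the difference between two text extraction results.

    Returns "identical" if the token sequences match, "reordered" if the
    same tokens occur with the same counts but in a different order, and
    "content changed" otherwise.
    """
    # Assumption: whitespace tokenization is good enough for a first look.
    tokens_a = text_a.split()
    tokens_b = text_b.split()
    if tokens_a == tokens_b:
        return "identical"
    if Counter(tokens_a) == Counter(tokens_b):
        return "reordered"
    return "content changed"


# Invented example strings, standing in for the output of two versions.
print(classify_diff("Total 42 Page 1", "Page 1 Total 42"))
print(classify_diff("word segmentation", "wordsegmentation"))
```

A "reordered" result would be consistent with the sorted output being equal while the unsorted output differs; "content changed" would point at something more than ordering.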
Still investigating ...
Andreas
Cheers,
Tim
On Fri, Apr 5, 2019 at 10:53 AM Tim Allison <[email protected]> wrote:
+1 I should have regression results by tomorrow
On Fri, Apr 5, 2019 at 2:15 AM Maruan Sahyoun <[email protected]> wrote:
+1
On 05.04.2019 at 06:31, Andreas Lehmkuehler <[email protected]> wrote:
Hi,
Looks like it's time for the next release. How about cutting 2.0.15 next Monday?
WDYT?
Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]