Thanks for the test... the sum is still negative, but if we ignored the
truncated files I bet it would be positive.
I have downloaded a few of the regressions but won't create issues this
time, as yesterday's turned out to be duplicates. I'll wait for Andreas's
next commit and create issues only if these aren't solved by then.
@Andreas - ping me if you didn't keep the "secret" URL.
Some misc thoughts...
039800.pdf: "refinery's" is a different token from "refinery". Shouldn't
"refinery's" be three tokens? I mention this because "refinery" is
probably in a dictionary.
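
To make the question concrete, here is a minimal Python sketch of the two
behaviours. The function names and regular expressions are made up for
illustration only; this is not the tokenizer that produced the reports.

```python
import re

def tokens_keep_apostrophe(text):
    # Hypothetical variant: an apostrophe between alphanumerics stays
    # inside the token, so "refinery's" comes out as one token.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)

def tokens_split_apostrophe(text):
    # Hypothetical variant: each alphanumeric run and each apostrophe
    # is its own token, so "refinery's" comes out as three tokens.
    return re.findall(r"[A-Za-z0-9]+|'", text)

print(tokens_keep_apostrophe("refinery's"))
print(tokens_split_apostrophe("refinery's"))
```

With the second variant the first token is the plain word "refinery",
which a dictionary lookup would be more likely to recognise.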
Some differences are due to a different treatment of the space in
bad fonts. Some were improved, and some now look like this: "C I T I E S
W I T H O U T D R U G S". There is an open issue about these. It is
tricky because if we treat these as one word, we'd also lose spaces
in places where we want to keep them.
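
A small Python sketch of why this is tricky. It uses a naive heuristic
(a hypothetical helper, not what PDFBox does) that joins runs of single
letters separated by single spaces:

```python
import re

def collapse_spaced_letters(text):
    # Naive heuristic, for illustration only: find runs of three or
    # more single letters separated by single spaces and drop the
    # spaces inside each run.
    pattern = r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b"
    return re.sub(pattern, lambda m: m.group(0).replace(" ", ""), text)

print(collapse_spaced_letters("C I T I E S W I T H O U T D R U G S"))
```

The letter spacing is gone, but so are the two real word breaks: the
three words end up glued together as one. Telling a letter gap from a
word gap would need extra information (e.g. the actual gap widths),
which this sketch does not have.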
I can't find commoncrawl2/5N/5NSKV4CTVY4KT7R2FGY4XJDIK4PRLA4Z. I used
http://XXX.XXX.XXX.XXX/docs/commoncrawl2/5N/5NSKV4CTVY4KT7R2FGY4XJDIK4PRLA4Z
Tilman
On 10.05.2017 at 11:42, Allison, Timothy B. wrote:
Haven't had a chance to look. Reports are here:
http://162.242.228.174/reports/reports_pdfbox_2_0_6_20170510.tar.gz
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]