Hi all,
@Tilman thanks for the analysis.
Looks like we are ready to go for the next 3.0.x release.
I'm planning to cut the 3.0.4 release next Monday if nobody objects.
Andreas
On 13.01.25 at 17:35, Tilman Hausherr wrote:
On 13.01.2025 14:23, Tilman Hausherr wrote:
On 12.01.2025 16:52, Tilman Hausherr wrote:
I will redo the "A" part and later the "B" part due to the font
installation (thanks).
https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.3_vs_3.0.4-3.tar.xz
There are some new exceptions, but I assume that these aren't real,
but rather some Tika or OS problems.
I didn't find any problems that need to be handled. The things I found
have been mentioned before: the superscript problem and the "spaced"
problem.
The superscript problem may be solved in the future either by an
algorithm change (don't know if possible) so that numbers in front of a
Latin word get separated, or by improved strategies for estimating the
space size, maybe with a database of fonts and their space sizes.
It is also possible that users will want to configure the /ActualText
feature. Most of the time it improves things, but sometimes it is used
to censor extraction.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org