Hi all,

@Tilman, thanks for the analysis.

Looks like we are ready to go for the next 3.0.x release.

I'm planning to cut the 3.0.4 release next Monday if nobody objects.

Andreas

On 13.01.25 at 17:35, Tilman Hausherr wrote:
On 13.01.2025 14:23, Tilman Hausherr wrote:
On 12.01.2025 16:52, Tilman Hausherr wrote:
I will redo the "A" part and later the "B" part due to the font installation (thanks).

https://home.snafu.de/tilman/tmp/reports_pdfbox_3.0.3_vs_3.0.4-3.tar.xz

There are some new exceptions, but I assume these aren't real problems, rather some Tika or OS issues.

I didn't find any problems that need to be handled. The things I found have been mentioned before: the superscript problem and the "spaced" problem.

The superscript problem may be solved in the future either by an algorithm change (I don't know if that is possible) so that numbers in front of a Latin word get separated, or by improved strategies for estimating the space size, maybe with a database of fonts and their space sizes.

It may also be that users want the /ActualText feature to be configurable. Most of the time it improves things, but sometimes it is used to censor text extraction.

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org


