Re: PDFBox 2.0.33 release

Tilman Hausherr Tue, 07 Jan 2025 06:02:09 -0800

On 07.01.2025 14:10, Tilman Hausherr wrote:

latest:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.32_vs_2.0.33-6.tar.xz


So this is pretty good now. Here's what I found:

- superscript degradation ("1 coupled" becomes "1coupled"): annoying, but should be solved separately some day with an algorithm improvement. Having correct space detection in ordinary texts has a higher priority.

- spaced texts degradation ("METAMORPHOSE" becomes "M E T A M O R P H OS E"): that's because these texts look like that in the original.

- angled degradation: these are differences, but both extractions are bad. That's what the angle option is for (maybe use this option in the future?)


- PDFBOX-5384 - we'll probably need more time for that one.
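
To make the spacing cases above concrete, here is a minimal sketch of gap-based space detection. It is not PDFBox's actual algorithm: the names `join_glyphs`, `place` and `gap_factor`, and all glyph positions, are made up for illustration. It only assumes that a space is emitted whenever the horizontal gap between two glyphs exceeds a fraction of the average glyph width.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x position, width).
Glyph = Tuple[str, float, float]


def join_glyphs(glyphs: List[Glyph], gap_factor: float = 0.3) -> str:
    """Join glyphs into text, inserting a space where the gap is wide.

    Assumption for this sketch: a gap wider than gap_factor times the
    average glyph width counts as a word space.
    """
    if not glyphs:
        return ""
    avg_width = sum(w for _, _, w in glyphs) / len(glyphs)
    out = [glyphs[0][0]]
    for (_, prev_x, prev_w), (ch, x, _) in zip(glyphs, glyphs[1:]):
        gap = x - (prev_x + prev_w)  # empty distance between the two glyphs
        if gap > gap_factor * avg_width:
            out.append(" ")
        out.append(ch)
    return "".join(out)


def place(text: str, gaps: List[float], width: float = 10.0) -> List[Glyph]:
    """Lay out text left to right; gaps[i] is the gap after character i."""
    glyphs, x = [], 0.0
    for i, ch in enumerate(text):
        glyphs.append((ch, x, width))
        x += width + (gaps[i] if i < len(gaps) else 0.0)
    return glyphs


# Letter-spaced heading: wide gaps everywhere except a tight "OS" pair.
gaps = [8.0] * 11
gaps[9] = 1.0
print(join_glyphs(place("METAMORPHOSE", gaps)))    # M E T A M O R P H OS E

# Marker set tight against the following word: no gap is wide enough.
print(join_glyphs(place("1coupled", [1.0] * 7)))   # 1coupled
```

With made-up numbers like these, the letter-spaced text comes out the way it looks on the page, and the tight marker is glued to the next word. A single threshold cannot separate both cases, which is why the superscript case would need a separate algorithm improvement.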

Besides that, lots of improvements, and the tests really helped find the flaws in PDFBOX-5920.


Tilman


