[ 
https://issues.apache.org/jira/browse/PDFBOX-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904602#comment-17904602
 ] 

Tilman Hausherr edited comment on PDFBOX-5920 at 12/11/24 9:23 AM:
-------------------------------------------------------------------

I tried using {{font.getStringWidth(" ")}} for {{getFontWidth()}} and there are 
many text extraction differences. However, all except one are improvements! The 
one that did not improve is PDFBOX-2959, because Type 3 fonts don't support 
encoding. I'll investigate that next.

improved:
7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf
10.5445IR1000150280-p15.pdf
PDFBOX-3782-reduced.pdf
PDFBOX-4934-JP.pdf

others:
artikel1_20_arab.pdf  unclear
PDFBOX-756-p1.pdf  not better
PDFBOX-2959-reduced.pdf  not better
SO51672080-tiny-gaps.pdf  irrelevant
PDFBOX-5324.pdf  irrelevant




> PDType0Font return invalid space width
> --------------------------------------
>
>                 Key: PDFBOX-5920
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5920
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 3.0.3 PDFBox
>            Reporter: Miroslav Holubec
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: fontwidth, truetype
>         Attachments: texgyreheros-regular.ttf
>
>
> WinAnsiEncoding does not support all characters available in the font. That 
> is why we moved to the workaround proposed in the FAQ, which is to use 
> PDType0Font. We have now realized that font.getSpaceWidth() returns an 
> invalid value for this font.
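> {noformat}
> // JUnit 5 (Jupiter) is assumed here, based on the package-private test methods.
> import static org.junit.jupiter.api.Assertions.assertEquals;
>
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
> import org.apache.pdfbox.pdmodel.font.encoding.WinAnsiEncoding;
> import org.junit.jupiter.api.Test;
>
> class FontWidthTest {
>     private static final String FONT = "/texgyreheros-regular.ttf";
>     private static final String TEXT = "The quick brown fox jumps over the lazy dog.";
>
>     // Composite (Type 0) font, loaded without subsetting.
>     @Test
>     void pdType0FontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream(FONT);
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDType0Font.load(document, fontStream, false);
>             assertEquals(20064.0, font.getStringWidth(TEXT));
>             assertEquals(278.0, font.getSpaceWidth()); // FAIL: returns 584.0
>         }
>     }
>
>     // Simple TrueType font with WinAnsiEncoding: both assertions pass.
>     @Test
>     void pdTrueTypeFontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream(FONT);
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDTrueTypeFont.load(document, fontStream, WinAnsiEncoding.INSTANCE);
>             assertEquals(20064.0, font.getStringWidth(TEXT));
>             assertEquals(278.0, font.getSpaceWidth());
>         }
>     }
> }
> {noformat}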
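
To put the numbers from the test above into perspective, here is a minimal sketch that converts them to points. It relies only on the fact that PDFBox reports widths in 1/1000 of a text-space unit. The class name {{SpaceWidthDemo}}, the helper {{toPoints}} and the 12 pt font size are hypothetical and not part of the report or of PDFBox.

```java
public class SpaceWidthDemo {

    // Converts a width in 1/1000 text-space units to points at the given font size.
    static double toPoints(double widthInFontUnits, double fontSize) {
        return widthInFontUnits / 1000.0 * fontSize;
    }

    public static void main(String[] args) {
        double fontSize = 12.0;       // hypothetical size, chosen only for illustration

        double expectedSpace = 278.0; // value asserted in the test
        double reportedSpace = 584.0; // value returned by getSpaceWidth() for PDType0Font
        double sentence = 20064.0;    // getStringWidth() of the test sentence

        System.out.println("expected space width (pt): " + toPoints(expectedSpace, fontSize));
        System.out.println("reported space width (pt): " + toPoints(reportedSpace, fontSize));
        System.out.println("sentence width (pt):       " + toPoints(sentence, fontSize));
        System.out.println("reported / expected:       " + reportedSpace / expectedSpace);
    }
}
```

At this size the reported space is roughly twice as wide as the expected one, which would matter to any code that derives word gaps or layout from the space width.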


