[ https://issues.apache.org/jira/browse/PDFBOX-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904602#comment-17904602 ]
Tilman Hausherr edited comment on PDFBOX-5920 at 12/11/24 9:23 AM:
-------------------------------------------------------------------

I tried using {{font.getStringWidth(" ")}} for {{getFontWidth()}} and there are many text extraction differences. However, all except one are improvements! One not improved is PDFBOX-2959. That's because type3 fonts don't support encoding. I'll investigate that next.

improved:
7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf
10.5445IR1000150280-p15.pdf
PDFBOX-3782-reduced.pdf
PDFBOX-4934-JP.pdf

others:
artikel1_20_arab.pdf unclear
PDFBOX-756-p1.pdf not better
PDFBOX-2959-reduced.pdf not better
SO51672080-tiny-gaps.pdf irrelevant
PDFBOX-5324.pdf irrelevant


> PDType0Font return invalid space width
> --------------------------------------
>
> Key: PDFBOX-5920
> URL: https://issues.apache.org/jira/browse/PDFBOX-5920
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 3.0.3 PDFBox
> Reporter: Miroslav Holubec
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: fontwidth, truetype
> Attachments: texgyreheros-regular.ttf
>
>
> WinAnsiEncoding does not support all of the characters available in the font.
> That is the reason why we moved to the workaround proposed by the FAQ, namely
> to use PDType0Font. We have now realized that font.getSpaceWidth() returns an
> invalid value.
> {noformat}
> class FontWidthTest {
>     @Test
>     void pdType0FontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream("/texgyreheros-regular.ttf");
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDType0Font.load(document, fontStream, false);
>             assertEquals(20064.0, font.getStringWidth("The quick brown fox jumps over the lazy dog."));
>             assertEquals(278.0, font.getSpaceWidth()); // FAIL: returns 584.0
>         }
>     }
>
>     @Test
>     void pdTrueTypeFontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream("/texgyreheros-regular.ttf");
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDTrueTypeFont.load(document, fontStream, WinAnsiEncoding.INSTANCE);
>             assertEquals(20064.0, font.getStringWidth("The quick brown fox jumps over the lazy dog."));
>             assertEquals(278.0, font.getSpaceWidth());
>         }
>     }
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
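
The thread does not say why the two lookups disagree. One possible reading, offered here only as an assumption, is that the space width is looked up by the raw code 32 while {{font.getStringWidth(" ")}} first maps the character through the font's own encoding. The toy model below sketches that difference in plain Java. It is not PDFBox code: {{ToyFont}}, {{widthByRawCode}}, {{stringWidth}}, the glyph IDs and the placeholder code point are all invented for illustration, and only the widths 278 and 584 come from the test above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are PDFBox classes.
class SpaceWidthSketch {

    /** Hypothetical font: Unicode code point -> glyph ID, glyph ID -> width. */
    static final class ToyFont {
        private final Map<Integer, Integer> unicodeToGlyph = new HashMap<>();
        private final Map<Integer, Float> glyphWidths = new HashMap<>();

        void addGlyph(int codePoint, int glyphId, float width) {
            unicodeToGlyph.put(codePoint, glyphId);
            glyphWidths.put(glyphId, width);
        }

        /** Treats the number directly as a glyph ID, skipping the Unicode mapping. */
        float widthByRawCode(int code) {
            return glyphWidths.getOrDefault(code, 0f);
        }

        /** Maps every character through the Unicode table before summing widths. */
        float stringWidth(String text) {
            float total = 0f;
            for (int i = 0; i < text.length(); ) {
                int codePoint = text.codePointAt(i);
                Integer glyphId = unicodeToGlyph.get(codePoint);
                if (glyphId == null) {
                    throw new IllegalArgumentException(
                            "No glyph for U+" + Integer.toHexString(codePoint));
                }
                total += glyphWidths.get(glyphId);
                i += Character.charCount(codePoint);
            }
            return total;
        }
    }

    /** Builds a font whose space glyph does NOT sit at glyph ID 32 (assumed layout). */
    static ToyFont sampleFont() {
        ToyFont font = new ToyFont();
        font.addGlyph(' ', 3, 278f);     // space at an invented glyph ID
        font.addGlyph(0xE000, 32, 584f); // invented placeholder glyph occupying ID 32
        return font;
    }

    public static void main(String[] args) {
        ToyFont font = sampleFont();
        System.out.println(font.widthByRawCode(32)); // 584.0: width of the wrong glyph
        System.out.println(font.stringWidth(" "));   // 278.0: width of the space glyph
    }
}
```

In this model the encoded lookup returns the expected 278 while the raw-code lookup returns 584, which matches the numbers in the failing assertion. Whether PDFBox behaves this way internally is not confirmed by this thread.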