[ 
https://issues.apache.org/jira/browse/PDFBOX-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909868#comment-17909868
 ] 

Tilman Hausherr edited comment on PDFBOX-5920 at 1/5/25 7:31 PM:
-----------------------------------------------------------------

Space lost in files:

 [^PDFBOX-5920-862271-p1-superscript-prefix_reduced.pdf] 
Space width for font CMSS8 is 354.0
It turns out that the space is present in the font, so this case is simply bad 
luck: the output looked better previously for the "wrong reasons". Subscripts 
and superscripts are difficult anyway.

[^PDFBOX-5920-054514-p6_reduced.pdf]
Space width for font LZMRSS+Times-Roman~14 is 1000.0

getStringWidth(" ") for that one returns 1000, but IMHO this is incorrect 
because the space doesn't exist in that font: nameToGID("space") returns 0 
because "space" is not in that charset. However, no exception is thrown.
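
To illustrate the failure mode described above, here is a minimal, 
self-contained sketch. It does not use the PDFBox classes: the class 
SpaceWidthSketch, its methods, and the map-based "charset" are hypothetical 
stand-ins, and the only assumption carried over from the comment is that a 
missing glyph name resolves to GID 0 (.notdef) instead of raising an error.

```java
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical stand-in for a font; none of these names are PDFBox API.
final class SpaceWidthSketch {
    // GID 0 is the .notdef glyph, used when a glyph name is unknown.
    static final int NOTDEF_GID = 0;

    // Resolves a glyph name to its GID, falling back to .notdef (0).
    static int nameToGID(Map<String, Integer> charset, String glyphName) {
        return charset.getOrDefault(glyphName, NOTDEF_GID);
    }

    // Naive lookup: silently reports the .notdef width when "space" is missing.
    static double naiveSpaceWidth(Map<String, Integer> charset, double[] widthsByGid) {
        return widthsByGid[nameToGID(charset, "space")];
    }

    // Guarded lookup: reports "no width" when "space" is not in the charset.
    static OptionalDouble guardedSpaceWidth(Map<String, Integer> charset, double[] widthsByGid) {
        int gid = nameToGID(charset, "space");
        return gid == NOTDEF_GID
                ? OptionalDouble.empty()
                : OptionalDouble.of(widthsByGid[gid]);
    }

    public static void main(String[] args) {
        // A subset font without a space glyph; index 0 holds the .notdef width.
        Map<String, Integer> charset = Map.of("A", 1, "B", 2);
        double[] widthsByGid = {1000.0, 722.0, 667.0};

        System.out.println(naiveSpaceWidth(charset, widthsByGid));
        System.out.println(guardedSpaceWidth(charset, widthsByGid));
    }
}
```

With the guarded variant, a caller such as a text extractor could fall back 
to some other estimate when no space glyph is available, rather than trusting 
the width of .notdef.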

 [^PDFBOX-5920-Y5U2XZCKG2U6TO3FC36NCGOZECHQA2PY-p39-reduced.pdf] 
Space width for font OCRJQP+Times-Roman is 1000.0

Likely the same problem as the previous file.


> PDType0Font return invalid space width
> --------------------------------------
>
>                 Key: PDFBOX-5920
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5920
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 3.0.3 PDFBox
>            Reporter: Miroslav Holubec
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: fontwidth, truetype
>             Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
>         Attachments: PDFBOX-5920-054514-p6_reduced.pdf, 
> PDFBOX-5920-862271-p1-superscript-prefix_reduced.pdf, 
> PDFBOX-5920-Y5U2XZCKG2U6TO3FC36NCGOZECHQA2PY-p39-reduced.pdf, 
> texgyreheros-regular.ttf
>
>
> WinAnsiEncoding does not support all the characters available in the font. 
> That is why we moved to the workaround proposed in the FAQ, namely to use 
> PDType0Font. We have now realized that font.getSpaceWidth() returns an 
> invalid value for it.
> {noformat}
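> // Imports assume JUnit 5 (Jupiter), as the package-private test methods suggest.
> import static org.junit.jupiter.api.Assertions.assertEquals;
>
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
> import org.apache.pdfbox.pdmodel.font.encoding.WinAnsiEncoding;
> import org.junit.jupiter.api.Test;
>
> class FontWidthTest {
>     // Font loaded as a composite (Type 0) font without subsetting.
>     @Test
>     void pdType0FontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream("/texgyreheros-regular.ttf");
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDType0Font.load(document, fontStream, false);
>             assertEquals(20064.0, font.getStringWidth("The quick brown fox jumps over the lazy dog."));
>             assertEquals(278.0, font.getSpaceWidth()); // FAIL: returns 584.0
>         }
>     }
>
>     // Same font loaded as a simple TrueType font with WinAnsiEncoding.
>     @Test
>     void pdTrueTypeFontTest() throws IOException {
>         try (InputStream fontStream = FontWidthTest.class.getResourceAsStream("/texgyreheros-regular.ttf");
>              PDDocument document = new PDDocument()) {
>             PDFont font = PDTrueTypeFont.load(document, fontStream, WinAnsiEncoding.INSTANCE);
>             assertEquals(20064.0, font.getStringWidth("The quick brown fox jumps over the lazy dog."));
>             assertEquals(278.0, font.getSpaceWidth());
>         }
>     }
> }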
> {noformat}


