[jira] [Commented] (PDFBOX-4116) could not add text without unicode in the font

xing Wang (JIRA) Mon, 19 Feb 2018 14:16:30 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369571#comment-16369571
 ]


xing Wang commented on PDFBOX-4116:
-----------------------------------

Hi [~tilman]

> So sometimes the code there is identical to the unicode value, but often it 
> is not.

I agree with your statement, especially since I am working with papers that 
are heavy on math expressions. 

The reason I want to create a PDF document is to get a tight bounding box for 
the glyph, which is critical for my analysis. For example, in the attached 
file I could not get the correct tight bounding box of the glyph. 

[^6076-learn589519560.pdf.CMSY10.minus.pdf]

 

I am using a different tool, but I will try PDFBox soon. My procedure is as 
follows:
 # Use PDFBox to render the glyph the way the PDFBox debugger does, and use 
the rendered image of the glyph to work out how to adjust the top and bottom. 
For example, the glyph with code 33 and glyph name "minus":
!image-2018-02-19-16-11-24-611.png!

 # Then I use pdfminer, a Python tool, to get the glyph bounding box, shown 
in red. 
!image-2018-02-19-16-12-23-438.png!

 # I assumed the black pixels would sit at the same relative position inside 
the red bbox as inside the rendered BufferedImage from step 1. But this is not 
true. For the minus in step 1, the pixels are roughly in the middle, but in 
the red bbox they sit somewhat in the upper half. 

I will try to repeat the process in PDFBox. If possible, could you point me 
to the procedure for getting the glyph bbox?
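
Steps 1 and 3 above depend on measuring where the dark pixels fall inside a 
rendered glyph image. Below is a minimal sketch of that measurement using only 
java.awt, not PDFBox itself. The class name `TightGlyphBox`, the method 
`tightBox`, and the gray threshold are illustrative assumptions and not part 
of any PDFBox API.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class TightGlyphBox {
    /**
     * Returns the smallest rectangle that contains every pixel whose average
     * gray level is below the threshold, or null if no such pixel exists.
     */
    static Rectangle tightBox(BufferedImage img, int threshold) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Simple average of the three channels as the gray level.
                if ((r + g + b) / 3 < threshold) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no dark pixel found
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

Running this on both the rendered glyph image and the crop taken from the red 
bbox, then comparing the box's vertical center divided by the image height, 
would put a number on the "middle" versus "upper half" difference described 
in step 3.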

 

> could not add text without unicode in the font
> ----------------------------------------------
>
>                 Key: PDFBOX-4116
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4116
>             Project: PDFBox
>          Issue Type: Wish
>          Components: PDModel
>    Affects Versions: 2.0.8
>         Environment: Windows
>            Reporter: xing Wang
>            Priority: Minor
>         Attachments: 6076-learn589519560.pdf.CMSY10.minus.pdf, 
> 6076-learn589519560.pdf.CMSY10.minus.pdf.adj_char_bbox.0.png, 
> 6076-learn589519560.pdf.CMSY10.minus.pdf.org_char_bbox.0.png, 
> image-2018-02-19-09-23-00-110.png, image-2018-02-19-16-11-24-611.png, 
> image-2018-02-19-16-12-23-438.png
>
>
> !image-2018-02-19-09-23-00-110.png!
> As shown in the debugger, the PDType1Font maps code 33 to "minus", but 
> there is no Unicode value associated with it. 
> If we use the code `contentStream.showText("\u0021");` to add content, it 
> causes the following error: 
> Exception in thread "main" java.lang.IllegalArgumentException: U+0021 
> ('exclam') is not available in this font AMZNGR+CMSY10 (generic: 
> FREBPT+CMSY10) encoding: built-in (Type 1) with differences
> at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:439)
> at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:323)
> at org.apache.pdfbox.debugger.CreatePDF.main(CreatePDF.java:63)
> The best I could do was to use "appendRawCommands", but I find it is 
> marked as deprecated. I am wondering why, and whether there is any 
> replacement for this function?
>  
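
For reference, the raw content-stream text that the "appendRawCommands" 
workaround would need for a single character code can be sketched as follows. 
In PDF syntax, a hex string followed by the `Tj` operator shows the glyph(s) 
for the given code(s) in the current font, without going through Unicode. The 
class `RawShowText` and method `showCode` are hypothetical helpers written 
for illustration; they are not PDFBox API, and this is not necessarily the 
approach the PDFBox developers recommend.

```java
import java.util.Locale;

final class RawShowText {
    /**
     * Formats one single-byte character code as a PDF hex string followed by
     * the Tj (show text) operator, e.g. code 33 becomes "<21> Tj".
     */
    static String showCode(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("code must fit in one byte: " + code);
        }
        return String.format(Locale.ROOT, "<%02X> Tj\n", code);
    }
}
```

The resulting string only makes sense inside a text object (between `BT` and 
`ET`) after a font has been selected, so it would be written between the 
usual beginText/setFont and endText calls.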



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

