[jira] [Comment Edited] (PDFBOX-922) True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

Antti Lankila (JIRA) Thu, 09 Oct 2014 01:01:05 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164893#comment-14164893
 ]


Antti Lankila edited comment on PDFBOX-922 at 10/9/14 8:00 AM:
---------------------------------------------------------------

I remain mildly confused about the subsetting. Why not just embed the entire 
font and render it as a CID-keyed font? I have a (misnamed) attachment on 
SourceForge containing some useful functions that I hope the next jPod 
release will incorporate: 
http://sourceforge.net/p/jpodlib/patches/_discuss/thread/97a19659/a7dd/attachment/PDFBoxImprovements.java

The Unicode support here works through the loadCIDFromTTF() method, which 
constructs the CID font and the Unicode CMap used for copy-paste. Note that in 
jPod, fonts encode themselves through the mapping: the content stream 
generator calls the encode(String) method of the font's Encoding to generate 
the byte sequences to embed into the document. This is the API that PDFBox 
must adopt, if it hasn't already. (PDFBox also wants a decode() method, I 
guess, but I did not provide one because it was not necessary for solving my 
immediate problem.)
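
As a rough illustration of the encode(String) idea (not jPod's or PDFBox's 
actual code), here is a minimal Java sketch. The class name GlyphIdEncoder, 
the hand-built code-point-to-glyph-ID map and the glyph IDs themselves are 
made up for the example; a real implementation would read the mapping from 
the font's cmap table. It assumes an Identity-H style encoding with an 
identity CID-to-GID mapping, so each character is written as its glyph ID in 
a two-byte big-endian code.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not a jPod or PDFBox class. */
final class GlyphIdEncoder {
    // Unicode code point -> glyph ID, as it would be read from the font's cmap table.
    private final Map<Integer, Integer> codePointToGlyphId;

    GlyphIdEncoder(Map<Integer, Integer> codePointToGlyphId) {
        this.codePointToGlyphId = new HashMap<>(codePointToGlyphId);
    }

    /**
     * Encodes text as two-byte big-endian glyph IDs.
     * Characters missing from the map fall back to glyph 0 (.notdef).
     */
    byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        text.codePoints().forEach(cp -> {
            int gid = codePointToGlyphId.getOrDefault(cp, 0);
            out.write((gid >> 8) & 0xFF); // high byte
            out.write(gid & 0xFF);        // low byte
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put((int) 'A', 36);  // made-up glyph ID
        cmap.put(0x03B5, 304);    // Greek small epsilon, made-up glyph ID

        byte[] bytes = new GlyphIdEncoder(cmap).encode("A\u03B5");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b));
        }
        System.out.println(hex); // prints 00240130
    }
}
```

The content stream generator would then write these bytes as the string 
operand of the text-showing operator, without knowing anything about the 
font's internal mapping.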


was (Author: [email protected]):
I remain mildly confused about the subsetting. Why not just embed the entire 
font and render it as CID keyed font? I have a (misnamed) attachment in 
SourceForge where I am hoping that next jPod release will incorporate some 
useful functions: 
http://sourceforge.net/p/jpodlib/patches/_discuss/thread/97a19659/a7dd/attachment/PDFBoxImprovements.java

The Unicode support here works with the loadCIDFromTTF() method. It constructs 
the CID font and the unicode CMAP for copy-paste. Note that in jPod, fonts 
encode themselves through the mapping, the content stream generator calls 
font's encode(String) method to generate byte sequences to embed into the 
document. This is the API that PDFBox must adopt, if it hasn't already. (PDFBox 
also wants a decode() method, I guess, but I did not provide one because it was 
not necessary for solving my immediate problem.)

> True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-922
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-922
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Writing
>    Affects Versions: 1.3.1
>         Environment: JDK 1.6 / OS irrelevant, tried against 1.3.1 and 1.2.0
>            Reporter: Thanos Agelatos
>            Assignee: Andreas Lehmkühler
>         Attachments: pdfbox-unicode.diff, pdfbox-unicode2.diff
>
>
> PDFBox cannot embed Identity-H or Identity-V type TTF fonts in the PDF it 
> creates, making it impossible to create PDFs in any language apart from 
> English and ones supported in WinAnsiEncoding. This behaviour occurs 
> because the method PDTrueTypeFont.loadTTF hardcodes WinAnsiEncoding, 
> and no Identity-H or Identity-V Encoding classes are provided (to set 
> afterwards via PDFont.setFont()).
> This excludes the following languages plus many others:
> - Greek
> - Bulgarian
> - Swedish
> - Baltic languages
> - Maltese
> The PDF created contains garbled characters and/or squares.
> Simple test case:
>     PDDocument doc = null;
>     try {
>         doc = new PDDocument();
>         PDPage page = new PDPage();
>         doc.addPage(page);
>         // extract fonts for fields
>         byte[] arialNorm = extractFont("arial.ttf");
>         //byte[] arialBold = extractFont("arialbd.ttf");
>         //PDFont font = PDType1Font.HELVETICA;
>         PDFont font = PDTrueTypeFont.loadTTF(doc, new ByteArrayInputStream(arialNorm));
>
>         PDPageContentStream contentStream = new PDPageContentStream(doc, page);
>         contentStream.beginText();
>         contentStream.setFont(font, 12);
>         contentStream.moveTextPositionByAmount(100, 700);
>         // text here may appear garbled; insert any text in Greek, Bulgarian or Maltese
>         contentStream.drawString("Hello world from PDFBox ελληνικά");
>         contentStream.endText();
>         contentStream.close();
>         doc.save("pdfbox.pdf");
>         System.out.println(" created!");
>     } catch (Exception ioe) {
>         ioe.printStackTrace();
>     } finally {
>         if (doc != null) {
>             try { doc.close(); } catch (Exception e) {}
>         }
>     }
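
The coverage limit behind the quoted issue can be checked without PDFBox. 
This is a minimal sketch, assuming the JDK ships the windows-1252 charset and 
that it is a close enough stand-in for PDF's WinAnsiEncoding (the two differ 
in a few code points). WinAnsiCoverage is a hypothetical name used only here.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical helper; approximates WinAnsiEncoding with the JDK's windows-1252 charset. */
final class WinAnsiCoverage {
    /** Returns true if every character of the text has a windows-1252 code. */
    static boolean encodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String latin = "Hello world from PDFBox";
        // The Greek word from the quoted test case, written with Unicode escapes.
        String greek = "\u03B5\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";

        System.out.println(encodable(latin)); // prints true
        System.out.println(encodable(greek)); // prints false
    }
}
```

Text for which encodable() returns false has no single-byte code to be 
written with, which is consistent with the garbled output the reporter 
describes.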



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
