[jira] [Commented] (PDFBOX-922) True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

John Hewson (JIRA) Mon, 02 Jun 2014 10:23:46 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015585#comment-14015585
 ]


John Hewson commented on PDFBOX-922:
------------------------------------

Antti, you're along the right lines, it's actually simpler than that. You don't 
need a *CIDToGIDMap*, it can just be the name *Identity*, in which case CID = 
GID for all characters in the font, which means you can use the raw GIDs in the 
*ToUnicode* map and in COSStrings when they are written to the content stream.

You'll need to subset the TTF fonts because they're usually quite large, but 
PDFBox has some code for doing this already.

> True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-922
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-922
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Writing
>    Affects Versions: 1.3.1
>         Environment: JDK 1.6 / OS irrelevant, tried against 1.3.1 and 1.2.0
>            Reporter: Thanos Agelatos
>            Assignee: Andreas Lehmkühler
>
> PDFBox cannot embed Identity-H or Identity-V type TTF fonts in the PDF it 
> creates, making it impossible to create PDFs in any language apart from 
> English and ones supported in WinAnsiEncoding. This behaviour is caused 
> because method PDTrueTypeFont.loadTTF has hardcoded WinAnsiEncoding inside, 
> and there is no Identity-H or Identity-V Encoding classes provided (to set 
> afterwards via PDFont.setFont() )
> This excludes the following languages plus many others:
> - Greek
> - Bulgarian
> - Swedish
> - Baltic languages
> - Malteze 
> The PDF created contains garbled characters and/or squares.
> Simple test case:
>                 PDDocument doc = null;
>               try {
>                       doc = new PDDocument();
>                       PDPage page = new PDPage();
>                       doc.addPage(page);
>                       // extract fonts for fields
>                       byte[] arialNorm = extractFont("arial.ttf");
>                       //byte[] arialBold = extractFont("arialbd.ttf"); 
>                       //PDFont font = PDType1Font.HELVETICA;
>                       PDFont font = PDTrueTypeFont.loadTTF(doc, new 
> ByteArrayInputStream(arialNorm));
>                       
>                       PDPageContentStream contentStream = new 
> PDPageContentStream(doc, page);
>                       contentStream.beginText();
>                       contentStream.setFont(font, 12);
>                       contentStream.moveTextPositionByAmount(100, 700);
>                       contentStream.drawString("Hello world from PDFBox 
> ελληνικά"); // text here may appear garbled; insert any text in Greek or 
> Bulgarian or Malteze
>                       contentStream.endText();
>                       contentStream.close();
>                       doc.save("pdfbox.pdf");
>                       System.out.println(" created!");
>               } catch (Exception ioe) {
>                       ioe.printStackTrace();
>               } finally {
>                       if (doc != null) {
>                               try { doc.close(); } catch (Exception e) {}
>                       }
>               }



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (PDFBOX-922) True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

Reply via email to