[ 
https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107826#comment-14107826
 ] 

John Hewson edited comment on PDFBOX-2262 at 8/23/14 2:56 AM:
--------------------------------------------------------------

My latest commits tackle the multi-byte CMap problem, which wasn't handled 
correctly in PDFBox previously. Combined with my previous changes, this had 
left us with bad behaviour, because the new code was correct while existing 
code relied on it being buggy. As I'd already planned this as part of 
PDFBOX-2149, I took the time to finally refactor CMaps and Encodings, in 
particular CMaps with variable-length character codes. 

Hopefully you'll find the new code very easy to understand (there are 457 fewer 
lines :)), where once we had:
{code}
int codeLength;
for (int i = 0; i < string.length; i += codeLength)
{
    // Decode the value to a Unicode character
    codeLength = 1;
    String unicode = font.encode(string, i, codeLength);
    int[] charCodes;
    if (unicode == null && i + 1 < string.length)
    {
        // maybe a multibyte encoding
        codeLength++;
        unicode = font.encode(string, i, codeLength);
        charCodes = new int[] { font.getCodeFromArray(string, i, codeLength) };
    }
    else
    {
        charCodes = new int[] { font.getCodeFromArray(string, i, codeLength) };
    }

    ...
{code}

We now have:

{code}
InputStream in = new ByteArrayInputStream(string);
while (in.available() > 0)
{
    int code = font.readCode(in);
    String unicode = font.toUnicode(code);

    ...
{code}
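
To illustrate why reading from a stream suits variable-length codes, here is a minimal sketch of how a readCode-style method could consume one byte at a time until the bytes read so far fall inside a codespace range. It is not the PDFBox implementation: the class VariableLengthCodeSketch, the Range type, the example ranges and the readAll helper are all hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual PDFBox code.
class VariableLengthCodeSketch
{
    /** One codespace range: codes of a fixed byte length, bounded per byte. */
    static final class Range
    {
        final int[] low;
        final int[] high;

        Range(int[] low, int[] high)
        {
            this.low = low;
            this.high = high;
        }

        /** True if the first 'length' bytes form a complete code in this range. */
        boolean matches(int[] bytes, int length)
        {
            if (length != low.length)
            {
                return false;
            }
            for (int i = 0; i < length; i++)
            {
                if (bytes[i] < low[i] || bytes[i] > high[i])
                {
                    return false;
                }
            }
            return true;
        }
    }

    // Assumed example codespace: 1-byte codes 0x00-0x80, 2-byte codes 0x8140-0x9FFC
    static final Range[] RANGES = {
        new Range(new int[] { 0x00 }, new int[] { 0x80 }),
        new Range(new int[] { 0x81, 0x40 }, new int[] { 0x9F, 0xFC })
    };
    static final int MAX_LENGTH = 2;

    /** Reads one character code, consuming only as many bytes as the code needs. */
    static int readCode(InputStream in) throws IOException
    {
        int[] bytes = new int[MAX_LENGTH];
        int code = 0;
        for (int length = 1; length <= MAX_LENGTH; length++)
        {
            int b = in.read();
            if (b == -1)
            {
                break; // input ended in the middle of a code
            }
            bytes[length - 1] = b;
            code = (code << 8) | b;
            for (Range range : RANGES)
            {
                if (range.matches(bytes, length))
                {
                    return code;
                }
            }
        }
        // No range matched: return the bytes consumed so far as the code
        return code;
    }

    /** Decodes a whole string of bytes into character codes. */
    static List<Integer> readAll(byte[] string)
    {
        List<Integer> codes = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(string);
        try
        {
            while (in.available() > 0)
            {
                codes.add(readCode(in));
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return codes;
    }

    public static void main(String[] args)
    {
        byte[] string = { 0x41, (byte) 0x81, 0x40, 0x20 };
        for (int code : readAll(string))
        {
            System.out.println(Integer.toHexString(code));
        }
    }
}
```

The point of the design is that the caller never has to guess a code length: the stream position advances by exactly the number of bytes each code occupies, so the calling loop stays the same for single-byte and multi-byte encodings.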

Hopefully I didn't break too much in the process. The exceptions on the 
following files should now be fixed:

PDFBOX-1283.pdf  <== still has rendering issues
PDFBOX-1421.pdf  <== still has rendering issues
PDFBOX-1422.pdf
FOP-2252.pdf
freesanstest.pdf

None of the other test files with rendering issues are affected; they're still 
buggy, and I'll take a look at them soon.


> Remove usage of AWT fonts
> -------------------------
>
>                 Key: PDFBOX-2262
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2262
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel, Rendering
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>         Attachments: ELVIA-Reiserucktritt-Vollschutz.pdf-1.png, 
> FreeSansTest.pdf, PDFBOX-1094-094730.pdf-1.png, PDFBOX-1770.pdf-1.png, 
> bugzilla886049.pdf, bugzilla886049.pdf-1.png
>
>
> We're still using AWT fonts to render the "standard 14" built-in fonts, which 
> causes rendering problems and encoding issues (see PDFBOX-2140). We're also 
> using AWT for some fallback fonts.
> Removal of these AWT fonts isn't too difficult: we need to load the fonts 
> using the existing PDFFontManager mechanism which has recently been added. 
> All missing TrueType fonts loaded from disk have been using SystemFontManager 
> for a number of weeks now. 
> We should ship some sensible default fonts with PDFBox, such as the 
> Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't 
> find anything suitable, rather than falling back to the default TTF font, but 
> by default we'll probe the system for suitable fonts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)