[jira] [Comment Edited] (PDFBOX-1206) TrueType glyphs render incorrectly

Tilman Hausherr (JIRA) Sun, 15 Feb 2015 03:54:40 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321519#comment-14321519
 ]


Tilman Hausherr edited comment on PDFBOX-1206 at 2/15/15 11:54 AM:
-------------------------------------------------------------------

I've now reached the point where I understand what TT outline drawing is about:

- two consecutive off-curve points have an implicit on-curve point midway 
between them; see a nice visualization of this here:
http://typophile.com/node/66351#comment-394025
Everything else can be deduced from this rule, and all the implementations use 
these implicit points (or try to)
- the starting point is also an ending point and must be treated according to 
the same rules as the other points (this is what goes wrong in most 
implementations)
- a draw command can be done only TO an on-curve point, which will either be a 
real one, or an implicit one
- depending on whether one had an off-curve point just before, the drawing will 
be a lineTo or a quadTo
- at the end, one has to consider the on-curve starting point (which will 
either be a real on-curve point, possibly the 1st or the 2nd point, or an 
implicit on-curve point!) and, if applicable, an off-curve point that came 
first.
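
A minimal sketch of the rules above, for a single closed contour. This is an 
illustration only, not the PDFBox, Batik, Sun or PDF.js code; the class name 
ContourSketch, the method draw and the array layout (x, y, on-curve flag per 
point) are assumptions made for the example. It starts at the first real 
on-curve point, or at the implicit point between the last and the first point 
if the whole contour is off-curve.

```java
import java.awt.geom.Path2D;

// Hypothetical helper, not part of any of the libraries discussed here.
final class ContourSketch {

    /** Midpoint coordinate of two off-curve points: the implicit on-curve point. */
    static double mid(double a, double b) {
        return (a + b) / 2.0;
    }

    /** Draws one closed contour; on[i] is true when point i is on-curve. */
    static Path2D.Double draw(double[] x, double[] y, boolean[] on) {
        Path2D.Double path = new Path2D.Double();
        int n = x.length;
        if (n == 0) {
            return path;
        }
        // Look for a real on-curve point to start from.
        int first = -1;
        for (int i = 0; i < n; i++) {
            if (on[i]) {
                first = i;
                break;
            }
        }
        double sx, sy;
        int begin, count;
        if (first >= 0) {
            sx = x[first];
            sy = y[first];
            begin = first + 1;
            count = n - 1;      // every point except the start
        } else {
            // Entirely off-curve: start at the implicit point between last and first.
            sx = mid(x[n - 1], x[0]);
            sy = mid(y[n - 1], y[0]);
            begin = 0;
            count = n;          // every point is a control point
        }
        path.moveTo(sx, sy);

        boolean pending = false; // true while an off-curve point waits for its end point
        double cx = 0, cy = 0;
        for (int k = 0; k < count; k++) {
            int i = (begin + k) % n; // wraps inside this contour only
            if (on[i]) {
                if (pending) {
                    path.quadTo(cx, cy, x[i], y[i]);
                } else {
                    path.lineTo(x[i], y[i]);
                }
                pending = false;
            } else {
                if (pending) {
                    // Two off-curve points in a row: draw to the implicit midpoint.
                    path.quadTo(cx, cy, mid(cx, x[i]), mid(cy, y[i]));
                }
                cx = x[i];
                cy = y[i];
                pending = true;
            }
        }
        // The starting point is also the ending point.
        if (pending) {
            path.quadTo(cx, cy, sx, sy);
        }
        path.closePath();
        return path;
    }
}
```

Every draw command in the sketch ends at an on-curve point, real or implicit, 
and the closing step treats the starting point by the same rules as any other 
point.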

- the Batik code is wrong, it can't handle outlines that consist entirely of 
off-curve points
- the Batik code sometimes increases the offset by 2, which is risky
- all the subsequent improvements (including my own) just made it more complex
- all the other Batik-based implementations I've seen on the web are also wrong
- our "improvement" of the Batik source has a bug that Andreas and I both 
missed (the "% numberOfPoints" part looks past the endofcontour offset if there 
are several contours in the array)
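
To illustrate the kind of wrap-around problem described in the last item, here 
is a small sketch. It is not the PDFBox or Batik code; the class ContourIndex 
and both methods are hypothetical, and it assumes that all contours of a glyph 
share one point array, each contour covering an inclusive index range 
[start, end]. Wrapping with the total point count leaves the current contour, 
while wrapping with the contour's own length stays inside it.

```java
// Hypothetical helper for illustration only.
final class ContourIndex {

    /** Next point index, wrapping inside the contour [start, end] (both inclusive). */
    static int next(int i, int start, int end) {
        int length = end - start + 1;
        return start + (i - start + 1) % length;
    }

    /** Wrapping by the glyph's total point count: this can leave the contour. */
    static int nextGlobal(int i, int totalPoints) {
        return (i + 1) % totalPoints;
    }
}
```

With two contours covering points 0..3 and 4..6 (7 points in total), 
next(3, 0, 3) returns 0, the start of the same contour, whereas 
nextGlobal(3, 7) returns 4, which is the first point of the following contour.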

- The PDF.js algorithm (Apache license)
https://github.com/mozilla/pdf.js/blob/c0d17013a28ee7aa048831560b6494a26c52360c/src/core/font_renderer.js
creates the implicit points and inserts them at the beginning or after the end. 
Wild stuff. But -probably- correct.

The problem is that, to understand what goes on, I had to look deeply into the 
Sun / PDF-Renderer code. I can't get my mind away from their algorithm. So if 
I wrote my own implementation from scratch, I would still end up with 
something similar to the Sun code.

A 2nd best idea would be something somewhat similar to the PDF.js 
implementation, but from scratch:
- Go through the outlines array and "insert" calculated implicit on-curve 
points where needed
- Take care of cases where beginning and end are off-curve, or where it starts 
with off-curve
- The result is a simpler array that will start with on-curve and end with 
on-curve and never have two off-curve points one after another
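
The three steps above could look roughly like the following sketch. It is 
written from the description only and is not taken from PDF.js or the Sun 
code; the names ImplicitPoints, Pt and normalize are hypothetical. One design 
assumption: instead of special-casing a contour that starts off-curve, the 
sketch rotates the contour so that it begins at its first real on-curve point, 
which is valid because the contour is closed. The start point is repeated at 
the end so that the result also ends on-curve.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ImplicitPoints {

    /** One outline point; on is true for on-curve points. */
    static final class Pt {
        final double x;
        final double y;
        final boolean on;

        Pt(double x, double y, boolean on) {
            this.x = x;
            this.y = y;
            this.on = on;
        }
    }

    /** Implicit on-curve point halfway between two off-curve points. */
    static Pt mid(Pt a, Pt b) {
        return new Pt((a.x + b.x) / 2.0, (a.y + b.y) / 2.0, true);
    }

    /** Returns a closed contour that starts and ends on-curve, with no two off-curve points in a row. */
    static List<Pt> normalize(List<Pt> contour) {
        List<Pt> out = new ArrayList<>();
        int n = contour.size();
        if (n == 0) {
            return out;
        }
        // Find the first real on-curve point, if there is one.
        int first = -1;
        for (int i = 0; i < n; i++) {
            if (contour.get(i).on) {
                first = i;
                break;
            }
        }
        Pt start;
        int begin, count;
        if (first >= 0) {
            start = contour.get(first);
            begin = first + 1;
            count = n - 1;
        } else {
            // Entirely off-curve: begin and end are both off-curve.
            start = mid(contour.get(n - 1), contour.get(0));
            begin = 0;
            count = n;
        }
        out.add(start);
        for (int k = 0; k < count; k++) {
            Pt p = contour.get((begin + k) % n);
            Pt prev = out.get(out.size() - 1);
            if (!p.on && !prev.on) {
                out.add(mid(prev, p)); // insert the implicit point
            }
            out.add(p);
        }
        out.add(start); // the starting point is also the ending point
        return out;
    }
}
```

A drawing loop over the result then only needs two cases: an on-curve point 
after an on-curve point is a lineTo, and an off-curve point followed by an 
on-curve point is a quadTo.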

This will be slower than the Sun code, but even the "wild" PDF.js code doesn't 
have speed problems.

So the idea would be to create something "not as fast as the Sun code", but 
correct. It can still be optimized from there.

WDYT?


> TrueType glyphs render incorrectly 
> -----------------------------------
>
>                 Key: PDFBOX-1206
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1206
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 1.6.0, 1.7.0
>         Environment: All (tested on OS X and Solaris but believe it affects 
> all platforms), tested java 7 update 1 & update 2
>            Reporter: Armando Singer
>            Assignee: Tilman Hausherr
>              Labels: convertToImage, image
>             Fix For: 2.0.0
>
>         Attachments: bad-text-rendering-jdk-1.8-source.pdf, 
> bad-text-rendering-jdk-1.8.jpg, bauer.ttf, contourRendering.js, 
> convertToImage.pdf, converttoimage.pdf-1.png, preflight.png
>
>
> I've done extensive testing of pdfbox under the new Java 7, update 1 & 
> update 2 releases, and am noticing
> severe image quality issues when converting a pdf to an image.
> Attached is the same pdf converted to an image under Java 6, then 
> again with Java 7
> with the same code. The Java 7 version looks pretty bad.
> This is with jdk 1.7 update 1 & 2 (for solaris x64, running headless, and OS 
> X running Java 7 preview update 2 with no additional vm args to 
> the default java command). I've also tested against
> the latest code in svn (the images below are from the most current version).
> The good image below is from a recent version of the jdk 1.6 (and it has 
> always looked good
> on at least jdk1.5+).
> To test, I used code like this:
> {code}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> public final class PdfToImageTest {
>   public static BufferedImage toBufferedImage(final String pdfFile, final int resolution)
>     throws IOException {
>     PDDocument document = null;
>     try {
>       document = PDDocument.load(pdfFile);
>       final PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
>       final BufferedImage result = page.convertToImage(BufferedImage.TYPE_INT_ARGB, resolution);
>       return result;
>     } finally {
>       if (document != null) {
>         document.close();
>       }
>     }
>   }
>   public static void main (String[] args) throws IOException {
>     ImageIO.write(toBufferedImage(args[0], 108), "png", new File(args[1]));
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
