DO NOT REPLY [Bug 49849] [PATCH] PDF links do only support ISO encoding

bugzilla Sun, 01 Apr 2012 14:09:13 -0700

https://issues.apache.org/bugzilla/show_bug.cgi?id=49849


--- Comment #5 from Glenn Adams <gl...@skynav.com> 2012-04-01 21:08:49 UTC ---
(In reply to comment #3)
> See patch

A brief look at this patch shows that it simply changes the output encoding
used for the PDFDocument.encode() function as follows:

-    public static final String ENCODING = "ISO-8859-1";
+    public static final String ENCODING = "UTF-8";

I believe this is incorrect. PDF files employ three string types:

(1) byte string (unspecified encoding)
(2) ASCII string (US-ASCII encoding)
(3) text string (either PDFDocEncoding or UTF-16BE)

Since (a) the encode() mechanism is used in a variety of contexts, not only
for text strings, and (b) PDF makes no explicit use of UTF-8 for any of these
string types, it would be incorrect to simply change the output encoding
used by encode().
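
To illustrate the point, here is a small sketch using only the standard Java
charset API. The class and method names are hypothetical and are not part of
FOP. It prints the bytes produced for a single Polish character under the old
constant, the patched constant, and UTF-16BE (the Unicode form PDF does
define for text strings).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not FOP code.
final class EncodingComparisonSketch {
    private EncodingComparisonSketch() { }

    /** Formats bytes as space-separated upper-case hex, e.g. "C5 82". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0142"; // LATIN SMALL LETTER L WITH STROKE

        // Not representable in ISO-8859-1: getBytes() substitutes '?'.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 3F

        // UTF-8 yields two bytes, which PDF does not define as a text string form.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C5 82

        // UTF-16BE code units (a PDF text string would also need the FE FF marker).
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 01 41
    }
}
```

Under the old constant the character is lost; under the patched constant the
two UTF-8 bytes would be read back by a PDF consumer as something other than
the intended character.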

See ISO/IEC 32000 (2008), Section 7.9.2 for details.
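
As a rough sketch of what a reworked patch might do for string type (3), the
following encodes a text string either as single bytes or as UTF-16BE with
the leading FE FF byte order marker. The class and method names are
hypothetical, not existing FOP API. As a simplifying assumption, the
single-byte path is limited to printable US-ASCII (0x20 to 0x7E), where
PDFDocEncoding coincides with ASCII; a full implementation would need the
complete PDFDocEncoding table and would also have to escape the resulting
bytes when writing a literal or hexadecimal string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not FOP code.
final class PdfTextStringSketch {
    private PdfTextStringSketch() { }

    /** Conservative check: true only for printable US-ASCII (0x20..0x7E). */
    static boolean isPrintableAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the raw bytes of a PDF text string, before any literal or
     * hex string escaping is applied.
     */
    static byte[] encodeTextString(String s) {
        if (isPrintableAscii(s)) {
            // Printable ASCII has the same codes in PDFDocEncoding.
            return s.getBytes(StandardCharsets.US_ASCII);
        }
        // Otherwise use UTF-16BE, prefixed with the byte order marker FE FF.
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }
}
```

Byte strings and ASCII strings, types (1) and (2), would still need their own
handling, which is why a single shared ENCODING constant is not sufficient.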

This patch needs to be reworked to take these details into account.
Furthermore, the description of this bug is inadequate; it does not explain
what the problem actually is:

* Is the problem that the text content of the basic-link is not rendered
with Polish characters? If so, it is a font selection problem, not a
character encoding problem.

* Or is it related to the character encoding used in the /Filespec dictionary
for the link annotation?

In any case, the present patch MUST NOT be applied.
