Re: svn commit: r946585 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/afp/fonts: AFPFont.java AbstractOutlineFont.java CharacterSet.java CharacterSetBuilder.java CharacterSetOrientation.java Dou

2010-05-26 Thread Vincent Hennebert
Ok.

Thanks,
Vincent


Jeremias Maerki wrote:
 <snip/>
 



2010-05-25 Thread Jeremias Maerki
Hi Vincent,

in the long term, I agree with you. But as long as so many other parts
of FOP (like Font.mapChar()) use char, there's no point in using int
in the backend. There will never be any characters outside the basic
plane until the whole process, from input through the layout engine to
the rendering components, is prepared for these characters. In the short
term, my change really improves understandability, which is usually one
of your major concerns. It helped me a lot in identifying a problem. The
changes can easily be reverted once there is a concerted effort to
make the whole of FOP compatible with the full range of Unicode
characters. It's also important to note that the AFP part will need some
special attention when these characters need to be used, as some of the
data structures in there will get insanely large if we start supporting
characters beyond the basic plane. So unless there is a sustained veto
against rev 946585, I'm inclined to leave it as it is.
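To put rough numbers on the size concern: if a per-character structure were a dense array indexed directly by character value (a hypothetical layout chosen only for illustration, not a description of FOP's actual AFP classes), extending it from the char range to the full code-point range would multiply its length by 17. A minimal Java sketch of the arithmetic, assuming 4 bytes per entry:

```java
// Hypothetical layout for illustration only: a dense table with one slot per
// character value. This is NOT a description of FOP's AFP font classes.
class DenseTableSize {
    // Slots needed to cover every value of the 16-bit char type (the BMP).
    static int bmpEntries() {
        return Character.MAX_VALUE + 1; // 65,536
    }

    // Slots needed to cover the full Unicode codespace, 0 .. 0x10FFFF.
    static int fullCodespaceEntries() {
        return Character.MAX_CODE_POINT + 1; // 1,114,112
    }

    public static void main(String[] args) {
        int bytesPerEntry = 4; // assumption: one int (e.g. a glyph width) per slot
        System.out.println(bmpEntries() * bytesPerEntry);           // 262144
        System.out.println(fullCodespaceEntries() * bytesPerEntry); // 4456448
        System.out.println(fullCodespaceEntries() / bmpEntries());  // 17
    }
}
```

A sparse structure, such as a map keyed by code point or a table allocated per 64K plane only when that plane is actually used, would avoid most of this growth; which of these suits AFP is a separate design question.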

On 21.05.2010 11:46:42 Vincent Hennebert wrote:
 <snip/>




Jeremias Maerki




2010-05-21 Thread Vincent Hennebert
Hi,

 Author: jeremias
 Date: Thu May 20 09:52:27 2010
 New Revision: 946585
 
 URL: http://svn.apache.org/viewvc?rev=946585&view=rev
 Log:
 Changed many variables and parameters from int to char because AFP font 
 support mostly uses Unicode code points unlike Type 1 and TrueType support 
 which use internal character code points (the result of Font.mapChar()). This 
 should improve code readability.

Not sure this is a desirable change. char can only address characters
from the Basic Multilingual Plane. Java 1.5 has started to use int to
overcome that issue actually. So unless there is a fundamental
limitation in AFP such that characters beyond the BMP will never be
usable, I think we want to stick to int.
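A short Java sketch of the point above, using U+1D11E MUSICAL SYMBOL G CLEF as an example of a character outside the BMP (the class name is illustrative). It shows that the character occupies two chars, and that narrowing its code point to char silently yields an unrelated value:

```java
// Illustrative only: contrasts the 16-bit char type with full code points.
class CharVsCodePoint {
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
    static final int G_CLEF = 0x1D11E;

    static String clefString() {
        // Character.toChars yields the UTF-16 form: here a surrogate pair.
        return new String(Character.toChars(G_CLEF));
    }

    public static void main(String[] args) {
        String s = clefString();
        System.out.println(s.length());                                 // 2 (UTF-16 units)
        System.out.println(s.codePointCount(0, s.length()));            // 1 (character)
        System.out.println(Character.isSupplementaryCodePoint(G_CLEF)); // true
        System.out.println(Character.MAX_VALUE + 1);                    // 65536 values fit in a char
        System.out.println(Character.MAX_CODE_POINT);                   // 1114111 (0x10FFFF)
        // Narrowing to char keeps only the low 16 bits: U+D11E, a Hangul syllable.
        System.out.println(Integer.toHexString((char) G_CLEF));         // d11e
    }
}
```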

<snip/>

Vincent



2010-05-21 Thread Peter B. West
Sorry, I wasn't paying enough attention.

Yes, when dealing with individual character interfaces, you need to provide 
codepoint as well as char. The relationship between codepoints and strings is 
not straightforward, however.
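One way to see why strings and code points do not map one-to-one: walking a String by code point has to advance the index by one or two chars at a time. A minimal sketch using only standard java.lang methods (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: splits a String into Unicode code points.
class CodePointWalk {
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);    // combines a surrogate pair if one starts at i
            result.add(cp);
            i += Character.charCount(cp); // advance 2 chars for a pair, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", U+1D11E (stored as a surrogate pair), "b"
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());                                // 4
        System.out.println(codePoints(s).size());                      // 3
        System.out.println(Integer.toHexString(codePoints(s).get(1))); // 1d11e
    }
}
```

Note that s.charAt(1) here is only the high surrogate, not a character in its own right.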

Peter West
Lord, to whom shall we go?




On 22/05/2010, at 12:14 AM, Glenn Adams wrote:

 it's a simple problem, which can be stated as follows:
   • the char data type in Java does not denote a character, rather, it 
 denotes a UTF-16 encoding element
   • some Unicode characters, i.e., those in the BMP, are represented by 
 one char element (char[1]), while other Unicode characters require two char 
 elements (char[2]);
   • in order to make use of non-BMP characters, of which there are now 
 many standardized instances, one must either pass a char array, e.g., 
 char[2], or, alternatively, pass an int, which is capable of representing all 
 Unicode code points in the range of 0 ... 0x10FFFF;
 at some point, FOP needs to support the effective use of characters outside 
 the BMP coding space, and, consequently, those FOP interfaces that use the 
 char type need to be upgraded to int;
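The two representations described above, char[2] versus a single int, convert into each other with methods that java.lang.Character has provided since Java 5. A small sketch (the class and method names are illustrative):

```java
// Illustrative conversions between the char[2] and int representations.
class SurrogatePairs {
    // UTF-16 encoding of one code point: char[1] in the BMP, char[2] otherwise.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Recombines a high/low surrogate pair into a single int code point.
    static int decode(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        System.out.println(encode('A').length);                            // 1
        char[] pair = encode(0x2F81A);
        System.out.println(pair.length);                                   // 2
        System.out.println(Integer.toHexString(decode(pair[0], pair[1]))); // 2f81a
    }
}
```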
 
 I am referring to FOP-defined interfaces, mind you, not Java-defined 
 interfaces; in general, the Java interfaces provide mechanisms to address 
 this problem; for instance, see the discussion in the preamble of the 
 definition of java.lang.Character, the pertinent point of which I repeat 
 below:
   • The methods that only accept a char value cannot support 
 supplementary characters. They treat char values from the surrogate ranges as 
 undefined characters. For example, Character.isLetter('\uD840') returns 
 false, even though this specific value if followed by any low-surrogate value 
 in a string would represent a letter.
   • The methods that accept an int value support all Unicode characters, 
 including supplementary characters. For example, Character.isLetter(0x2F81A) 
 returns true because the code point value represents a letter (a CJK 
 ideograph).
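The two Javadoc examples quoted above can be checked directly. This sketch also shows that testing one char of a surrogate pair gives the wrong answer, while decoding the pair to an int first gives the right one (the class name is illustrative):

```java
// Illustrative check of the char vs. int overloads of Character.isLetter.
class IsLetterDemo {
    // The char overload sees only one UTF-16 unit; surrogates count as undefined.
    static boolean loneHighSurrogateIsLetter() {
        return Character.isLetter((char) 0xD840);
    }

    // The int overload sees the whole code point U+2F81A (a CJK ideograph).
    static boolean codePointIsLetter() {
        return Character.isLetter(0x2F81A);
    }

    public static void main(String[] args) {
        System.out.println(loneHighSurrogateIsLetter()); // false
        System.out.println(codePointIsLetter());         // true

        char[] pair = Character.toChars(0x2F81A);
        // Checking the first unit of the pair on its own: false
        System.out.println(Character.isLetter(pair[0]));
        // Decoding the pair to a code point first: true
        System.out.println(Character.isLetter(Character.toCodePoint(pair[0], pair[1])));
    }
}
```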
 what I believe the original commenter is pointing out (and that I am agreeing 
 with) is that FOP needs to take care to not use the char type for interface 
 parameters that are intended to denote a Unicode character; or, if they do, 
 then an overloaded version of the same interface that uses the int type 
 should also be provided;
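A minimal sketch of the second option, an int-based method plus a char overload that delegates to it. WidthTable and its methods are hypothetical names for illustration, not FOP's Font API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; NOT an actual FOP interface.
class WidthTable {
    private final Map<Integer, Integer> widths = new HashMap<>();

    void setWidth(int codePoint, int width) {
        widths.put(codePoint, width);
    }

    // Primary method: takes a full Unicode code point.
    boolean hasChar(int codePoint) {
        return widths.containsKey(codePoint);
    }

    // Legacy overload kept for existing char-based callers. It just widens
    // the UTF-16 unit to int, so it can only ever match BMP characters.
    boolean hasChar(char c) {
        return hasChar((int) c);
    }

    public static void main(String[] args) {
        WidthTable t = new WidthTable();
        t.setWidth(0x1D11E, 500);
        t.setWidth('A', 667);
        System.out.println(t.hasChar(0x1D11E)); // true  (int overload)
        System.out.println(t.hasChar('A'));     // true  (char overload)
    }
}
```

Java overload resolution picks hasChar(char) for a char argument and hasChar(int) for an int argument, so existing call sites keep compiling. Passing one half of a surrogate pair to the char overload still cannot find a supplementary character; callers holding UTF-16 text need to decode it first.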
 
 for example, the following interfaces need to be upgraded to int or to have 
 an overloaded int variant:
 
 org.apache.fop.fonts.Font.getKernValue(char ch1, char ch2);
 org.apache.fop.fonts.Font.getWidth(char charnum);
 org.apache.fop.fonts.Font.mapChar(char c);
 org.apache.fop.fonts.Font.hasChar(char c);
 org.apache.fop.fo.CharIterator.replaceChar(char c);
 org.apache.fop.fo.flow.Character.getCharacter();
 org.apache.fop.util.CharUtilities.*;
 ...
 
 I have already upgraded all of the CharUtilities.* methods to use int instead 
 of char in my present work on complex script support, but there are a variety 
 of other internal interfaces as noted above that need to be upgraded as well. 
 if you like, I can fold this into my present work, or assign it a new bug 
 number (which may be the best for tracking purposes);
 
 regards,
 glenn
 
 
 On Fri, May 21, 2010 at 7:31 AM, Peter B. West li...@pbw.id.au wrote:
 I'm puzzled by this discussion. AFAIK, Java has rejected moving to 32 bits in 
 Java 5. Instead, they are supporting supplementary characters. There's a 
 discussion here: 
 http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
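The Java 5 approach described above (char stays 16 bits; supplementary characters are carried as surrogate pairs) is visible in the API: Character.SIZE is still 16, while int-based methods such as StringBuilder.appendCodePoint and String.offsetByCodePoints were added. A small illustrative sketch (the class name is an assumption):

```java
// Illustrative use of the int-based String/StringBuilder methods from Java 5.
class SupplementarySupport {
    // Builds a string from code points; appendCodePoint writes a pair if needed.
    static String fromCodePoints(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);                          // 16: char did not grow
        String s = fromCodePoints(0x41, 0x1D11E, 0x42);
        System.out.println(s.length());                              // 4
        // Maps "skip 2 code points from index 0" to a char index
        System.out.println(s.offsetByCodePoints(0, 2));              // 3
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```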
 
 
 
 
 
 On 21/05/2010, at 11:11 PM, Glenn Adams wrote:
 
  I concur with this change, and have already made some changes in this 
  direction in my work on adding complex script support.
 
  Please note that it is not quite so simple as merely changing from char to 
  int in some locations. It is also necessary to convert from UTF-16 to 
  UTF-32, i.e., to the full Unicode code point value, which can range from 
  0x00 through 0x10FFFF (see Unicode 5.2, Section 3.3, Item D9). It is 
  probably not a good idea to make this conversion too early, but rather, to 
  defer it until certain well defined interface points, which need to be 
  documented as taking the full Unicode code point, and not merely a UTF-16 
  code element.
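A sketch of the kind of conversion described above, done once at a documented interface boundary: UTF-16 text in, an int[] of full code points out. The method name is illustrative, and passing unpaired surrogates through unchanged is one possible policy (an assumption; a real interface might reject or replace them instead):

```java
import java.util.Arrays;

// Illustrative boundary conversion from UTF-16 text to UTF-32 code points.
class Utf16ToUtf32 {
    /**
     * Converts UTF-16 text to an array of Unicode code points.
     * Assumed policy: unpaired surrogates are passed through as their own values.
     */
    static int[] toCodePoints(CharSequence text) {
        int n = Character.codePointCount(text, 0, text.length());
        int[] result = new int[n];
        int i = 0;
        for (int k = 0; k < n; k++) {
            int cp = Character.codePointAt(text, i); // combines a pair if present
            result[k] = cp;
            i += Character.charCount(cp);            // 2 for a pair, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "x" + new String(Character.toChars(0x10FFFF));
        System.out.println(Arrays.toString(toCodePoints(s)));      // [120, 1114111]
        String broken = "y" + (char) 0xDC00;                       // lone low surrogate
        System.out.println(Arrays.toString(toCodePoints(broken))); // [121, 56320]
    }
}
```

On Java 8 and later, text.codePoints().toArray() produces the same result, including the pass-through of unpaired surrogates.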
 
  On Fri, May 21, 2010 at 3:46 AM, Vincent Hennebert vhenneb...@gmail.com 
  wrote:
  Hi,
 
   Author: jeremias
   Date: Thu May 20 09:52:27 2010
   New Revision: 946585
  
   URL: http://svn.apache.org/viewvc?rev=946585&view=rev
   Log:
   Changed many variables and parameters from int to char because AFP 
   font support mostly uses Unicode code points unlike Type 1 and TrueType 
   support which use internal character code points (the result of 
   Font.mapChar()). This should improve code readability.
 
  Not sure this is a desirable change. char can only address characters
  from the Basic Multilingual Plane. Java 1.5 has started to use int to
  overcome that issue actually. So