DO NOT REPLY [Bug 32789] [PATCH] Arabic Shaping not Supported by FOP

bugzilla Fri, 12 Feb 2010 03:15:39 -0800

https://issues.apache.org/bugzilla/show_bug.cgi?id=32789


--- Comment #13 from Vincent Hennebert <vhenneb...@gmail.com> 2010-02-12 11:15:07 UTC ---
Hi Jonathan,

I lack the knowledge to answer all of your questions properly, but I'll
try anyway.

(In reply to comment #12)
> Hi Vincent,
> 
> Before committing the work I did on Arabic to the trunk, the Apache FOP
> organization seems to want five things:
> 
> 1)    Modify the ICU4J change to check whether the classes are available and,
> if not, skip calling them

I would leave that aside for now. This can be done in the last refinements,
once everything else is in place.
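For item (1), one way to treat ICU4J as an optional dependency is to look its
classes up reflectively, so that FOP still runs without the ICU4J jar and simply
leaves the text unshaped. The following is a minimal sketch, assuming ICU4J's
com.ibm.icu.text.ArabicShaping class with its LETTERS_SHAPE option constant,
its ArabicShaping(int) constructor and its shape(String) method. The helper class
OptionalIcuShaping is hypothetical and not part of FOP.

```java
import java.lang.reflect.Method;

// Hypothetical helper: calls ICU4J's ArabicShaping only if ICU4J is on the classpath.
final class OptionalIcuShaping {
    private static final String SHAPER_CLASS = "com.ibm.icu.text.ArabicShaping";

    private OptionalIcuShaping() {
    }

    /** True if the ICU4J shaping class can be loaded. */
    static boolean icuAvailable() {
        try {
            Class.forName(SHAPER_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Shapes Arabic letters with ICU4J if present; otherwise returns the text unchanged. */
    static String shapeLetters(String text) {
        try {
            Class<?> cls = Class.forName(SHAPER_CLASS);
            // Reflective equivalent of: new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(text)
            int options = cls.getField("LETTERS_SHAPE").getInt(null);
            Object shaper = cls.getConstructor(int.class).newInstance(options);
            Method shape = cls.getMethod("shape", String.class);
            return (String) shape.invoke(shaper, text);
        } catch (ClassNotFoundException e) {
            return text; // ICU4J absent: fall back to unshaped text
        } catch (ReflectiveOperationException e) {
            // ICU4J present but the call failed (for example an ArabicShapingException
            // wrapped in an InvocationTargetException).
            throw new IllegalStateException("ICU4J Arabic shaping failed", e);
        }
    }
}
```

In real code the class lookup would be done once and cached rather than being
repeated for every string.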


> 2)    Provide the Apache organization with performance data to assess the
> performance cost of the Arabic Shaping classes
> 
> 3)    Provide the Apache organization with better examples of Arabic usage
> 
> 4)    Move Arabic form shaping and the BIDI algorithm to the layout manager
> 
> 5)    Do not use ICU4J to do UNICODE transformation, but use the standard
> ligature mechanism to provide contextual glyphs.  This is a request for a
> complete rewrite of the patch using a mechanism that I don't currently know,
> but that I might come to know if I had the right pointers.
> 
> (4) is highly non-trivial.  I haven’t a clue as to how to do (5).

I didn't say it was trivial :-)


> For (4) could you point me at the source code files in the layout manager that
> would have to be changed?  Can you give me some pointers as to where this sort
> of information is processed by the layout manager?  I've read the layout
> manager code and tried to locate where it processes the width of characters
> and what would have to change to support right-to-left printing, but I can't
> see the forest for the trees.  I have read Knuth's line-breaking algorithm
> and I think I have a good understanding of what a KnuthElement is (glue,
> penalties and the basics of Knuth's algorithm), but I'm having trouble
> turning this theoretical understanding into a practical understanding of
> what has to change in the code to print from right to left.

I don't really know myself where to look either. Setting the FOP code aside for
now, we first need to work out how character re-ordering, line breaking and
glyph shaping should be done. The three processes probably affect each other.
Does a glyph change depending on whether it is at the end of a line? How does
hyphenation work (apparently it only applies to the Uighur script)? Also,
unlike in Western scripts, I think justification is done not by increasing
inter-word spaces but by using wider alternative glyphs.
Obviously, the relevant sections of the XSL-FO Recommendation need to be
studied, as well as the Unicode Standard (in particular, UAX #9 about the
Bidirectional Algorithm), along with other resources on the web.
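The re-ordering part at least has a well-defined boundary with line breaking.
UAX #9 resolves embedding levels over the whole paragraph, but it applies the
visual reordering (rules L1 to L4) line by line, after the lines have been
broken in logical order. So the line breaker could keep working on logical text,
with reordering applied to each finished line. The following is a minimal sketch
using the JDK's java.text.Bidi class rather than ICU4J (ICU4J's
com.ibm.icu.text.Bidi offers similar functionality). BidiLines is a hypothetical
helper, not FOP code, and it works on plain characters rather than on FOP areas.

```java
import java.text.Bidi;

// Hypothetical helper built on the JDK's java.text.Bidi (a UAX #9 implementation).
final class BidiLines {
    private BidiLines() {
    }

    /**
     * Returns the characters of text[lineStart, lineLimit) in visual order.
     * The paragraph Bidi is computed once on the whole logical paragraph;
     * line breaking has already chosen lineStart/lineLimit in logical order.
     */
    static String visualLine(Bidi paragraph, String text, int lineStart, int lineLimit) {
        Bidi line = paragraph.createLineBidi(lineStart, lineLimit);
        int runCount = line.getRunCount();
        byte[] levels = new byte[runCount];
        Object[] runs = new Object[runCount];
        for (int r = 0; r < runCount; r++) {
            // Run offsets of a line Bidi are relative to the start of the line.
            String run = text.substring(lineStart + line.getRunStart(r),
                                        lineStart + line.getRunLimit(r));
            levels[r] = (byte) line.getRunLevel(r);
            // Odd levels are right-to-left: reverse the run's characters.
            // (Simplification: ignores mirrored characters and combining marks.)
            runs[r] = (levels[r] & 1) == 1 ? new StringBuilder(run).reverse().toString() : run;
        }
        // Reorder whole runs according to their embedding levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        StringBuilder out = new StringBuilder();
        for (Object run : runs) {
            out.append(run);
        }
        return out.toString();
    }
}
```

Because the levels come from the paragraph-level analysis, a right-to-left run
that is split across two lines keeps its direction on both lines; only the
order within each line changes.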

You can use the Wiki to gather your thoughts:
http://wiki.apache.org/xmlgraphics-fop/DeveloperPages


> I’m not sure what to do about (5).  Do you have any references?  Is there some
> pointer to an algorithm that would go beyond UNICODE transformation and produce
> contextual glyphs based on the glyphs in a font?  How do I tell the
> characteristics of an Arabic character in a font, whether it is in initial,
> intermediate or final position?  I suppose this information would vary from
> font to font.  Where in FOP is font information like this processed, and how do
> I “tell” a font that I want the Arabic character at UNICODE position X, but in
> its final-position form?  Does the layout manager actually process the font
> information about a character?  I suppose it must, to know character widths,
> which are necessary for Knuth's algorithm, but please forgive me, I don't see
> where this code lives.  FOP has over 11,000 files!

I'm almost sure that the OpenType font format provides the necessary mechanisms
to do contextual glyph shaping. But I've never really looked into it. I guess
one mechanism or the other will have to be selected depending on which one the
font supports.
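As for telling whether a letter is in initial, medial or final position: that
part does not depend on the font. It follows from the joining type of each
character and of its neighbours in logical order, which Unicode publishes in
ArabicShaping.txt. Only the glyph used for that position comes from the font,
either through the OpenType GSUB features isol, fina, init and medi, or, in the
simpler table-lookup approach, through the Arabic Presentation Forms-B code
points. The following is a minimal sketch showing both, assuming a tiny
hand-picked subset of the joining table and ignoring transparent marks
(harakat), ZWJ/ZWNJ and the mandatory lam-alef ligature. ArabicForms is a
hypothetical name.

```java
import java.util.Map;

// Hypothetical helper; the tables below are a tiny illustrative subset.
final class ArabicForms {
    /** Simplified joining types, after the categories in Unicode's ArabicShaping.txt. */
    enum JoiningType { DUAL, RIGHT, NON_JOINING }

    /** Positional forms, each with the OpenType feature tag that selects it. */
    enum Form {
        ISOLATED("isol"), FINAL("fina"), INITIAL("init"), MEDIAL("medi");

        final String openTypeTag;

        Form(String tag) {
            this.openTypeTag = tag;
        }
    }

    private static final Map<Character, JoiningType> TYPES = Map.of(
        '\u0627', JoiningType.RIGHT,  // ALEF
        '\u062F', JoiningType.RIGHT,  // DAL
        '\u0631', JoiningType.RIGHT,  // REH
        '\u0648', JoiningType.RIGHT,  // WAW
        '\u0628', JoiningType.DUAL,   // BEH
        '\u0633', JoiningType.DUAL,   // SEEN
        '\u0644', JoiningType.DUAL,   // LAM
        '\u0645', JoiningType.DUAL);  // MEEM

    // Presentation Forms-B code points indexed by Form.ordinal(): isolated, final, initial, medial.
    private static final Map<Character, char[]> PRESENTATION = Map.of(
        '\u0627', new char[] {'\uFE8D', '\uFE8E'},                      // ALEF (right-joining)
        '\u0628', new char[] {'\uFE8F', '\uFE90', '\uFE91', '\uFE92'},  // BEH
        '\u0633', new char[] {'\uFEB1', '\uFEB2', '\uFEB3', '\uFEB4'},  // SEEN
        '\u0645', new char[] {'\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4'}); // MEEM

    private ArabicForms() {
    }

    static JoiningType typeOf(char c) {
        return TYPES.getOrDefault(c, JoiningType.NON_JOINING);
    }

    /** Computes each character's positional form from its logical-order neighbours. */
    static Form[] forms(String logical) {
        int n = logical.length();
        Form[] result = new Form[n];
        for (int i = 0; i < n; i++) {
            JoiningType cur = typeOf(logical.charAt(i));
            // A joining letter connects to the preceding letter only if that one is dual-joining.
            boolean joinsPrev = cur != JoiningType.NON_JOINING && i > 0
                    && typeOf(logical.charAt(i - 1)) == JoiningType.DUAL;
            // Only a dual-joining letter connects to the following (joining) letter.
            boolean joinsNext = cur == JoiningType.DUAL && i + 1 < n
                    && typeOf(logical.charAt(i + 1)) != JoiningType.NON_JOINING;
            if (joinsPrev && joinsNext) {
                result[i] = Form.MEDIAL;
            } else if (joinsPrev) {
                result[i] = Form.FINAL;
            } else if (joinsNext) {
                result[i] = Form.INITIAL;
            } else {
                result[i] = Form.ISOLATED;
            }
        }
        return result;
    }

    /** Code-point mechanism: replaces each known letter by its presentation form. */
    static String toPresentationForms(String logical) {
        Form[] f = forms(logical);
        StringBuilder sb = new StringBuilder(logical.length());
        for (int i = 0; i < logical.length(); i++) {
            char c = logical.charAt(i);
            char[] variants = PRESENTATION.get(c);
            int k = f[i].ordinal();
            sb.append(variants != null && k < variants.length ? variants[k] : c);
        }
        return sb.toString();
    }
}
```

For example, in بسم (BEH, SEEN, MEEM) the three letters come out initial,
medial and final, while in دار (DAL, ALEF, REH) every letter is isolated,
because none of them joins to the letter that follows it.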


> I used ICU4J to avoid having to write a ton of code.  That is why my patch is
> so small.

Which is a good idea. We will probably need ICU4J anyway. But there /is/ going
to be a lot of code to write, simply because the whole issue is all but
trivial. Some heavy refactoring of the layout code will probably be needed,
too.


> I'm not complaining.  I'm hoping I can get some more pointers to what changes
> need to be made to support Arabic and where the changes have to go.  Even if
> I'm not the one who eventually does the work, whoever eventually implements
> right-to-left printing and Arabic support will certainly find our discussion
> valuable.  I'm sure you'll agree that FOP needs to become truly international
> at some point. That would really open a new community of users to the benefits
> of FOP, which are considerable.
> 
> In fact, I agree that it is hard to see how there can be a robust solution to
> the problem of printing Arabic text that simply involves the PDF renderer;
> theoretically and probably practically the layout manager has to be involved.
> 
> I’ve looked at the FOP SVG rendering code which tries to do Arabic form shaping
> and it seems to be just doing UNICODE transformations.  It doesn’t seem to make
> use of the font capability you are describing, namely displaying a single
> UNICODE code point in many different forms.  It seems to be just doing a simple
> table lookup that transforms one UNICODE code into another.  So you already
> have code in FOP, in your SVG renderer, that seems to do the same thing I tried
> to do using ICU4J.  This doesn't mean the code I wrote using ICU4J is doing the
> right thing, but it does mean that simply transforming one UNICODE code into
> another is the simplest first step in solving this difficult problem.
> 
> Could we agree that we could live with an ICU4J approach if (1), (2), (3) and
> (4) were met as conditions, and that (5), a complete rewrite using modern font
> techniques, could be deferred?  Of course, I'm interested in learning how I
> could achieve (5); I'm not dismissing (5), I'm just looking for a bottom line
> that would allow FOP to practically meet the needs of rendering Arabic text,
> even if the result isn't perfect yet.

(5) can surely be left aside for now, as long as the code structure allows it
to be implemented and plugged in later on, with a transparent switch from one
mechanism to the other depending on the font used.
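One way to get that transparent switch is to hide each mechanism behind a
common interface and pick the implementation per font, so the current
ICU4J-based code and a later OpenType-based shaper become interchangeable.
The following is a minimal sketch of that structure only. ArabicShaper,
FontShapingCapabilities and ShaperSelector are hypothetical names, not existing
FOP API, and how a font would report its OpenType Arabic support is left open.

```java
import java.util.Objects;

// Hypothetical names throughout; none of this is existing FOP API.

/** A shaping mechanism: turns logical Arabic text into shaped output. */
interface ArabicShaper {
    String shape(String logical);
}

/** What the layout code would need to know about a font (assumed hook). */
interface FontShapingCapabilities {
    /** True if the font has OpenType GSUB lookups for Arabic positional forms. */
    boolean hasOpenTypeArabicFeatures();
}

/** Picks a shaper per font so callers never depend on the mechanism in use. */
final class ShaperSelector {
    private final ArabicShaper codePointShaper; // e.g. the current ICU4J-based approach
    private final ArabicShaper glyphShaper;     // later GSUB-based shaper; may be null for now

    ShaperSelector(ArabicShaper codePointShaper, ArabicShaper glyphShaper) {
        this.codePointShaper = Objects.requireNonNull(codePointShaper);
        this.glyphShaper = glyphShaper;
    }

    ArabicShaper select(FontShapingCapabilities font) {
        if (glyphShaper != null && font.hasOpenTypeArabicFeatures()) {
            return glyphShaper;
        }
        return codePointShaper;
    }
}
```

Callers ask the selector for a shaper once per font and never test the
mechanism themselves, so adding the OpenType path later means supplying a
second implementation rather than changing the calling code.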


> Best Regards,
> Jonathan

HTH,
Vincent

