https://issues.apache.org/bugzilla/show_bug.cgi?id=32789

--- Comment #8 from Jonathan Levinson <[email protected]> 2010-02-08 06:58:15 UTC ---
Hi Vincent,

I will attach the .fo file I've been using for testing.  I will also attach the
generated PDF.  This is from an example our Dubai team gave me for my own
testing while I developed the code.

Our Dubai team has been testing with a wide variety of Arabic text, but they
use a report-creation tool that invokes fop.bat with XSL input, so the .fo file
isn't part of their output.

I could give them instructions for creating .fo files.

We have found in testing that what matters most is that the BIDI algorithm is
applied, so that text (including embedded numerals) is in the right order, and
that form shaping is correct.  You need to know the Arabic alphabet and its
rules to assess the output of testing.  We have a team that knows Arabic to do
our testing.  They "eyeball" the reports to make sure they are in proper Arabic,
with text and sub-text in the right order.  Embedded numerals read in a
different direction: left-to-right rather than right-to-left.  It isn't clear
to me how this process can be automated.
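The numeral behaviour described above can be checked mechanically with a BIDI implementation. The sketch below uses the JDK's java.text.Bidi rather than ICU4J's Bidi class, purely so it runs without extra jars; the class and method names are hypothetical. For an Arabic word followed by digits in a right-to-left paragraph, the digits land in their own even (left-to-right) embedding level, and reordering the runs for display puts them on the left.

```java
import java.text.Bidi;

public class BidiRunsDemo {
    // Embedding level of each run: odd levels are RTL, even levels are LTR.
    static int[] runLevels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_RIGHT_TO_LEFT);
        int[] levels = new int[bidi.getRunCount()];
        for (int i = 0; i < levels.length; i++) {
            levels[i] = bidi.getRunLevel(i);
        }
        return levels;
    }

    // Logical runs reordered into visual (left-to-right display) order.
    // Note: this only reorders whole runs; characters inside an RTL run
    // would still have to be drawn right-to-left by the renderer.
    static String[] visualRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_RIGHT_TO_LEFT);
        int n = bidi.getRunCount();
        byte[] levels = new byte[n];
        String[] runs = new String[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            runs[i] = text.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
        }
        Bidi.reorderVisually(levels, 0, runs, 0, n);
        return runs;
    }

    public static void main(String[] args) {
        // Arabic word (U+0633 U+0644 U+0627 U+0645), a space, then "123".
        String text = "\u0633\u0644\u0627\u0645 123";
        System.out.println(java.util.Arrays.toString(runLevels(text))); // [1, 2]
        System.out.println(visualRuns(text)[0]);                        // 123
    }
}
```

A test harness could compare run levels like these against expected values for known strings, which might cover part of what is currently checked by eye; shaping correctness would still need separate checks.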

You are right that widths change, and this could change line-breaking
decisions.  Do you know where in the FOP pipeline, before the rendering stage,
Arabic shaping could be done so that it is taken into account when widths are
computed?

I believe what ensures that the right glyphs are embedded in the PDF file is
the nature of the ICU4J algorithm, which transforms the Unicode representation
of the string.  Our Dubai team's output is PDFs with embedded fonts, and these
work, so ICU4J must solve the problem somehow; I believe it does so by
substituting different Unicode code points.
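The "different Unicode code points" are, as far as we understand it, the Arabic Presentation Forms blocks: one code point per contextual form (isolated, initial, medial, final) of a letter, which ICU4J's com.ibm.icu.text.ArabicShaping can emit. The sketch below does not call ICU4J; it hand-writes the shaped form of a two-letter string and uses only the JDK to show that those code points are distinct from the base letters and fold back to them under NFKC. Class and method names are hypothetical.

```java
import java.text.Normalizer;

public class PresentationFormsDemo {
    // True if the code point falls in the Arabic Presentation Forms-B block,
    // which holds contextual glyph variants of the base Arabic letters.
    static boolean isPresentationFormB(int cp) {
        return cp >= 0xFE70 && cp <= 0xFEFF;
    }

    // Maps presentation forms back to base letters via NFKC compatibility mapping.
    static String unshape(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String logical = "\u0645\u0645"; // two ARABIC LETTER MEEM (base letters)
        // Hand-written shaped form: MEEM INITIAL FORM + MEEM FINAL FORM,
        // i.e. what a shaper is expected to produce for two joined meems.
        String shaped = "\uFEE3\uFEE2";
        System.out.println(isPresentationFormB(shaped.codePointAt(0))); // true
        System.out.println(unshape(shaped).equals(logical));           // true
    }
}
```

If this is how ICU4J's output reaches the PDF, then a font must contain glyphs mapped to these presentation-form code points for the embedded subset to render correctly.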

I don't have performance numbers to give you yet.  If ICU4J's transform is
written sensibly, the performance impact should be small, since only text in
the Arabic Unicode range needs to be transformed, and testing whether text
falls in this range should be quick.
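A range check of the kind described above can be a single pass over the code points that stops at the first Arabic character, so non-Arabic text skips shaping and BIDI entirely. This is a minimal sketch; the helper names are hypothetical and not part of FOP or ICU4J, and it covers only the four main Arabic blocks (a production check would probably also include blocks such as Arabic Extended-A).

```java
public class ArabicRangeCheck {
    // True if the code point lies in one of the main Arabic blocks.
    static boolean isArabic(int cp) {
        return (cp >= 0x0600 && cp <= 0x06FF)   // Arabic
            || (cp >= 0x0750 && cp <= 0x077F)   // Arabic Supplement
            || (cp >= 0xFB50 && cp <= 0xFDFF)   // Arabic Presentation Forms-A
            || (cp >= 0xFE70 && cp <= 0xFEFF);  // Arabic Presentation Forms-B
    }

    // Hypothetical guard: only strings containing Arabic need shaping/BIDI.
    // anyMatch short-circuits on the first Arabic code point.
    static boolean needsArabicProcessing(CharSequence text) {
        return text.codePoints().anyMatch(ArabicRangeCheck::isArabic);
    }

    public static void main(String[] args) {
        System.out.println(needsArabicProcessing("Hello, world 123"));         // false
        System.out.println(needsArabicProcessing("\u0633\u0644\u0627\u0645")); // true
    }
}
```

Placed in front of the shaping call in each painter, a guard like this would also address the concern that shaping is applied to non-Arabic text.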

Thanks,
Jonathan

(In reply to comment #7)
> Hi,
> Thanks for your patch. Do you have an example FO file that could be used for
> testing purposes (even better, with an English translation)?
> IIUC, Arabic shaping is about replacing glyphs for standalone letters with
> suitable ligature glyphs for building words. Surely that affects character
> widths, and therefore line-breaking decisions? In the patch, shaping is
> performed at the rendering stage, so isn't there a danger of getting
> inconsistent results?
> Also, IIUC Arabic shaping affects glyph selection. How do you make sure that
> the right glyphs are being embedded in the PDF file?
> The same piece of code is duplicated in the PCL and PDF painters. The same
> would probably also need to be done for other painters. This is not desirable.
> Finally, what is the impact on performance? It looks like shaping will be
> applied to any text, even non-Arabic text.
> Thanks,
> Vincent
> (In reply to comment #3)
> > Created an attachment (id=24934)
> > --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24934)
> > Support for Arabic PDF rendering using ICU4J
> > 
> > This patch uses ICU4J to do form-shaping and BIDI transformation of rendered
> > text.  It is a patch for the FOP trunk.  It does not change the layout
> > manager or the area tree handler, or allow a writing-mode other than “lr-tb”.
> > For this patch to be integrated with FOP, FOP would need to distribute the
> > ICU4J library (icu4j-4_2_1.jar).  It affects both PDF and PCL rendering but
> > has only been tested with PDF rendering.  So far, results of testing with
> > PDF rendering have been positive.  The PCL aspect of the patch looks
> > correct, given that the PDF aspect works.
