Hello,
While creating this patch I have already solved the first problem, joining the Arabic characters.
I have some questions regarding this matter:

1. ---
I need to send the text char array to the bidirectional algorithm so that the characters are reordered visually.
In the class org.apache.fop.layoutmgr.inline.TextLayoutManager I check the text array, and if it contains any Arabic characters between U+0600 and U+06FF, it is sent to the bidirectional algorithm and afterwards to the character joining algorithm.

My question is: can I use the java.text.Bidi class and state that Arabic Bidi is supported only with JDK 1.4 and later? Or should FOP have its own Bidi class that is compatible with JDK 1.3?
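
To make the question concrete, here is a minimal sketch of the check and the reordering described above, assuming java.text.Bidi is acceptable. The class BidiSketch and its method names are hypothetical and not part of FOP. java.text.Bidi exists only since JDK 1.4 and Character.valueOf only since Java 5, so this variant would not meet a JDK 1.3 requirement.

```java
import java.text.Bidi;

/** Hypothetical helper for illustration; not an existing FOP class. */
final class BidiSketch {
    static final char ARABIC_START = '\u0600';
    static final char ARABIC_END = '\u06FF';

    /** True if any char lies in the Arabic block U+0600 to U+06FF. */
    static boolean containsArabic(char[] text) {
        for (int i = 0; i < text.length; i++) {
            if (text[i] >= ARABIC_START && text[i] <= ARABIC_END) {
                return true;
            }
        }
        return false;
    }

    /** Returns the text in visual order; text without Arabic is returned unchanged. */
    static String toVisualOrder(char[] text) {
        if (!containsArabic(text)) {
            return new String(text);
        }
        // Paragraph direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, 0, null, 0, text.length,
                Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        byte[] levels = new byte[text.length];
        Character[] chars = new Character[text.length];
        for (int i = 0; i < text.length; i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            chars[i] = Character.valueOf(text[i]);
        }
        // Reorders the objects in place according to their resolved levels.
        Bidi.reorderVisually(levels, 0, chars, 0, text.length);
        StringBuffer sb = new StringBuffer(text.length);
        for (int i = 0; i < chars.length; i++) {
            sb.append(chars[i].charValue());
        }
        return sb.toString();
    }
}
```

This reorders character by character and does no mirroring or shaping; it is only meant to show which parts of the JDK class would be involved.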

2.---
I need to change the page numbering format to Arabic-Indic numbering, but FOP does not support it.
U+0660    ARABIC-INDIC DIGIT ZERO
U+0661    ARABIC-INDIC DIGIT ONE
U+0662    ARABIC-INDIC DIGIT TWO
U+0663    ARABIC-INDIC DIGIT THREE
...
U+0669    ARABIC-INDIC DIGIT NINE

Where is the page numbering calculated in FOP? I need to support the following
<fo:page-sequence initial-page-number="&#x661;">
for PDF output.
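
The digit mapping itself is simple; a minimal sketch follows. The class ArabicIndicDigits and its methods are hypothetical names for illustration, and the sketch says nothing about where FOP formats page numbers.

```java
/** Hypothetical helper for illustration; not an existing FOP class. */
final class ArabicIndicDigits {
    /** U+0660 ARABIC-INDIC DIGIT ZERO; the digits one to nine follow it contiguously. */
    static final char ARABIC_INDIC_ZERO = '\u0660';

    /** Formats a number with Arabic-Indic digits instead of ASCII digits. */
    static String format(int pageNumber) {
        String ascii = Integer.toString(pageNumber);
        StringBuffer sb = new StringBuffer(ascii.length());
        for (int i = 0; i < ascii.length(); i++) {
            char c = ascii.charAt(i);
            if (c >= '0' && c <= '9') {
                sb.append((char) (ARABIC_INDIC_ZERO + (c - '0')));
            } else {
                sb.append(c); // keeps a leading minus sign as-is
            }
        }
        return sb.toString();
    }

    /** Parses a string of decimal digits from any Unicode script, e.g. an initial-page-number value. */
    static int parse(String digits) {
        if (digits.length() == 0) {
            throw new NumberFormatException("empty string");
        }
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            // Character.digit returns -1 for characters that are not decimal digits.
            int d = Character.digit(digits.charAt(i), 10);
            if (d < 0) {
                throw new NumberFormatException(digits);
            }
            value = value * 10 + d;
        }
        return value;
    }
}
```

With this, the value "&#x661;" from the example would parse to 1, and page 12 would be formatted as U+0661 U+0662.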

Thanks,
Kia Teymourian


Kia Teymourian wrote:
Hi all,

thanks for your answers!

Jeremias Maerki wrote:
On 08.10.2006 23:01:13 Kia Teymourian wrote:
Hi all,

I am working on a patch for Arabic/Persian text decoration and an implementation of the Bidi algorithm.

I have some questions about it and am writing to ask for your assistance.
Could you please answer my questions?

1. Can I use Java 1.5, or should I preferably use JDK 1.4 or JDK 1.3?

Until further notice FOP must compile with JDK 1.3.
See: http://xmlgraphics.apache.org/fop/trunk/compiling.html

2. I have a long list of constants, the Unicode Arabic presentation forms. This list is used when the algorithm searches for the correct glyph forms. I think that for the best performance I should define some static final arrays and write them directly into the class source code. They are defined Unicode constants, which I take from the Arabic Unicode definition. It would also be possible to put them in external text files and read them from there. Which approach would be best?

Whatever is fastest. I think both approaches are fine. Have you looked
at ICU4J? Maybe it already provides the functionality you need. I
haven't checked. At any rate, we've considered the use of ICU4J before.

3. I need to add one or two new classes, something like ArabicTextHandler.java. Can I add them to the package org.apache.fop.layoutmgr? Which is the right package?

Probably org.apache.fop.layoutmgr.inline.

4. Can I use some free TTF fonts in my test cases? They are licensed under the GPL. I am going to use Persian TTF fonts from http://www.farsiweb.ir/wiki/Persian_fonts
Are there any license problems?

Yes, the GPL is off-limits for software distributed by the ASF. :-(

See: http://www.apache.org/legal/3party.html

5. The writing-mode that is set on a block or block-container has to be known by the TextLayoutManager, but it is not delivered correctly. TextLayoutManager.addAreas(...) is called from BlockContainerLayoutManager.getNextKnuthElements(...) and BlockManager.getNextKnuthElements(...). Why? Could you please send me your comments on the implemented writing-mode?

The writing mode is communicated to child layout managers through the
LayoutContext. See for example:
BlockContainerLayoutManager.createLayoutContext().

Would you mind showing us a short description of your approach to
implementing this? We've had people before implementing in this
direction but it was often a hack and did not fit in the whole picture.
I just want to spare you a late disappointment. Thanks. Some
documentation will be extremely important for us, since the current
team knows just about nothing about Arabic text. For example, we will
not be able to determine if some output is correct or not.

Jeremias Maerki


My first post to the mailing list was in May 2006:
http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04325.html
Maybe you mean this post.
I have found several possible implementations, and I am now looking for the best one.

The first and most important step is a solution for finding the correct Arabic presentation forms, or glyphs. As you know, when we write an Arabic word, the characters are connected together and take a different presentation form (glyph) depending on their position in the word, or an isolated form when they stand alone.

We can have 4 different presentation forms for a single Arabic character:
Isolated, when the character stands alone
Initial, when the character is at the start position of the character array of the Arabic word
End, when the character is at the end position of the character array of the Arabic word
Middle, when the character is in the middle of the character array.

Arabic Unicode characters lie between arabicStart = 0x0600 and arabicEnd = 0x06FF,
and we send a word to the algorithm if it contains an Arabic character.

For instance, we should find the correct presentation forms for the following word.

<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt" space-after.optimum="12pt" writing-mode="rl-tb" >
   &#x62a;&#x6cc;&#x645;&#x648;&#x631;&#x6cc;&#x627;&#x646;
</fo:block>
This is the form in which the characters are saved in the FO file.
We should find the correct presentation forms and change this word to

<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt" space-after.optimum="12pt" writing-mode="rl-tb" > &#xfe97;&#xfbff;&#xfee4;&#xfeee;&#xfead;&#xfbfe;&#xfe8e;&#xfee5; </fo:block>

This means finding the correct presentation forms and then reversing the character array.
You can see the difference and test it with a font that contains glyphs for the whole Unicode range of Arabic Presentation Forms A and B.
You can easily find the characters at these links:
http://www.alanwood.net/unicode/arabic.html
http://www.alanwood.net/unicode/arabic_presentation_forms_b.html
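
The selection of presentation forms described above can be sketched as follows, using a static final table. This is only an illustration under simplifying assumptions: the table covers just the seven letters of the example word, and ligatures (such as lam-alef) and diacritics are ignored. The class name ArabicShaperSketch and its methods are hypothetical.

```java
/** Hypothetical helper for illustration; not an existing FOP class. */
final class ArabicShaperSketch {
    // Columns: base, isolated, final, initial, medial.
    // 0 means the form does not exist (the letter joins only to the preceding letter).
    private static final char[][] FORMS = {
        {0x0627, 0xFE8D, 0xFE8E, 0, 0},           // ALEF
        {0x062A, 0xFE95, 0xFE96, 0xFE97, 0xFE98}, // TEH
        {0x0631, 0xFEAD, 0xFEAE, 0, 0},           // REH
        {0x0645, 0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}, // MEEM
        {0x0646, 0xFEE5, 0xFEE6, 0xFEE7, 0xFEE8}, // NOON
        {0x0648, 0xFEED, 0xFEEE, 0, 0},           // WAW
        {0x06CC, 0xFBFC, 0xFBFD, 0xFBFE, 0xFBFF}, // FARSI YEH
    };

    private static char[] lookup(char c) {
        for (int i = 0; i < FORMS.length; i++) {
            if (FORMS[i][0] == c) {
                return FORMS[i];
            }
        }
        return null; // not in the table: treated as non-joining
    }

    /** Replaces each known letter by its isolated, final, initial or medial form (logical order). */
    static char[] shape(char[] word) {
        char[] out = new char[word.length];
        for (int i = 0; i < word.length; i++) {
            char[] cur = lookup(word[i]);
            if (cur == null) {
                out[i] = word[i];
                continue;
            }
            char[] prev = i > 0 ? lookup(word[i - 1]) : null;
            char[] next = i + 1 < word.length ? lookup(word[i + 1]) : null;
            // The previous letter connects forward only if it has an initial form.
            boolean joinedToPrev = prev != null && prev[3] != 0;
            // This letter connects forward only if it has an initial form itself.
            boolean joinedToNext = next != null && cur[3] != 0;
            if (joinedToPrev && joinedToNext) {
                out[i] = cur[4];
            } else if (joinedToPrev) {
                out[i] = cur[2];
            } else if (joinedToNext) {
                out[i] = cur[3];
            } else {
                out[i] = cur[1];
            }
        }
        return out;
    }

    /** Reverses the shaped array for left-to-right glyph placement. */
    static char[] reverse(char[] text) {
        char[] out = new char[text.length];
        for (int i = 0; i < text.length; i++) {
            out[i] = text[text.length - 1 - i];
        }
        return out;
    }
}
```

Applied to the example word, shape(...) yields the sequence shown in the second fo:block above; reverse(...) is the separate second step.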

I have the implementation for this step, but I am looking for the best performance and for compatibility with Java 1.3.

The second step is to implement the Bidi algorithm and the writing-modes rl-tb and lr-tb,
and bidi-override.

Sebastian Weber has already made some changes for writing-mode="rl-tb":

http://www.anneundsebp.de/fop/fop.html

Please check points 4 and 5 about the TextLayoutManager and getWMctm().

As you can see, we should reverse the character array and place the Arabic words from right to left, but
the ARABIC-INDIC DIGITs should be written from the left side.
This means we should not reverse these character ranges:
0660-0669  ARABIC-INDIC DIGITs
06F0-06F9  EXTENDED ARABIC-INDIC DIGITs
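
A minimal sketch of this rule: reverse the whole array, then restore the original order inside each run of digits. It treats U+0660 to U+0669 and U+06F0 to U+06F9 as digits and ignores separators inside numbers, which a full Bidi implementation would also have to handle. The class name DigitAwareReverse is hypothetical.

```java
/** Hypothetical helper for illustration; not an existing FOP class. */
final class DigitAwareReverse {
    /** True for ARABIC-INDIC and EXTENDED ARABIC-INDIC digits. */
    static boolean isArabicDigit(char c) {
        return (c >= '\u0660' && c <= '\u0669') || (c >= '\u06F0' && c <= '\u06F9');
    }

    /** Reverses a[from..to) in place. */
    private static void reverseRange(char[] a, int from, int to) {
        for (int i = from, j = to - 1; i < j; i++, j--) {
            char t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    /** Reverses the text but keeps every run of digits in its original left-to-right order. */
    static char[] reverseKeepingDigitRuns(char[] text) {
        char[] out = (char[]) text.clone();
        reverseRange(out, 0, out.length);
        int i = 0;
        while (i < out.length) {
            if (!isArabicDigit(out[i])) {
                i++;
                continue;
            }
            int start = i;
            while (i < out.length && isArabicDigit(out[i])) {
                i++;
            }
            // The run was flipped by the full reversal; flip it back.
            reverseRange(out, start, i);
        }
        return out;
    }
}
```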

and then comes the correct implementation of:
http://www.w3.org/TR/2001/REC-xsl-20011015/slice5.html#section-N6720-Unicode-BIDI-Processing

I am working on this patch and will send you my questions.
I will also send you some documentation about Arabic text and the bidirectional algorithm,
or write it up in the Wiki.

regards
Kia Teymourian



