Re: PATCH for Arabic/Persian Text, and Bidi-override!

Kia Teymourian Mon, 16 Oct 2006 09:09:52 -0700

Hi all,

Thanks for your answers!

Jeremias Maerki wrote:

On 08.10.2006 23:01:13 Kia Teymourian wrote:

Hi all,


I am working on a patch for Arabic/Persian text decoration (glyph shaping) and an
implementation of the Bidi algorithm.

I have some questions about it and am writing to ask for your assistance.
Could you please answer them?

1. Can I use Java 1.5, or should I preferably use JDK 1.4 or 1.3?


Until further notice FOP must compile with JDK 1.3.
See: http://xmlgraphics.apache.org/fop/trunk/compiling.html

2. I have a long list of constants: the Unicode Arabic presentation forms.
This list will be used when the algorithm searches for the correct glyph
forms. I think that for the best performance I should define some static
final arrays and write them directly into the class source code. They are
Unicode constants taken from the Arabic Unicode definition.
It is also possible to put them in external text files and read them from
there. Which approach would be best?


Whatever is fastest. I think both approaches are fine. Have you looked
at ICU4J? Maybe it already provides the functionality you need. I
haven't checked. At any rate, we've considered the use of ICU4J before.
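To make the "static final arrays" option concrete, here is a minimal sketch, assuming a hypothetical class ArabicFormsTable (not part of FOP). The forms live in a char table compiled into the class and are indexed by (base - 0x0600), so a lookup is one array access with no file I/O at runtime. Only a few letters are filled in, and it sticks to JDK 1.3 language features. (ICU4J's com.ibm.icu.text.ArabicShaping covers the same ground, if that dependency were acceptable.)

```java
/**
 * Hypothetical sketch (not an FOP class): Arabic presentation forms stored
 * in a static final table compiled into the class. Only a few letters are
 * listed; a real table would cover the whole Arabic block.
 */
final class ArabicFormsTable {

    // Column indices into each row of FORMS.
    static final int ISOLATED = 1;
    static final int FINAL = 2;
    static final int INITIAL = 3;
    static final int MEDIAL = 4;

    // Row: base letter, isolated, final, initial, medial.
    // 0 means the form does not exist (right-joining letters have no
    // initial or medial form).
    private static final char[][] FORMS = {
        { '\u0627', '\uFE8D', '\uFE8E', 0, 0 },               // ALEF
        { '\u062A', '\uFE95', '\uFE96', '\uFE97', '\uFE98' }, // TEH
        { '\u0631', '\uFEAD', '\uFEAE', 0, 0 },               // REH
        { '\u0645', '\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4' }, // MEEM
        { '\u0646', '\uFEE5', '\uFEE6', '\uFEE7', '\uFEE8' }, // NOON
        { '\u0648', '\uFEED', '\uFEEE', 0, 0 },               // WAW
        { '\u06CC', '\uFBFC', '\uFBFD', '\uFBFE', '\uFBFF' }  // FARSI YEH
    };

    // Direct index over the Arabic block 0x0600..0x06FF: O(1) lookups.
    private static final char[][] INDEX = new char[0x100][];

    static {
        for (int i = 0; i < FORMS.length; i++) {
            INDEX[FORMS[i][0] - 0x0600] = FORMS[i];
        }
    }

    /**
     * Returns the requested presentation form of base, or base unchanged
     * if the letter is not in the table or has no such form.
     */
    static char lookup(char base, int form) {
        if (base < 0x0600 || base > 0x06FF) {
            return base;
        }
        char[] row = INDEX[base - 0x0600];
        if (row == null || row[form] == 0) {
            return base;
        }
        return row[form];
    }
}
```

Reading the same data from a text file would only add start-up cost; once loaded, both approaches perform the same lookup.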

3. I should add one or two new classes, something like
ArabicTextHandler.java.
Can I add them to the package org.apache.fop.layoutmgr? What is the
right package?


Probably org.apache.fop.layoutmgr.inline.

4. Can I use some free TTF fonts in my test cases? They are licensed
under the GPL.
I am going to use Persian TTF fonts from
http://www.farsiweb.ir/wiki/Persian_fonts
Are there any license problems?


Yes, the GPL is off-limits for software distributed by the ASF. :-(

See: http://www.apache.org/legal/3party.html

5. The writing-mode specified on a block or block-container has to be
known by the TextLayoutManager, but it is not delivered correctly.
TextLayoutManager.addAreas(...) is called from
BlockContainerLayoutManager.getNextKnuthElements(...) and
BlockManager.getNextKnuthElements(...).
Why? Could you please give me your comments on the current writing-mode
implementation?


The writing mode is communicated to child layout managers through the
LayoutContext. See for example:
BlockContainerLayoutManager.createLayoutContext().

Would you mind showing us a short description of your approach to
implementing this? We've had people before implementing in this
direction but it was often a hack and did not fit in the whole picture.
I just want to spare you a late disappointment. Thanks. Some
documentation will be extremely important for us, since the current
team knows just about nothing about Arabic text. For example, we will
not be able to determine if some output is correct or not.

Jeremias Maerki

My first post to the mailing list was in May 2006,
http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04325.html
maybe you mean this post.
I found several possible implementations, and now I am looking for the best one.

The first and most important step is to find the correct Arabic presentation forms (glyphs).
As you know, when we write an Arabic word, the Arabic characters are connected together, and each one takes
a different presentation form (glyph) depending on its position in the word, or its isolated form when it stands alone.

We can have four different presentation forms for a single Arabic character:
Isolated, when the character stands alone
Initial, when the character is at the start of the Arabic word
End (final), when the character is at the end of the Arabic word
Middle (medial), when the character is in the middle of the word
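The position rule above needs one refinement in practice: some letters (alef, dal, reh, waw and a few others) only connect to the letter before them, so the letter after them starts a new connected piece. That is why, in the example word below, reh comes out isolated and the following yeh takes its initial form even though both sit in the middle of the word. A minimal sketch of that decision, assuming a hypothetical helper JoiningPositions that receives one Unicode joining type per letter ('D', 'R', 'U', as in the Unicode joining-type data) rather than the letters themselves:

```java
/**
 * Hypothetical helper: decides which of the four forms each letter of a
 * word takes, given the joining type of every letter:
 *   'D' dual-joining  (connects on both sides, e.g. TEH, MEEM, NOON, FARSI YEH)
 *   'R' right-joining (connects only to the preceding letter, e.g. ALEF, REH, WAW)
 *   'U' non-joining   (never connects)
 */
final class JoiningPositions {
    static final int ISOLATED = 0;
    static final int FINAL = 1;
    static final int INITIAL = 2;
    static final int MEDIAL = 3;

    static int[] classify(String joiningTypes) {
        int n = joiningTypes.length();
        int[] forms = new int[n];
        for (int i = 0; i < n; i++) {
            char type = joiningTypes.charAt(i);
            if (type == 'U') {
                forms[i] = ISOLATED;
                continue;
            }
            // Only a dual-joining letter can connect to the letter after it.
            boolean joinsPrev = i > 0 && joiningTypes.charAt(i - 1) == 'D';
            boolean joinsNext = type == 'D' && i + 1 < n
                    && joiningTypes.charAt(i + 1) != 'U';
            if (joinsPrev && joinsNext) {
                forms[i] = MEDIAL;
            } else if (joinsPrev) {
                forms[i] = FINAL;
            } else if (joinsNext) {
                forms[i] = INITIAL;
            } else {
                forms[i] = ISOLATED;
            }
        }
        return forms;
    }
}
```

For the letters of the example word (teh, yeh, meem, waw, reh, yeh, alef, noon) the input is "DDDRRDRD", and the result is initial, medial, medial, final, isolated, initial, final, isolated.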

Arabic Unicode characters lie between arabicStart = 0x0600 and arabicEnd = 0x06FF,
and we send a word to the algorithm if it contains an Arabic character.
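The range test can be as simple as the following sketch (ArabicDetector is a hypothetical name; its constants mirror arabicStart and arabicEnd). Note that 0x0600 to 0x06FF is only the basic Arabic block: Arabic Supplement (0x0750 to 0x077F) and text that already contains presentation forms (0xFB50 to 0xFDFF, 0xFE70 to 0xFEFF) fall outside it and would need extra ranges if such input must be detected.

```java
/** Hypothetical sketch: decides whether a word must go through Arabic shaping. */
final class ArabicDetector {
    static final int ARABIC_START = 0x0600; // arabicStart
    static final int ARABIC_END = 0x06FF;   // arabicEnd

    static boolean containsArabic(String word) {
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c >= ARABIC_START && c <= ARABIC_END) {
                return true;
            }
        }
        return false;
    }
}
```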

For instance, for the following word we should find the correct presentation forms.

<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt"
space-after.optimum="12pt"   writing-mode="rl-tb" >
   تیموریان
</fo:block>
This is the form in which the word is saved in the FO file.
We should find the correct presentation forms and change this word to

<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt"
space-after.optimum="12pt"   writing-mode="rl-tb" >
   ﺗﯿﻤﻮﺭﯾﺎﻥ
</fo:block>

That is, we find the correct presentation forms and reverse the character array.
You can see the difference and test it with a font that contains glyphs for the whole Unicode range of Arabic Presentation Forms-A and Arabic Presentation Forms-B.
You can easily find the characters at these links:
http://www.alanwood.net/unicode/arabic.html
http://www.alanwood.net/unicode/arabic_presentation_forms_b.html
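For the example word, the two steps (contextual forms, then reversal) can be sketched as follows. This is a minimal illustration, assuming a hypothetical class PersianWordShaper whose table holds only the seven letters of تیموریان; a letter counts as dual-joining when it has an initial form. It shapes in logical order first and then reverses, and the shaped string matches the one given above (ﺗﯿﻤﻮﺭﯾﺎﻥ). Not handled: the mandatory lam-alef ligature, diacritics (which should be transparent to joining but here break the connection), and runs of digits, which must not be reversed.

```java
/** Hypothetical sketch: shapes one Persian word and reverses it for display. */
final class PersianWordShaper {
    private static final int ISOLATED = 1, FINAL = 2, INITIAL = 3, MEDIAL = 4;

    // Row: base, isolated, final, initial, medial (0 = form does not exist).
    private static final char[][] FORMS = {
        { '\u0627', '\uFE8D', '\uFE8E', 0, 0 },               // ALEF
        { '\u062A', '\uFE95', '\uFE96', '\uFE97', '\uFE98' }, // TEH
        { '\u0631', '\uFEAD', '\uFEAE', 0, 0 },               // REH
        { '\u0645', '\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4' }, // MEEM
        { '\u0646', '\uFEE5', '\uFEE6', '\uFEE7', '\uFEE8' }, // NOON
        { '\u0648', '\uFEED', '\uFEEE', 0, 0 },               // WAW
        { '\u06CC', '\uFBFC', '\uFBFD', '\uFBFE', '\uFBFF' }  // FARSI YEH
    };

    /** Returns the table row for c, or null for characters not in the table. */
    private static char[] row(char c) {
        for (int i = 0; i < FORMS.length; i++) {
            if (FORMS[i][0] == c) {
                return FORMS[i];
            }
        }
        return null;
    }

    /** Dual-joining letters have an initial form; right-joining ones do not. */
    private static boolean joinsToNext(char[] r) {
        return r != null && r[INITIAL] != 0;
    }

    /** Replaces each known letter by its contextual form (logical order kept). */
    static String shape(String word) {
        StringBuffer out = new StringBuffer(word.length());
        for (int i = 0; i < word.length(); i++) {
            char[] cur = row(word.charAt(i));
            if (cur == null) {
                out.append(word.charAt(i)); // unknown character: copied as-is
                continue;
            }
            boolean joinsPrev = i > 0 && joinsToNext(row(word.charAt(i - 1)));
            boolean joinsNext = joinsToNext(cur) && i + 1 < word.length()
                    && row(word.charAt(i + 1)) != null;
            int form = joinsPrev ? (joinsNext ? MEDIAL : FINAL)
                                 : (joinsNext ? INITIAL : ISOLATED);
            out.append(cur[form]);
        }
        return out.toString();
    }

    /** Reverses the shaped word into visual (left-to-right) order. */
    static String toVisualOrder(String shaped) {
        return new StringBuffer(shaped).reverse().toString();
    }
}
```

StringBuffer is used instead of StringBuilder so the sketch stays within JDK 1.3.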

I have an implementation for this step, but I am looking for the best performance and for compatibility with Java 1.3.

The second step is to implement the Bidi algorithm, the writing modes rl-tb and lr-tb,
and bidi-override.

Sebastian Weber has already made some changes for writing-mode="rl-tb":

http://www.anneundsebp.de/fop/fop.html

Please check points 4 and 5 about the TextLayoutManager and getWMctm().

As you can see, we should reverse the character array and lay out the Arabic words from right to left, but
the ARABIC-INDIC DIGITs must still be written from left to right;
that is, we should not reverse runs of these characters:
0660-066D ARABIC-INDIC DIGITs (0660-0669 are the digits; 066A-066D are the related percent sign, separators and star)
06F0-06F9 EXTENDED ARABIC-INDIC DIGITs
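One simple way to honor the digit exception when reversing a purely right-to-left word or line is to walk the text backwards character by character but copy each run of digits unchanged. This is a hypothetical sketch (DigitAwareReverser is not an FOP class) and a deliberate simplification of the Unicode Bidi Algorithm: it treats only 0660-0669 and 06F0-06F9 as digits and ignores European digits, number separators, neutrals between runs and embedded left-to-right text, all of which the full algorithm handles.

```java
/** Hypothetical sketch: reverses RTL text but keeps digit runs left-to-right. */
final class DigitAwareReverser {

    static boolean isArabicDigit(char c) {
        return (c >= '\u0660' && c <= '\u0669')  // ARABIC-INDIC DIGITs
            || (c >= '\u06F0' && c <= '\u06F9'); // EXTENDED ARABIC-INDIC DIGITs
    }

    static String reverseKeepingDigits(String logical) {
        StringBuffer out = new StringBuffer(logical.length());
        int i = logical.length();
        while (i > 0) {
            if (isArabicDigit(logical.charAt(i - 1))) {
                // Find the start of this digit run and copy it in original order.
                int start = i - 1;
                while (start > 0 && isArabicDigit(logical.charAt(start - 1))) {
                    start--;
                }
                out.append(logical.substring(start, i));
                i = start;
            } else {
                out.append(logical.charAt(i - 1));
                i--;
            }
        }
        return out.toString();
    }
}
```

For example, alef, beh, space, ARABIC-INDIC DIGIT ONE, DIGIT TWO comes out as one, two, space, beh, alef: the letters are reversed while the number still reads left to right.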

and then a correct implementation of the Unicode BIDI processing described in XSL 1.0:
http://www.w3.org/TR/2001/REC-xsl-20011015/slice5.html#section-N6720-Unicode-BIDI-Processing
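To check the output of a hand-written implementation of this section, one option is to compare it against java.text.Bidi. That class exists only since JDK 1.4, so it cannot go into FOP while the code base targets 1.3; the sketch below (BidiOracle is a hypothetical name) is meant for a standalone comparison tool. It resolves embedding levels, reverses the characters of odd-level (right-to-left) runs, and reorders the runs with Bidi.reorderVisually. It assumes a single line and does no character mirroring (for example of parentheses).

```java
import java.text.Bidi;

/** Hypothetical sketch: visual order of one line, computed with java.text.Bidi (JDK 1.4+). */
final class BidiOracle {

    static String toVisual(String logical, boolean rtlParagraph) {
        Bidi bidi = new Bidi(logical, rtlParagraph
                ? Bidi.DIRECTION_RIGHT_TO_LEFT : Bidi.DIRECTION_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left: reverse the run's characters.
            runs[i] = (levels[i] & 1) == 1
                    ? new StringBuffer(run).reverse().toString() : run;
        }
        // Reorder whole runs according to their levels (rule L2 of the UBA).
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        StringBuffer out = new StringBuffer(logical.length());
        for (int i = 0; i < runCount; i++) {
            out.append(runs[i]);
        }
        return out.toString();
    }
}
```

For alef, beh, space, DIGIT ONE, DIGIT TWO in a right-to-left paragraph, the letters and the space resolve to level 1 and the Arabic-Indic digits to level 2, so the digits keep their left-to-right order while the letters are reversed.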

I am working on this patch and will send you my questions.
I will also send you some documentation about Arabic text and the Bidirectional algorithm,
or write it up in the Wiki.

regards
Kia Teymourian
