Re: PATCH for Arabic/Persian Text, and Bidi-override!

Kia Teymourian Tue, 21 Nov 2006 18:21:31 -0800

Hello,

To create this patch, I have already solved the first problem: joining the Arabic characters.

I have some questions regarding this matter:


1. ---

I need to send the text char array to the bidirectional algorithm for visual reordering of the characters.

In the class
org.apache.fop.layoutmgr.inline.TextLayoutManager

I check the text array, and if it contains any Arabic characters between U+0600 and U+06FF, it is sent to the bidirectional algorithm and afterwards to the character joining algorithm.
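The range check described above could look roughly like the following. This is only a minimal sketch: the class name ArabicRangeCheck and the method containsArabic are hypothetical and are not part of TextLayoutManager.

```java
// Hypothetical helper, not part of FOP: detects whether a char array
// contains any character from the basic Arabic block U+0600..U+06FF.
final class ArabicRangeCheck {
    static final char ARABIC_START = '\u0600';
    static final char ARABIC_END = '\u06FF';

    static boolean containsArabic(char[] text) {
        for (int i = 0; i < text.length; i++) {
            if (text[i] >= ARABIC_START && text[i] <= ARABIC_END) {
                return true;
            }
        }
        return false;
    }
}
```

Only text for which containsArabic returns true would then be handed on to the reordering and joining steps.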

My question is: can I use the java.text.Bidi class and state that Arabic Bidi is supported only with JDK 1.4? Or should FOP have its own Bidi class that is compatible with JDK 1.3?
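For reference, a rough sketch of what visual reordering with java.text.Bidi might look like. It relies on java.text.Bidi, which exists only since JDK 1.4, so it would not compile under JDK 1.3. The class name BidiReorderSketch and the method toVisualOrder are made up for illustration. The sketch reorders single chars only and does no mirroring or character joining.

```java
import java.text.Bidi;

// Hypothetical sketch: reorder a logical-order string into visual order
// using the per-character embedding levels computed by java.text.Bidi.
final class BidiReorderSketch {
    static String toVisualOrder(String logical) {
        char[] chars = logical.toCharArray();
        // Skip the work for text without any right-to-left characters.
        if (!Bidi.requiresBidi(chars, 0, chars.length)) {
            return logical;
        }
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        byte[] levels = new byte[chars.length];
        String[] cells = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            cells[i] = String.valueOf(chars[i]);
        }
        // Reorders the cells array in place according to the levels.
        Bidi.reorderVisually(levels, 0, cells, 0, cells.length);
        StringBuffer out = new StringBuffer(chars.length);
        for (int i = 0; i < cells.length; i++) {
            out.append(cells[i]);
        }
        return out.toString();
    }
}
```

If the JDK 1.3 requirement stays, the same interface could be kept and backed by a FOP-internal implementation instead.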


2.---

I need to change the page numbering format to Arabic-Indic numbering, but FOP does not support it.

U+0660    ARABIC-INDIC DIGIT ZERO
U+0661    ARABIC-INDIC DIGIT ONE
U+0662    ARABIC-INDIC DIGIT TWO
U+0663    ARABIC-INDIC DIGIT THREE
...
U+0669    ARABIC-INDIC DIGIT NINE

Where is the page numbering calculated in FOP? I need to support this
<fo:page-sequence initial-page-number="&#x661;">
for PDF output.
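The digit mapping itself is simple. A small sketch, assuming the page number is available as an int; the class ArabicIndicDigits and the method toArabicIndic are hypothetical, and where FOP would have to call such a method is exactly the open question above.

```java
// Hypothetical helper: renders a number with Arabic-Indic digits
// (U+0660..U+0669) instead of the Western digits 0..9.
final class ArabicIndicDigits {
    static String toArabicIndic(int pageNumber) {
        String western = Integer.toString(pageNumber);
        StringBuffer out = new StringBuffer(western.length());
        for (int i = 0; i < western.length(); i++) {
            char c = western.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append((char) ('\u0660' + (c - '0')));
            } else {
                out.append(c); // e.g. a minus sign is passed through unchanged
            }
        }
        return out.toString();
    }
}
```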

Thanks,
Kia Teymourian


Kia Teymourian wrote:

Hi all,

thanks for your answers!

Jeremias Maerki wrote:
On 08.10.2006 23:01:13 Kia Teymourian wrote:
Hi all,
I am working on a patch for Arabic/Persian text decoration and an implementation of the Bidi algorithm.
I have some questions about it and am writing to ask for your assistance.
Could you please answer my questions?

1. Can I use Java 1.5, or should I preferably use JDK 1.4 or JDK 1.3?
Until further notice FOP must compile with JDK 1.3.
See: http://xmlgraphics.apache.org/fop/trunk/compiling.html
2. I have a long list of constants, the Unicode Arabic presentation forms. This list will be used when the algorithm searches for the correct glyph forms. I think that for the best performance I should define some static final arrays and write them directly in the class source code. They are defined Unicode constants, which I take from the Arabic Unicode definition. It is also possible to put them externally in some text files and read them from there. Which form would be best?
Whatever is fastest. I think both approaches are fine. Have you looked
at ICU4J? Maybe it already provides the functionality you need. I
haven't checked. At any rate, we've considered the use of ICU4J before.
3. I should add one or two new classes, something like ArabicTextHandler.java. Can I add them to the package org.apache.fop.layoutmgr? Where is the right package?
Probably org.apache.fop.layoutmgr.inline.
4. Can I use some free TTF fonts in my test cases? They are licensed under the GPL. I am going to use Persian TTF fonts from http://www.farsiweb.ir/wiki/Persian_fonts
Are there any license problems?
Yes, the GPL is off-limits for software distributed by the ASF. :-(

See: http://www.apache.org/legal/3party.html
5. The writing-mode that is specified on a block or block-container has to be known by the TextLayoutManager, but it is not delivered correctly. TextLayoutManager.addAreas(...) is called from BlockContainerLayoutManager.getNextKnuthElements(...) and BlockManager.getNextKnuthElements(...). Why? Could you please send me your comments on the implemented writing-mode?
The writing mode is communicated to child layout managers through the
LayoutContext. See for example:
BlockContainerLayoutManager.createLayoutContext().

Would you mind showing us a short description of your approach to
implementing this? We've had people before implementing in this
direction but it was often a hack and did not fit in the whole picture.
I just want to spare you a late disappointment. Thanks. Some
documentation will be extremely important for us, since the current
team knows just about nothing about Arabic text. For example, we will
not be able to determine if some output is correct or not.

Jeremias Maerki
My first post to the mailing list was in May 2006,
http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04325.html
maybe you mean this post.
I could find several implementations, but now I am looking for the best one.
The first and most important step is to have a solution for finding the correct Arabic presentation forms, or glyphs. As you know, when we write an Arabic word, the Arabic characters are connected together and have a different presentation form, or glyph, depending on their position in the word or on whether they stand in isolated form.
We can have 4 different presentation forms for a single Arabic character:
Isolated, when the character stands alone
Initial, when the character is at the start position of the character array of the Arabic word
End, when the character is at the end position of the character array of the Arabic word
Middle, when the character is in the middle of a character array.
Arabic Unicode characters lie between arabicStart = 0x0600 and arabicEnd = 0x06FF,
and we send a word to the algorithm if it contains an Arabic character.
For instance, we should find the correct presentation forms for the following word.
<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt" space-after.optimum="12pt" writing-mode="rl-tb" >
   &#x62a;&#x6cc;&#x645;&#x648;&#x631;&#x6cc;&#x627;&#x646;
</fo:block>
This is the form in which the characters are saved in the FO file.
We should find the correct presentation forms and change this word to
<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt" space-after.optimum="12pt" writing-mode="rl-tb" >ﺗﯿﻤﻮﺭﯾﺎﻥ</fo:block>
that is, find the correct presentation forms and reverse the character array.
You can see the difference and test it with a font that contains glyphs for the whole Unicode range of Arabic Presentation Forms A and B.
You can easily find the characters on these links :
http://www.alanwood.net/unicode/arabic.html
http://www.alanwood.net/unicode/arabic_presentation_forms_b.html
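To make the first step concrete, here is a minimal sketch of selecting among the four presentation forms with a static final table. It covers only the seven letters of the example word; a real table would hold all letters from Presentation Forms A and B. The class ArabicShaperSketch and its methods are hypothetical and are not the patch itself. Input and output are both in logical order, so the reversal is a separate step.

```java
// Hypothetical sketch of presentation-form selection for a few letters.
final class ArabicShaperSketch {
    // Each row: base letter, isolated, final, initial, medial.
    // A 0 means the form does not exist (letter joins to the right only).
    private static final char[][] FORMS = {
        {'\u0627', '\uFE8D', '\uFE8E', 0, 0},               // ALEF
        {'\u062A', '\uFE95', '\uFE96', '\uFE97', '\uFE98'}, // TEH
        {'\u0631', '\uFEAD', '\uFEAE', 0, 0},               // REH
        {'\u0645', '\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4'}, // MEEM
        {'\u0646', '\uFEE5', '\uFEE6', '\uFEE7', '\uFEE8'}, // NOON
        {'\u0648', '\uFEED', '\uFEEE', 0, 0},               // WAW
        {'\u06CC', '\uFBFC', '\uFBFD', '\uFBFE', '\uFBFF'}, // FARSI YEH
    };

    private static char[] lookup(char c) {
        for (int i = 0; i < FORMS.length; i++) {
            if (FORMS[i][0] == c) {
                return FORMS[i];
            }
        }
        return null;
    }

    static String shape(String logical) {
        StringBuffer out = new StringBuffer(logical.length());
        for (int i = 0; i < logical.length(); i++) {
            char c = logical.charAt(i);
            char[] cur = lookup(c);
            if (cur == null) {
                out.append(c); // not in the table: leave unchanged
                continue;
            }
            char[] prev = i > 0 ? lookup(logical.charAt(i - 1)) : null;
            char[] next = i + 1 < logical.length() ? lookup(logical.charAt(i + 1)) : null;
            // The previous letter connects to this one only if it has an initial form.
            boolean joinsPrev = prev != null && prev[3] != 0;
            // This letter connects to the next one only if it has an initial form itself.
            boolean joinsNext = next != null && cur[3] != 0;
            int form = joinsPrev ? (joinsNext ? 4 : 2) : (joinsNext ? 3 : 1);
            out.append(cur[form]);
        }
        return out.toString();
    }
}
```

Calling shape on the eight code points of the example block (&#x62a; through &#x646;) yields the presentation-form string shown in the second fo:block. A linear lookup is used here only for brevity; an array indexed by (c - 0x0600) would be the faster variant of the static-array approach.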
I have the implementation for this step, but I am looking for the best performance and for compatibility with Java 1.3.
The second step is to implement the Bidi algorithm and the writing-mode values rl-tb and lr-tb,
and bidi-override.

Sebastian Weber has already made some changes for writing-mode="rl-tb":

http://www.anneundsebp.de/fop/fop.html

Please check points 4 and 5 about the TextLayoutManager and getWMctm().
As you can see, we should reverse the character array and place the Arabic words from right to left, but
the ARABIC-INDIC DIGITs should be written from the left side,
which means we should not reverse these character arrays:
0660-066D  ARABIC-INDIC DIGITs
06F0-06F9  EXTENDED ARABIC-INDIC DIGITs
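A naive sketch of such a reversal that leaves digit runs in their original order is shown below. It is not the full Unicode Bidi algorithm, and the class ReverseKeepingDigits is hypothetical. As an assumption, it treats only U+0660 to U+0669 and U+06F0 to U+06F9 as digits; the code points U+066A to U+066D in the first range above are punctuation (percent sign, separators, star) and are reversed like any other character here.

```java
// Hypothetical sketch: reverse a logical-order string for right-to-left
// display, but keep runs of Arabic-Indic digits in left-to-right order.
final class ReverseKeepingDigits {
    static boolean isArabicIndicDigit(char c) {
        return (c >= '\u0660' && c <= '\u0669')
            || (c >= '\u06F0' && c <= '\u06F9');
    }

    static String reverseKeepingDigitRuns(String logical) {
        StringBuffer out = new StringBuffer(logical.length());
        int i = logical.length() - 1;
        while (i >= 0) {
            if (isArabicIndicDigit(logical.charAt(i))) {
                // Walk back to the start of the digit run and copy it unreversed.
                int end = i + 1;
                while (i >= 0 && isArabicIndicDigit(logical.charAt(i))) {
                    i--;
                }
                out.append(logical.substring(i + 1, end));
            } else {
                out.append(logical.charAt(i));
                i--;
            }
        }
        return out.toString();
    }
}
```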

and then the correct implementation of :
http://www.w3.org/TR/2001/REC-xsl-20011015/slice5.html#section-N6720-Unicode-BIDI-Processing

I am working on this patch and will send you my questions.
I will also send you some documentation about Arabic text and the bidirectional algorithm,
or write it up in the Wiki.

regards
Kia Teymourian
