Hello,
While creating this patch I have already solved the first problem,
joining the Arabic characters.
I have some questions regarding this matter:
1. ---
I need to send the text char array to the bidirectional algorithm for
visual reordering of the characters.
In the class
org.apache.fop.layoutmgr.inline.TextLayoutManager
I check the text array, and if it contains any Arabic characters
between U+0600 and U+06FF,
it is sent to the bidirectional algorithm and afterwards to the
character joining algorithm.
My question is: can I use the java.text.Bidi class and state that Arabic
Bidi is only supported on JDK 1.4 and later?
Or should FOP have its own Bidi class which is compatible
with JDK 1.3?
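
To make the question concrete, here is a minimal sketch of the check and of the java.text.Bidi call. It is not FOP code: the class and method names (BidiCheckSketch, containsArabic, countRuns) are made up for illustration, and java.text.Bidi needs JDK 1.4 or later, so this would not compile on JDK 1.3.

```java
import java.text.Bidi;

// Hypothetical helper, not part of FOP. Requires JDK 1.4+ for java.text.Bidi.
class BidiCheckSketch {

    // True if any character lies in the Arabic block U+0600..U+06FF.
    static boolean containsArabic(char[] text) {
        for (int i = 0; i < text.length; i++) {
            if (text[i] >= '\u0600' && text[i] <= '\u06FF') {
                return true;
            }
        }
        return false;
    }

    // Number of directional runs that java.text.Bidi finds in the text.
    // The base direction is taken from the first strong character,
    // falling back to left-to-right.
    static int countRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.getRunCount();
    }

    public static void main(String[] args) {
        String mixed = "abc \u0633\u0644\u0627\u0645";
        char[] chars = mixed.toCharArray();
        if (containsArabic(chars) && Bidi.requiresBidi(chars, 0, chars.length)) {
            System.out.println("runs: " + countRuns(mixed));
        }
    }
}
```

Each run can then be inspected with getRunStart(), getRunLimit() and getRunLevel(); odd levels are right-to-left runs.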
2.---
I need to change the page numbering format to Arabic-Indic numbering,
but FOP does not support it.
U+0660 ARABIC-INDIC DIGIT ZERO
U+0661 ARABIC-INDIC DIGIT ONE
U+0662 ARABIC-INDIC DIGIT TWO
U+0663 ARABIC-INDIC DIGIT THREE
...
U+0669 ARABIC-INDIC DIGIT NINE
Where is the page numbering calculated in FOP? I need to support this
<fo:page-sequence initial-page-number="١">
for PDF output.
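
Independent of where FOP formats the page number, the digit mapping itself is simple. A minimal sketch, assuming the page number is available as an int; the class and method names are hypothetical, and the code sticks to JDK 1.3 APIs:

```java
// Hypothetical helper, not part of FOP. Uses only JDK 1.3 APIs.
class ArabicIndicDigitsSketch {

    // Replaces ASCII digits 0-9 by ARABIC-INDIC DIGITs U+0660..U+0669.
    static String toArabicIndic(int pageNumber) {
        String ascii = Integer.toString(pageNumber);
        StringBuffer out = new StringBuffer(ascii.length());
        for (int i = 0; i < ascii.length(); i++) {
            char c = ascii.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append((char) (c - '0' + 0x0660));
            } else {
                out.append(c); // e.g. a minus sign is left untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toArabicIndic(12));
    }
}
```

For the EXTENDED ARABIC-INDIC DIGITs used in Persian, the offset would be 0x06F0 instead of 0x0660.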
Thanks,
Kia Teymourian
Kia Teymourian wrote:
Hi all,
thanks for your answers!
Jeremias Maerki wrote:
On 08.10.2006 23:01:13 Kia Teymourian wrote:
Hi all,
I am working on a patch for Arabic/Persian text decoration and an
implementation of the Bidi algorithm.
I have some questions about it and am writing to ask for your
assistance.
Could you please answer my questions?
1. Can I use Java 1.5, or should I preferably use JDK 1.4 or JDK 1.3?
Until further notice FOP must compile with JDK 1.3.
See: http://xmlgraphics.apache.org/fop/trunk/compiling.html
2. I have a long list of constants: the Unicode Arabic presentation forms.
This list will be used when the algorithm searches for
the correct glyph forms. I think that for the best performance I should
define some static final arrays and write them directly in the
class source code. They are defined Unicode constants, which I
took from the Arabic Unicode definition.
It is also possible to put them externally in some text files and read them from
there. Which form would be best?
Whatever is fastest. I think both approaches are fine. Have you looked
at ICU4J? Maybe it already provides the functionality you need. I
haven't checked. At any rate, we've considered the use of ICU4J before.
3. I should add one or two new classes, something like
ArabicTextHandler.java
Can I add them to the package org.apache.fop.layoutmgr ? Where is the
right package?
Probably org.apache.fop.layoutmgr.inline.
4. Can I use some free TTF fonts in my test cases? They are licensed
under the GPL.
I am going to use Persian TTF fonts from
http://www.farsiweb.ir/wiki/Persian_fonts
Are there any license problems?
Yes, the GPL is off-limits for software distributed by the ASF. :-(
See: http://www.apache.org/legal/3party.html
5. The writing-mode which is entered in a block or block-container
statement has to be known by the TextLayoutManager but is not delivered
correctly. TextLayoutManager.addAreas(...) is called from
BlockContainerLayoutManager.getNextKnuthElements(...) and
BlockManager.getNextKnuthElements(...).
Why? Could you please send me your comments on the implemented
writing-mode?
The writing mode is communicated to child layout managers through the
LayoutContext. See for example:
BlockContainerLayoutManager.createLayoutContext().
Would you mind showing us a short description of your approach to
implementing this? We've had people before implementing in this
direction but it was often a hack and did not fit in the whole picture.
I just want to spare you a late disappointment. Thanks. Some
documentation will be extremely important for us, since the current
team knows just about nothing about Arabic text. For example, we will
not be able to determine if some output is correct or not.
Jeremias Maerki
My first post to the mailing list was in May 2006:
http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04325.html
Maybe you mean this post.
I have found several implementations, and I am now looking for the
best one.
The first and the most important step is to have a solution for
finding the correct Arabic presentation forms or glyphs.
As you know, when we write an Arabic word, the Arabic characters are
connected together and take a
different presentation form (glyph) depending on their position in the
word, or the isolated form when they stand alone.
A single Arabic character can have up to 4 different presentation
forms:
Isolated, when the character stands alone
Initial, when the character is at the start position of the character
array of the Arabic word
Final (end), when the character is at the end position of the character array
of the Arabic word
Medial (middle), when the character is in the middle of the character array.
Arabic Unicode characters lie between arabicStart = 0x0600 and
arabicEnd = 0x06FF,
and we send a word to the algorithm if it contains an Arabic character.
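
The selection of the four forms can be sketched roughly as follows. This is not my patch, only an illustration under simplifying assumptions: the table holds just two letters (ALEF and BEH, with their code points from Arabic Presentation Forms-B), and combining marks and lam-alef ligatures are ignored. Note that position alone is not enough: a letter like ALEF joins only to the letter before it, so the letter after ALEF starts a new joining group. All names are hypothetical, and the code uses only JDK 1.3 APIs.

```java
// Hypothetical sketch, not the patch itself. Uses only JDK 1.3 APIs.
class ArabicShapingSketch {

    // Rows: {base, isolated, final, initial, medial}; 0 = form does not exist.
    // Only two letters are listed here; a real table covers the whole block.
    private static final char[][] FORMS = {
        {'\u0627', '\uFE8D', '\uFE8E', 0, 0},               // ALEF
        {'\u0628', '\uFE8F', '\uFE90', '\uFE91', '\uFE92'}, // BEH
    };

    private static char[] lookup(char c) {
        for (int i = 0; i < FORMS.length; i++) {
            if (FORMS[i][0] == c) {
                return FORMS[i];
            }
        }
        return null;
    }

    // Every letter in the table has a final form, so it can join backwards.
    private static boolean joinsToPrevious(char c) {
        return lookup(c) != null;
    }

    // Only letters with an initial form can connect to the following letter.
    private static boolean joinsToNext(char c) {
        char[] forms = lookup(c);
        return forms != null && forms[3] != 0;
    }

    // Replaces each known letter by its presentation form; logical order is kept.
    static String shape(String word) {
        StringBuffer out = new StringBuffer(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            char[] forms = lookup(c);
            if (forms == null) {
                out.append(c); // unknown characters pass through unchanged
                continue;
            }
            boolean linkPrev = i > 0 && joinsToNext(word.charAt(i - 1));
            boolean linkNext = forms[3] != 0 && i + 1 < word.length()
                    && joinsToPrevious(word.charAt(i + 1));
            if (linkPrev && linkNext) {
                out.append(forms[4]); // medial
            } else if (linkPrev) {
                out.append(forms[2]); // final
            } else if (linkNext) {
                out.append(forms[3]); // initial
            } else {
                out.append(forms[1]); // isolated
            }
        }
        return out.toString();
    }
}
```

For the word BEH ALEF BEH this gives initial BEH, final ALEF and isolated BEH, because ALEF does not connect to the letter that follows it.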
For instance, we should find the correct presentation forms for the
following word.
<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt"
space-after.optimum="12pt" writing-mode="rl-tb" >
تیموریان
</fo:block>
This is the form in which the characters are saved in the FO file.
We should find the correct presentation forms and change this word to
<fo:block font-size="14pt" font-family="Tahoma" line-height="14pt"
space-after.optimum="12pt" writing-mode="rl-tb" >
ﺗﯿﻤﻮﺭﯾﺎﻥ
</fo:block>
that is, find the correct presentation forms and reverse the character array.
You can see the difference and test it with a font which contains
glyphs for the whole Unicode range of Arabic Presentation Forms A and B.
You can easily find the characters on these links :
http://www.alanwood.net/unicode/arabic.html
http://www.alanwood.net/unicode/arabic_presentation_forms_b.html
I have an implementation for this step, but I am looking for the best
performance and for compatibility with Java 1.3.
The second step is to implement the Bidi algorithm and the
writing-modes rl-tb and lr-tb,
as well as bidi-override.
Sebastian Weber has already made some changes for writing-mode="rl-tb":
http://www.anneundsebp.de/fop/fop.html
Please check points 4 and 5 about the TextLayoutManager and getWMctm().
As you can see, we should reverse the character array and lay out the Arabic
words from right to left, but
the ARABIC-INDIC DIGITs should be written from left to right,
which means we should not reverse these character runs:
0660-0669 ARABIC-INDIC DIGITs
06F0-06F9 EXTENDED ARABIC-INDIC DIGITs
and then the correct implementation of:
http://www.w3.org/TR/2001/REC-xsl-20011015/slice5.html#section-N6720-Unicode-BIDI-Processing
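
The digit rule described above can be sketched like this. It is only a simplification for pure Arabic text with digits, not an implementation of the Unicode Bidirectional Algorithm; the class and method names are hypothetical, and the code uses only JDK 1.3 APIs.

```java
// Hypothetical sketch, not a full Bidi implementation. Uses only JDK 1.3 APIs.
class DigitAwareReverseSketch {

    // ARABIC-INDIC DIGITs and EXTENDED ARABIC-INDIC DIGITs.
    static boolean isArabicDigit(char c) {
        return (c >= '\u0660' && c <= '\u0669')
                || (c >= '\u06F0' && c <= '\u06F9');
    }

    // Reverses the characters in place between from and to (both inclusive).
    private static void reverse(char[] a, int from, int to) {
        while (from < to) {
            char tmp = a[from];
            a[from++] = a[to];
            a[to--] = tmp;
        }
    }

    // Reverses the whole array, then restores the original order
    // inside every run of digits so that numbers still read left to right.
    static char[] reverseKeepingDigits(char[] text) {
        char[] out = new char[text.length];
        System.arraycopy(text, 0, out, 0, text.length);
        reverse(out, 0, out.length - 1);
        int i = 0;
        while (i < out.length) {
            if (isArabicDigit(out[i])) {
                int j = i;
                while (j < out.length && isArabicDigit(out[j])) {
                    j++;
                }
                reverse(out, i, j - 1);
                i = j;
            } else {
                i++;
            }
        }
        return out;
    }
}
```

Latin text and European digits embedded in the Arabic text need the same treatment, which is one reason to use the full algorithm rather than this shortcut.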
I am working on this patch and will send you my questions.
I will also send you some documentation about Arabic text
and the bidirectional algorithm,
or write it up in the Wiki.
regards
Kia Teymourian