Robert Gurol created BATIK-1074:
-----------------------------------

             Summary: ArrayIndexOutOfBoundsException in ArabicTextHandler with Arabic diacritics
                 Key: BATIK-1074
                 URL: https://issues.apache.org/jira/browse/BATIK-1074
             Project: Batik
          Issue Type: Bug
    Affects Versions: 1.7
            Reporter: Robert Gurol
            Priority: Minor


Trying out some Arabic characters, I got an ArrayIndexOutOfBoundsException in 
ArabicTextHandler when the text contained Arabic diacritics.

Here's a fix that works for my input: 
ArabicTextHandler.doubleCharRemappings is missing some array entries: 

<pre>
...
        null,                                          // 0x0629

        // those were missing!
        null,                                          // 0x062A
        null,                                          // 0x062B
        null,                                          // 0x062C
        null,                                          // 0x062D
        null,                                          // 0x062E
        null,                                          // 0x062F

        null,                                          // 0x0630
...
</pre>
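
To illustrate why the missing entries matter, here is a minimal sketch of a table that is indexed by code point offset. It is not Batik's code: the class name, the FIRST/LAST range, the table layout and the single sample entry are all assumptions made for this example. The point is only that such a table needs exactly one slot per code point in its range, and that a lookup has to check the range before indexing.

```java
public class RemapTableSketch {
    // Hypothetical range for illustration; not Batik's actual constants.
    static final int FIRST = 0x0622;
    static final int LAST = 0x0630;

    /** Builds a table with exactly one slot per code point in [FIRST, LAST]. */
    static int[][] buildTable() {
        // Sizing from the range makes it impossible to "forget" entries:
        // slots that are never filled simply stay null.
        int[][] table = new int[LAST - FIRST + 1][];
        // Illustrative entry only: {combining mark, replacement} for U+0627.
        table[0x0627 - FIRST] = new int[] {0x0653, 0x0622};
        return table;
    }

    /** Returns the row for c, or null when c has no row or is out of range. */
    static int[] lookup(int[][] table, char c) {
        int index = c - FIRST;
        if (index < 0 || index >= table.length) {
            return null; // out of range: no remapping instead of an exception
        }
        return table[index];
    }

    public static void main(String[] args) {
        int[][] table = buildTable();
        System.out.println("slots: " + table.length);
        System.out.println("0x062A has row: " + (lookup(table, '\u062A') != null));
        System.out.println("0x064E has row: " + (lookup(table, '\u064E') != null));
    }
}
```

With a hand-written array literal, as in the snippet above, the same invariant could be checked once at class load time by comparing the array length against the expected range size.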

Some strings from my test SVG (I copied those from Wikipedia): 

...

<text ns0:align="left middle" xmlns:ns1="http://oryx-editor.org" 
ns1:anchors="left" fill="#000000" xmlns:ns2="http://oryx-editor.org" 
ns2:fittoelem="sid-c3179252-02f3-48bd-8363-31952f62def3textannotationrect" 
font-size="14" xmlns:ns3="http://oryx-editor.org" ns3:fontSize="14" 
id="sid-c3179252-02f3-48bd-8363-31952f62def3text" letter-spacing="-0.01px" 
stroke="black" stroke-width="0pt" text-anchor="start" 
xmlns:ns4="http://oryx-editor.org" ns4:textWidth="360.61" transform="rotate(0)" 
x="4" y="93.184">
                                                                        <tspan 
dy="-30" x="4" y="93.184">The Arabic script has numerous 
diacritics,<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-16" x="4" y="93.184">including i'jam 〈إِعْجَام〉 (i‘jām, 
consonant<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-2" x="4" y="93.184">pointing), and tashkil 〈تَشْكِيل〉 
(tashkīl,<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="12" x="4" y="93.184">supplementary diacritics). The latter include 
the<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="26" x="4" y="93.184">ḥarakāt 〈حَرَكَات〉 (vowel marks; 
singular:<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="40" x="4" y="93.184">ḥarakah 〈حَرَكَة〉).</tspan>
                                                                </text>

...
<text xmlns:ns0="http://oryx-editor.org" ns0:align="center middle" 
fill="#000000" xmlns:ns1="http://oryx-editor.org" 
ns1:fittoelem="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68unvisibleBorder" 
font-size="14" xmlns:ns2="http://oryx-editor.org" ns2:fontSize="14" 
id="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68text_name" letter-spacing="-0.01px" 
stroke="black" stroke-width="0pt" text-anchor="middle" 
xmlns:ns3="http://oryx-editor.org" ns3:textWidth="360.323" 
transform="rotate(0)" x="180.161" y="374.994">
                                                                        <tspan 
dy="-296" x="180.161" y="374.994">The ḥarakāt, which literally means 'motions', 
are<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-282" x="180.161" y="374.994">the short vowel marks.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-268" x="180.161" y="374.994">* The fatḥah 〈فَتْحَة〉 is a small diagonal 
line<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-254" x="180.161" y="374.994">placed above a letter, and represents a short 
/a/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-240" x="180.161" y="374.994">The word fatḥah itself (فَتْحَة) means 
opening,<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-226" x="180.161" y="374.994">and refers to the opening of the mouth 
when<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-212" x="180.161" y="374.994">producing an /a/. Example with dāl 
(henceforth,<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-198" x="180.161" y="374.994">the base consonant in the following 
examples):<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-184" x="180.161" y="374.994">〈دَ〉 /da/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-170" x="180.161" y="374.994">* A similar diagonal line below a letter is 
called a<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-156" x="180.161" y="374.994">kasrah 〈كَسْرَة〉 and designates a short 
/i/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-142" x="180.161" y="374.994">Example: 〈دِ〉 /di/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-128" x="180.161" y="374.994">* The ḍammah 〈ضَمَّة〉 is a small 
curl-like<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-114" x="180.161" y="374.994">diacritic placed above a letter to represent 
a short<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-100" x="180.161" y="374.994">/u/. Example: 〈دُ〉 /du/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-86" x="180.161" y="374.994">* The maddah 〈مَدَّة〉 is a tilde-like 
diacritic<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-72" x="180.161" y="374.994">which can appear only on top of an alif 
and<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-58" x="180.161" y="374.994">indicates a glottal stop /ʔ/ followed by a 
long /aː/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-44" x="180.161" y="374.994">Example: 〈قُرْآن〉 /qurˈʔaːn/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-30" x="180.161" y="374.994">* The superscript (or dagger) alif 
〈أَلِف<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-16" x="180.161" y="374.994">خَنْجَرِيَّة〉 (alif khanjarīyah), is written 
as<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="-2" x="180.161" y="374.994">short vertical stroke on top of a consonant. 
It<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="12" x="180.161" y="374.994">indicates a long /aː/ sound where alif is 
normally<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="26" x="180.161" y="374.994">not written, e.g. 〈هٰذَا〉 (hādhā) or 
〈رَحْمٰن〉<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="40" x="180.161" y="374.994">(raḥmān).<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="54" x="180.161" y="374.994">* The waṣlah 〈وَصْلَة〉, alif waṣlah 
〈أَلِف<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="68" x="180.161" y="374.994">وَصْلَة〉 or hamzat waṣl 〈هَمْزَة 
وَصْل〉<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="82" x="180.161" y="374.994">looks like a small letter ṣād on top of an alif 
〈ٱ〉<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="96" x="180.161" y="374.994">* Sukun  Example: 〈دَدْ〉 dad.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="110" x="180.161" y="374.994">* Tanwin The sign 〈ـً〉 is most 
commonly<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="124" x="180.161" y="374.994">written in combination with 〈ـًا〉 (alif), 
〈ةً〉<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="138" x="180.161" y="374.994">(tā’ marbūṭah) or stand-alone 〈ءً〉 
(hamzah).<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="152" x="180.161" y="374.994">* Shaddah  Example: 〈دّ〉 /dd/; 
madrasah<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="166" x="180.161" y="374.994">〈مَدْرَسَة〉 ('school') vs. 
mudarrisah<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="180" x="180.161" y="374.994">〈مُدَرِّسَة〉 ('teacher', 
female).<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="194" x="180.161" y="374.994">* The ijam 〈إِعْجَام〉 (i‘jām) are the 
pointing<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="208" x="180.161" y="374.994">diacritics that distinguish various consonants 
that<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="222" x="180.161" y="374.994">have the same form (rasm), such as 〈ـبـ〉 
/b/,<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="236" x="180.161" y="374.994">〈ـتـ〉 /t/, 〈ـثـ〉 /θ/, 〈ـنـ〉 /n/, and 〈ـيـ〉 
/j/.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="250" x="180.161" y="374.994">Typically ijam are not considered diacritics 
but<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="264" x="180.161" y="374.994">part of the letter.<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="278" x="180.161" y="374.994">* Hamza (glottal stop 
semi-consonant)<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="292" x="180.161" y="374.994">Main article: Hamza<v:newlineChar/>
                                                                        </tspan>
                                                                        <tspan 
dy="306" x="180.161" y="374.994">ئ  ؤ  إ  أ</tspan>
                                                                </text>
...







