Robert Gurol created BATIK-1074:
-----------------------------------

    Summary: ArrayIndexOutOfBoundsException in ArabicTextHandler with Arabic diacritics
    Key: BATIK-1074
    URL: https://issues.apache.org/jira/browse/BATIK-1074
    Project: Batik
    Issue Type: Bug
    Affects Versions: 1.7
    Reporter: Robert Gurol
    Priority: Minor

Trying out some Arabic characters, I got an ArrayIndexOutOfBoundsException in ArabicTextHandler when the text contained Arabic diacritics.

Here's a fix that works for my input: ArabicTextHandler.doubleCharRemappings is missing some array entries:

<pre>
...
null, // 0x0629
// those were missing!
null, // 0x062A
null, // 0x062B
null, // 0x062C
null, // 0x062D
null, // 0x062E
null, // 0x062F
null, // 0x0630
...
</pre>

Some strings from my test SVG (I copied those from Wikipedia):

...
<text ns0:align="left middle" xmlns:ns1="http://oryx-editor.org" ns1:anchors="left" fill="#000000" xmlns:ns2="http://oryx-editor.org" ns2:fittoelem="sid-c3179252-02f3-48bd-8363-31952f62def3textannotationrect" font-size="14" xmlns:ns3="http://oryx-editor.org" ns3:fontSize="14" id="sid-c3179252-02f3-48bd-8363-31952f62def3text" letter-spacing="-0.01px" stroke="black" stroke-width="0pt" text-anchor="start" xmlns:ns4="http://oryx-editor.org" ns4:textWidth="360.61" transform="rotate(0)" x="4" y="93.184">
<tspan dy="-30" x="4" y="93.184">The Arabic script has numerous diacritics,<v:newlineChar/> </tspan>
<tspan dy="-16" x="4" y="93.184">including i'jam 〈إِعْجَام〉 (i‘jām, consonant<v:newlineChar/> </tspan>
<tspan dy="-2" x="4" y="93.184">pointing), and tashkil 〈تَشْكِيل〉 (tashkīl,<v:newlineChar/> </tspan>
<tspan dy="12" x="4" y="93.184">supplementary diacritics). The latter include the<v:newlineChar/> </tspan>
<tspan dy="26" x="4" y="93.184">ḥarakāt 〈حَرَكَات〉 (vowel marks; singular:<v:newlineChar/> </tspan>
<tspan dy="40" x="4" y="93.184">ḥarakah 〈حَرَكَة〉).</tspan>
</text>
...
<text xmlns:ns0="http://oryx-editor.org" ns0:align="center middle" fill="#000000" xmlns:ns1="http://oryx-editor.org" ns1:fittoelem="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68unvisibleBorder" font-size="14" xmlns:ns2="http://oryx-editor.org" ns2:fontSize="14" id="sid-408ec19b-8a4b-43a4-8787-36de6d17dc68text_name" letter-spacing="-0.01px" stroke="black" stroke-width="0pt" text-anchor="middle" xmlns:ns3="http://oryx-editor.org" ns3:textWidth="360.323" transform="rotate(0)" x="180.161" y="374.994"> <tspan dy="-296" x="180.161" y="374.994">The ḥarakāt, which literally means 'motions', are<v:newlineChar/> </tspan> <tspan dy="-282" x="180.161" y="374.994">the short vowel marks.<v:newlineChar/> </tspan> <tspan dy="-268" x="180.161" y="374.994">* The fatḥah 〈فَتْحَة〉 is a small diagonal line<v:newlineChar/> </tspan> <tspan dy="-254" x="180.161" y="374.994">placed above a letter, and represents a short /a/.<v:newlineChar/> </tspan> <tspan dy="-240" x="180.161" y="374.994">The word fatḥah itself (فَتْحَة) means opening,<v:newlineChar/> </tspan> <tspan dy="-226" x="180.161" y="374.994">and refers to the opening of the mouth when<v:newlineChar/> </tspan> <tspan dy="-212" x="180.161" y="374.994">producing an /a/. 
Example with dāl (henceforth,<v:newlineChar/> </tspan> <tspan dy="-198" x="180.161" y="374.994">the base consonant in the following examples):<v:newlineChar/> </tspan> <tspan dy="-184" x="180.161" y="374.994">〈دَ〉 /da/.<v:newlineChar/> </tspan> <tspan dy="-170" x="180.161" y="374.994">* A similar diagonal line below a letter is called a<v:newlineChar/> </tspan> <tspan dy="-156" x="180.161" y="374.994">kasrah 〈كَسْرَة〉 and designates a short /i/.<v:newlineChar/> </tspan> <tspan dy="-142" x="180.161" y="374.994">Example: 〈دِ〉 /di/.<v:newlineChar/> </tspan> <tspan dy="-128" x="180.161" y="374.994">* The ḍammah 〈ضَمَّة〉 is a small curl-like<v:newlineChar/> </tspan> <tspan dy="-114" x="180.161" y="374.994">diacritic placed above a letter to represent a short<v:newlineChar/> </tspan> <tspan dy="-100" x="180.161" y="374.994">/u/. Example: 〈دُ〉 /du/.<v:newlineChar/> </tspan> <tspan dy="-86" x="180.161" y="374.994">* The maddah 〈مَدَّة〉 is a tilde-like diacritic<v:newlineChar/> </tspan> <tspan dy="-72" x="180.161" y="374.994">which can appear only on top of an alif and<v:newlineChar/> </tspan> <tspan dy="-58" x="180.161" y="374.994">indicates a glottal stop /ʔ/ followed by a long /aː/.<v:newlineChar/> </tspan> <tspan dy="-44" x="180.161" y="374.994">Example: 〈قُرْآن〉 /qurˈʔaːn/.<v:newlineChar/> </tspan> <tspan dy="-30" x="180.161" y="374.994">* The superscript (or dagger) alif 〈أَلِف<v:newlineChar/> </tspan> <tspan dy="-16" x="180.161" y="374.994">خَنْجَرِيَّة〉 (alif khanjarīyah), is written as<v:newlineChar/> </tspan> <tspan dy="-2" x="180.161" y="374.994">short vertical stroke on top of a consonant. It<v:newlineChar/> </tspan> <tspan dy="12" x="180.161" y="374.994">indicates a long /aː/ sound where alif is normally<v:newlineChar/> </tspan> <tspan dy="26" x="180.161" y="374.994">not written, e.g. 
〈هٰذَا〉 (hādhā) or 〈رَحْمٰن〉<v:newlineChar/> </tspan> <tspan dy="40" x="180.161" y="374.994">(raḥmān).<v:newlineChar/> </tspan> <tspan dy="54" x="180.161" y="374.994">* The waṣlah 〈وَصْلَة〉, alif waṣlah 〈أَلِف<v:newlineChar/> </tspan> <tspan dy="68" x="180.161" y="374.994">وَصْلَة〉 or hamzat waṣl 〈هَمْزَة وَصْل〉<v:newlineChar/> </tspan> <tspan dy="82" x="180.161" y="374.994">looks like a small letter ṣād on top of an alif 〈ٱ〉<v:newlineChar/> </tspan> <tspan dy="96" x="180.161" y="374.994">* Sukun Example: 〈دَدْ〉 dad.<v:newlineChar/> </tspan> <tspan dy="110" x="180.161" y="374.994">* Tanwin The sign 〈ـً〉 is most commonly<v:newlineChar/> </tspan> <tspan dy="124" x="180.161" y="374.994">written in combination with 〈ـًا〉 (alif), 〈ةً〉<v:newlineChar/> </tspan> <tspan dy="138" x="180.161" y="374.994">(tā’ marbūṭah) or stand-alone 〈ءً〉 (hamzah).<v:newlineChar/> </tspan> <tspan dy="152" x="180.161" y="374.994">* Shaddah Example: 〈دّ〉 /dd/; madrasah<v:newlineChar/> </tspan> <tspan dy="166" x="180.161" y="374.994">〈مَدْرَسَة〉 ('school') vs. 
mudarrisah<v:newlineChar/> </tspan> <tspan dy="180" x="180.161" y="374.994">〈مُدَرِّسَة〉 ('teacher', female).<v:newlineChar/> </tspan> <tspan dy="194" x="180.161" y="374.994">* The ijam 〈إِعْجَام〉 (i‘jām) are the pointing<v:newlineChar/> </tspan> <tspan dy="208" x="180.161" y="374.994">diacritics that distinguish various consonants that<v:newlineChar/> </tspan> <tspan dy="222" x="180.161" y="374.994">have the same form (rasm), such as 〈ـبـ〉 /b/,<v:newlineChar/> </tspan> <tspan dy="236" x="180.161" y="374.994">〈ـتـ〉 /t/, 〈ـثـ〉 /θ/, 〈ـنـ〉 /n/, and 〈ـيـ〉 /j/.<v:newlineChar/> </tspan> <tspan dy="250" x="180.161" y="374.994">Typically ijam are not considered diacritics but<v:newlineChar/> </tspan> <tspan dy="264" x="180.161" y="374.994">part of the letter.<v:newlineChar/> </tspan> <tspan dy="278" x="180.161" y="374.994">* Hamza (glottal stop semi-consonant)<v:newlineChar/> </tspan> <tspan dy="292" x="180.161" y="374.994">Main article: Hamza<v:newlineChar/> </tspan> <tspan dy="306" x="180.161" y="374.994">ئ ؤ إ أ</tspan> </text> ...
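
For readers following the diagnosis in this report, here is a minimal Java sketch of the failure mode it describes: a remapping table indexed by code-point offset that has fewer slots than the range it is meant to cover. This is not Batik's code. The class name, the FIRST and LAST constants, the number of missing entries, and both lookup helpers are assumptions made purely for illustration.

```java
// Sketch only: FIRST, LAST and the table layout are hypothetical, not Batik's values.
final class RemapTableSketch {
    static final int FIRST = 0x0622;
    static final int LAST = 0x0652; // U+0652 ARABIC SUKUN, a diacritic

    /** Builds a table that is `missing` entries shorter than the FIRST..LAST range. */
    static int[][] buildTable(int missing) {
        return new int[LAST - FIRST + 1 - missing][];
    }

    /** True when the table has exactly one slot per code point in FIRST..LAST. */
    static boolean coversRange(int[][] table) {
        return table.length == LAST - FIRST + 1;
    }

    /** Direct indexing: throws when the table is shorter than the range. */
    static int[] lookupUnchecked(int[][] table, int ch) {
        return table[ch - FIRST];
    }

    /** Defensive variant: treats a code point outside the table as "no remapping". */
    static int[] lookupChecked(int[][] table, int ch) {
        int idx = ch - FIRST;
        return (idx < 0 || idx >= table.length) ? null : table[idx];
    }

    public static void main(String[] args) {
        int[][] shortTable = buildTable(8); // hypothetical count of missing entries
        int sukun = 0x0652;
        try {
            lookupUnchecked(shortTable, sukun);
            System.out.println("lookup succeeded");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException for U+0652");
        }
        System.out.println("covers range: " + coversRange(shortTable));
        System.out.println("checked lookup: " + lookupChecked(shortTable, sukun));
    }
}
```

A length check like coversRange, run once at class-initialization time or in a unit test, would flag a table that drifts out of step with its declared range. Padding the table with null entries, as proposed in the report, addresses the cause directly, while a bounds-checked lookup only hides it.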