On 19/04/15 10:51, Abdulhaq wrote:
On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
On 18/04/15 21:40, Walter Bright wrote:

Also, notice that some letters can only be achieved using multiple
code points. Hebrew diacritics, for example, do not, typically, have a
composite form. My name fully spelled (which you rarely would do),
שַׁחַר, cannot be represented with less than 6 code points, despite
having only three letters.
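
The code point count above can be checked with a minimal sketch, assuming Python's standard `unicodedata` module. The name is written as escapes (shin, patah, shin dot, het, patah, resh) so the result does not depend on how an editor stores the text; the helper name `count_letters_and_marks` is made up for this illustration.

```python
import unicodedata

# The name spelled out by code point: shin, patah, shin dot, het, patah, resh.
name = "\u05E9\u05B7\u05C1\u05D7\u05B7\u05E8"

def count_letters_and_marks(text):
    """Return (code points after NFC, base letters, combining marks)."""
    nfc = unicodedata.normalize("NFC", text)
    letters = sum(1 for ch in nfc if unicodedata.category(ch).startswith("L"))
    marks = sum(1 for ch in nfc if unicodedata.category(ch).startswith("M"))
    return len(nfc), letters, marks

# NFC is the normalization form that composes wherever Unicode allows it.
print(count_letters_and_marks(name))

# For contrast, Latin "e" + combining acute accent does have a composite form.
print(len(unicodedata.normalize("NFC", "e\u0301")))
```

Normalizing to NFC leaves the Hebrew string at six code points (three letters, three marks), while the Latin example collapses to one.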


Yes, Arabic is similar too.


Actually, the Arabic presentation forms serve a slightly different purpose. In Hebrew, the presentation forms are mostly for Biblical text, where certain decorations are commonly used.

For Arabic, the main reason for the presentation forms is shaping. Almost every Arabic letter can be written in up to four different forms (alone, start of word, middle of word and end of word). This means that Arabic has 28 letters, but over 100 different shapes for those letters. These days, when the font can do the shaping, the 28 letters suffice. During the DOS days, you needed to actually store those glyphs somewhere, which means that you needed to allocate a number to them.
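
As a rough illustration of the point about shaping, here is a small sketch, assuming Python's standard `unicodedata` module. It takes the four positional presentation forms of one letter (BEH, from the Arabic Presentation Forms-B block) and folds them back to the single base letter with compatibility normalization. The helper name `base_letter` is made up for this example.

```python
import unicodedata

# The four positional presentation forms of BEH:
# isolated, final, initial, medial.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def base_letter(ch):
    """Fold a presentation form to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)

for form in BEH_FORMS:
    base = base_letter(form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)}"
          f" -> U+{ord(base):04X} {unicodedata.name(base)}")
```

All four forms map to the same letter, U+0628, which is the one you would store in text today and leave to the font to shape.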

In Hebrew, some letters also have a final form. Since the numbers are so much smaller (22 letters, 5 of which have final forms), however, Hebrew keyboards actually have all 27 letters on them. Going strictly by the "Unicode way", one would be expected to spell שלום with U+05DE as the last letter, and let the shaping engine figure out that it should use the final form (or add a ZWNJ). Since all Hebrew code charts contained a final form Mem, however, you actually spell it with U+05DD at the end, and it is considered a distinct letter.
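
To make the contrast with Arabic concrete, here is a small sketch, again assuming Python's standard `unicodedata` module. It checks that final Mem is encoded as a letter in its own right: it has no decomposition, and compatibility normalization leaves it alone. The helper name `is_distinct_letter` is made up for this example.

```python
import unicodedata

MEM = "\u05DE"        # HEBREW LETTER MEM
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM

# The word as it is actually spelled: shin, lamed, vav, final mem.
shalom = "\u05E9\u05DC\u05D5" + FINAL_MEM

def is_distinct_letter(ch):
    """True if ch has no decomposition and NFKC does not fold it."""
    return (unicodedata.decomposition(ch) == ""
            and unicodedata.normalize("NFKC", ch) == ch)

print(unicodedata.name(shalom[-1]))
print(is_distinct_letter(FINAL_MEM))
```

Unlike the Arabic presentation forms, nothing in normalization ties U+05DD back to U+05DE, so code comparing Hebrew text has to treat the two as different letters.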

Shachar
