Re: Joining Arabic Letters
Hello, when I type U+0644, then U+064E, then U+0627, I obtain لَا on my web pages. That is (Ligature LAM-ALIF) plus (ALIF). That's bad. What should I do to avoid this? Thanks in advance.
Re: Joining Arabic Letters
Escape Landsome wrote:

"When I type unicode 0644 then unicode 064E then unicode 0627 I obtain لَا on my web pages. That is (Ligature LAM-ALIF) plus (ALIF)."

On my system this looks like (Ligature LAM-ALIF) plus (FATHA), which is what one might expect. This is running BabelPad 6.0 on Windows 7, with Uniscribe 1.0626.7601.17561, using Arial as the Arabic font.

See ligature rule L1 on page 250 of TUS 6.0:

L1 Transparent characters do not affect the ligating behavior of base (nontransparent) characters. For example:

ALEF_r + FATHA_n + LAM_l → (LAM-ALEF)_n + FATHA_n

"That's bad. What should I do to avoid this? Thanks in advance"

In general, you cannot expect to get an answer to a question like "Why doesn't this sequence display correctly on my browser?" without providing at a minimum:

- the operating system, including version
- the browser, including version
- the font

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
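Rule L1 as quoted in this message can be illustrated with a minimal Python sketch. It approximates "transparent" as general category Mn, Me or Cf (excluding ZWJ and ZWNJ), which is only a rough stand-in for the Joining_Type=T property. The helper names `is_transparent` and `lam_alef_candidates` are invented for this illustration and do not come from any library or rendering engine.

```python
import unicodedata

LAM = "\u0644"
ALEF = "\u0627"
ZWNJ = "\u200C"
ZWJ = "\u200D"


def is_transparent(ch):
    """Rough approximation of Joining_Type=T (assumption: Mn/Me/Cf minus ZWJ/ZWNJ)."""
    if ch in (ZWNJ, ZWJ):
        return False
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def lam_alef_candidates(text):
    """Return (lam_index, alef_index) pairs that may ligate under rule L1.

    Transparent characters between LAM and ALEF are skipped, so they
    do not block the ligature.
    """
    pairs = []
    for i, ch in enumerate(text):
        if ch != LAM:
            continue
        j = i + 1
        while j < len(text) and is_transparent(text[j]):
            j += 1
        if j < len(text) and text[j] == ALEF:
            pairs.append((i, j))
    return pairs


# LAM + FATHA + ALEF: the FATHA is transparent, so LAM and ALEF still pair up.
print(lam_alef_candidates("\u0644\u064E\u0627"))  # [(0, 2)]
```

With a ZWNJ between the two letters the function returns an empty list, since ZWNJ is treated here as non-transparent.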
Re: Joining Arabic Letters
- the operating system, including version

Linux version 3.0.0-15-generic-pae (buildd@zirconium) (gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #26-Ubuntu SMP Fri Jan 20 17:07:31 UTC 2012

- the browser, including version

Mozilla Firefox 9.0.1

- the font

body:text { font-family:monospace; color:#A64; } .quran-verse { font-family:Arial,Helvetica,sans-serif; }

(sorry for the mis-posting)
Re: Joining Arabic Letters
On Sat, Apr 07, 2012 at 08:50:18PM +0200, Escape Landsome wrote:

"- the browser, including version Mozilla Firefox 9.0.1"

There was a bug in Firefox 9 causing the behaviour you described; it has been fixed in Firefox 10: https://bugzilla.mozilla.org/show_bug.cgi?id=714067

Regards, Khaled
Re: Joining Arabic Letters
On 31/03/2012, Philippe Verdy verd...@wanadoo.fr wrote:

"This means that even if there's a font change between two letters (for example due to a fallback for some letters or diacritics), each letter should continue to adopt its normative joining behavior (i.e. displaying their correct joining form)."

Using OpenType or something similar, there are several ways you can implement an Arabic script font, including several different ways you can write the lookup tables, all of which are valid. The same goes for any other complex script.

Unless you are going to define some rigid way Arabic fonts are implemented, and a fixed glyph set, there is just no practical way to get font lookups to work across font change boundaries. Even then it would require some protocol allowing the lookups in each font to interact.
Re: Joining Arabic Letters
On 1 April 2012 at 19:24, Christopher Fynn chris.f...@gmail.com wrote:

"Even then it would require some protocol allowing the lookups in each font to interact."

There's a smart mechanism mentioned on this list, used by OpenOffice, that uses ZWJ for this purpose. I think it is a **great** suggestion that should be documented, because it is fully part of the standard (ZWJ and ZWNJ **have** been assigned standard joining types).

It will work at least to correctly respect the standardized joining types for **all** base letters (not just in the Arabic script, but in other Semitic scripts that use joining types as well).

It will of course not work for supporting the correct positioning (or ligation) of diacritics, for which a renderer may only be able to use a default position, which may not work very well but will still be correct according to the standard.

It will not work for diacritics that have not been encoded separately in the Arabic script (such as 1/2/3/4 dots above or below, pointing upwards or downwards, horizontally or vertically, and Persian-Urdu digits above...), only to avoid the non-normalization issues with letters that are not decomposable in the basic Arabic abjad (but still logically decomposable in some *alphabets* or abjads using the Arabic script).

-- Philippe.
Re: Joining Arabic Letters
On 2012-03-30, Andreas Prilop prilop4...@trashmail.net wrote:

"I think a better idea is to have joining glyphs always even for different typefaces. At least the Unicode Standard should say what should happen when Arabic characters of different typefaces follow each other."

On 30 March 2012 at 20:08, Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

"How can it? Unicode is about plain text. As soon as you start talking about different typefaces, you're out of scope."

On 3/30/2012 5:36 PM, Philippe Verdy wrote:

"Not really. Even if there is only one typeface involved, the joining behavior of Arabic letters is normative and in scope."

The discussion was about joining across typeface boundaries, which is nonsense, of course. In order to make characters join, the glyphs for each have to be designed to allow such joining. In cases where the join results in a ligature, it's patently obvious that you can't have a typeface boundary in the middle of a ligature.

Now there's always something that renderers could do to provide fall-back solutions. For example, they could see whether one or the other typeface has the full ligature and arbitrarily move the boundaries of the typeface runs. For a mandatory ligature like lam-alif that might almost be reasonable (just as fallback rendering of diacritics is somewhat reasonable).

However, I'd rather have layout engines that work really well in sensible cases than ones trying to cover weird situations (ransom notes) that don't (or shouldn't) occur in practice.

That said, some aspects of script rendering are of course in scope for the Unicode Standard.

The natural scope for Unicode derives from character identity. Characters are encoded to represent certain entities in text. For characters that are members of scripts this means that there is an understood relation between character sequences and words (or fragments of words) in a given writing system that is supported by that script.

If the lam alif ligature is mandatory, that tells the user that the character sequence for this is expected to be lam, alif with no joiner character between the two characters, nor the use of any dedicated character code for the ligature. The same goes for general joining behavior: for Arabic the default is described in the Standard, so that users know when to add ZWJ or ZWNJ for override. And so on...

However, it's out of scope for Unicode to mandate anything about how to treat defective font bindings. Julian got that right.

A./
Re: Joining Arabic Letters
I was not speaking about ligatures like lam+alef, but about the contextual forms chosen for base letters (independently of the diacritics applied to them, except for a few letters that use different shapes in some combinations and that are encoded distinctly in the UCS precisely to allow different contextual shapes in some joining contexts).

I have never said that the glyphs were mandatory. But the joining behavior of each letter (independently of whether ligatures are applied on top of them) must be kept.

So in a combination like LAM, diacritic, ALEF, the joining behavior of each letter must be kept, even if LAM plus the diacritic is mapped to a single glyph that itself has no ligature with the following ALEF.

In that case it is perfectly acceptable to use one font for LAM+diacritic and another for ALEF. The absence of the ligature in the first font will have no impact on the readability of the text, because the ligature is only recommended but not mandatory for the script.

I just want to say that encoding a separate diacritic between base letters that would otherwise join cleanly within a single font should not prevent each font from using the correct contextual form when two fonts are used, even if the joins do not look cleanly connected. Using the non-joining letter forms at font boundaries is not acceptable for Arabic.
Re: Joining Arabic Letters
A test table for all Arabic characters that have defined joining types (and most characters that are not joining) can be seen on this page: http://en.wikipedia.org/wiki/Template:Arabic_alphabet_shapes/joining

This table is sorted by joining type, then by joining group.

You'll note that some characters that are normatively dual-joining sometimes do not exhibit the mandatory joining with many fonts, notably characters that have been added more recently. What is stranger is that the same fonts exhibit the left-joining but not the right-joining, even though the characters are normatively dual-joining (you can ignore the letters that are not supported and are just displayed as squares, for which you'll see just a small non-connecting tatweel on either side).

So far I have not seen any existing Arabic font that exhibits the correct normative joining behavior for letters such as U+063D (the Farsi Yeh with an inverted v above, which is dual-joining like the Farsi Yeh at U+06CC without the inverted v above, and in the same joining group; those fonts only map a single non-joining glyph for U+063D, but behave correctly for U+06CC). This is true even for all Arabic fonts shipped with Windows 7.

Note: this page is a test page, and some errors may remain, but the expected joinings are based directly on the normative joining types and joining groups defined in Unicode.

My comment was therefore relevant, even in the case of just one font being used.
Re: Joining Arabic Letters
On Sat, Mar 31, 2012 at 08:55:28AM +0200, Philippe Verdy wrote:

"For now I've not seen any existing Arabic font that exhibit the correct normative joining behavior for these letters such as U+063D (the Farsi Yeh with an inverted v above, which is dual-joining like the Farsi Yeh at U+06CC without the inverted v above, and in the same joining group; those fonts only map a single non-joining glyph for U+063D, but behave correctly for U+06CC). This is true even for all Arabic fonts shipped with Windows 7."

Check my free Amiri font (http://amirifont.org); it has full Unicode 6.0 Arabic coverage, with 6.1 additions on the way. But if you are using a layout engine that predates the addition of that character to Unicode, even a good font will not help, since the engine will be using an older Unicode character database in which the joining behaviour of this letter is undefined.

Regards, Khaled
Re: Joining Arabic Letters
On Fri, Mar 30, 2012 at 07:37:53PM +0200, Andreas Prilop wrote:

"I come back to http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11 A similar problem of showing non-joining, isolated Arabic glyphs can be seen in the attached file. Both Internet Explorer 8 and MS Word 2010 display isolated glyphs in some cases. I think a better idea is to have joining glyphs always even for different typefaces. At least the Unicode Standard should say what should happen when Arabic characters of different typefaces follow each other."

OpenOffice/LibreOffice work around this by conditionally inserting ZWJ when there is a font switch in the middle of the word and joining is desired.

Regards, Khaled
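The ZWJ workaround described in this message can be sketched roughly as follows. This is not the OpenOffice/LibreOffice implementation, only an illustration under stated assumptions: text is already split into (text, font) runs, the joining tables below cover only a handful of basic Arabic letters, and the function name `add_zwj_at_font_switches` is hypothetical.

```python
import unicodedata

ZWJ = "\u200D"

# Partial, hand-written joining tables for basic Arabic letters only
# (an assumption for this sketch; a real engine would read the UCD).
DUAL = {chr(c) for c in (
    0x0626, 0x0628, *range(0x062A, 0x062F), *range(0x0633, 0x063B),
    *range(0x0641, 0x0648), 0x064A)}
RIGHT = {chr(c) for c in (
    0x0622, 0x0623, 0x0624, 0x0625, 0x0627, 0x0629,
    0x062F, 0x0630, 0x0631, 0x0632, 0x0648)}


def _is_mark(ch):
    """Combining marks are skipped when looking for the base letter."""
    return unicodedata.category(ch) in ("Mn", "Me")


def _first_base(text):
    return next((ch for ch in text if not _is_mark(ch)), None)


def _last_base(text):
    return next((ch for ch in reversed(text) if not _is_mark(ch)), None)


def add_zwj_at_font_switches(runs):
    """Insert ZWJ on both sides of a run boundary where the letters should join.

    A join is wanted when the letter before the boundary is dual-joining
    (it can connect to what follows) and the letter after the boundary is
    dual- or right-joining (it can connect to what precedes).
    """
    texts = [text for text, _ in runs]
    for i in range(len(runs) - 1):
        before = _last_base(runs[i][0])
        after = _first_base(runs[i + 1][0])
        if before in DUAL and (after in DUAL or after in RIGHT):
            texts[i] = texts[i] + ZWJ
            texts[i + 1] = ZWJ + texts[i + 1]
    return [(text, font) for text, (_, font) in zip(texts, runs)]


# LAM + FATHA in one font, ALEF in another: each run gets a ZWJ at the boundary.
runs = [("\u0644\u064E", "FontA"), ("\u0627", "FontB")]
print(add_zwj_at_font_switches(runs))
```

Each run can then be shaped independently, and the ZWJ tells the shaper to pick the joining form even though the neighbouring letter lives in a different font run. Whether the glyphs actually connect cleanly still depends on the fonts.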
Re: Joining Arabic Letters
I am testing it in the latest version of Chrome, which was released long after the latest Unicode addition to the Arabic letters (notably the last update of Arabic joining types in the UCD).

So maybe it's the internal engine used in Chrome that still does not support these mandatory joining types. But then it would by default treat those characters as **non-joining** (because this is explicitly the default joining type for all characters in the UCD that have not been assigned one).

This is not the case: the implementation treats these characters as right-joining, so this is clearly an implementation bug.
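The point about default joining types can be illustrated with a small Python sketch. It parses lines in the `code; schematic name; joining type; joining group` layout used by the UCD file ArabicShaping.txt and falls back to non-joining (`U`) for anything not listed. The sample lines are illustrative entries written for this example rather than copied from a specific UCD release, the function names are hypothetical, and the special handling of combining marks (transparent) is deliberately left out.

```python
SAMPLE_LINES = [
    "# code; schematic name; joining type; joining group",
    "0627; ALEF; R; ALEF",
    "0628; BEH; D; BEH",
    "063D; FARSI YEH WITH INVERTED V; D; FARSI YEH",
    "06CC; FARSI YEH; D; FARSI YEH",
]

# Simulates an engine built against data that predates U+063D.
OLDER_LINES = [line for line in SAMPLE_LINES if not line.startswith("063D")]


def parse_shaping(lines):
    """Build a {character: joining type} table from semicolon-separated lines."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        table[chr(int(fields[0], 16))] = fields[2]
    return table


def joining_type(ch, table):
    """Return the joining type, defaulting to 'U' (non-joining) when unlisted."""
    return table.get(ch, "U")


current = parse_shaping(SAMPLE_LINES)
older = parse_shaping(OLDER_LINES)
print(joining_type("\u063D", current))  # D
print(joining_type("\u063D", older))    # U
```

Under this model an engine with outdated data would render U+063D as non-joining, not as right-joining, which is why the behaviour reported above looks like a separate bug.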
Re: Joining Arabic Letters
This is smart... provided that fonts also map the ZWJ (not all Arabic fonts map it; they often map only ZWNJ to disable joining, assuming that there's no reason to force joining in normal texts; some Arabic fonts do not even map ZWNJ).

Some Arabic fonts do not even map the joining types internally, but depend on the engine to find the contextual forms by trying the compatibility characters (so they are not suitable for anything other than basic Arabic).
Joining Arabic Letters
I come back to http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11

A similar problem of showing non-joining, isolated Arabic glyphs can be seen in the attached file. Both Internet Explorer 8 and MS Word 2010 display isolated glyphs in some cases.

I think a better idea is to have joining glyphs always even for different typefaces. At least the Unicode Standard should say what should happen when Arabic characters of different typefaces follow each other.
Re: Joining Arabic Letters
On 2012-03-30, Andreas Prilop prilop4...@trashmail.net wrote: I think a better idea is to have joining glyphs always even for different typefaces. At least the Unicode Standard should say what should happen when Arabic characters of different typefaces follow each other. How can it? Unicode is about plain text. As soon as you start talking about different typefaces, you're out of scope. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Joining Arabic Letters
Not really. Even if there is only one typeface involved, the joining behavior of Arabic letters is normative and in scope.

This means that even if there's a font change between two letters (for example due to a fallback for some letters or diacritics), each letter should continue to adopt its normative joining behavior (i.e. display its correct joining form).

Then the renderer will just make a best effort to place the diacritics on them (even if those diacritics come from a different font than the base letter). Of course the ligatures of letters will not be generated, and it's possible that the joining strokes of two letters that normally join perfectly will not connect completely, even if each letter is shown in its correct form.

If one wanted to disable the normative joining forms of letters, a ZWNJ can be used between them.

I also think that, when a character encoded in the UCS with a single code point is not mapped in the font, the renderer should be able to decompose it and use the base letters and diacritics found in that font, making a best effort to place the diacritics, instead of trying to find a fallback font that maps the composite character.