Re: Joining Arabic Letters

2012-04-07 Thread Escape Landsome
Hello,

When I type unicode 0644 then unicode 064E then unicode 0627
I obtain لَا
on my web pages

That is (Ligature LAM-ALIF) plus (ALIF)

That's bad.

What should I do to avoid this ?  Thanks in advance




Re: Joining Arabic Letters

2012-04-07 Thread Doug Ewell

Escape Landsome wrote:


When I type unicode 0644 then unicode 064E then unicode 0627
I obtain لَا
on my web pages

That is (Ligature LAM-ALIF) plus (ALIF)


On my system this looks like (Ligature LAM-ALIF) plus (FATHA), which is 
what one might expect. This is running BabelPad 6.0 on Windows 7, with 
Uniscribe 1.0626.7601.17561, using Arial as the Arabic font.
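To check what the typed sequence actually contains, independently of any browser or font, the code points can be listed with their names. This is only a small sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

# The sequence from the original question: 0644, 064E, 0627.
sequence = "\u0644\u064E\u0627"
for row in describe(sequence):
    print(*row)
```

The middle character is reported as ARABIC FATHA with category Mn (a nonspacing mark), not as a second ALEF.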


See ligature rule L1 on page 250 of TUS 6.0:

L1 Transparent characters do not affect the ligating behavior of base
(nontransparent) characters. For example:

ALEF_r + FATHA_n + LAM_l → (LAM-ALEF)_n + FATHA_n
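A rough way to see rule L1 in action is to look for LAM followed by ALEF while skipping any transparent characters in between. The sketch below works on logical (typing) order, and it approximates "transparent" by the general categories Mn and Me; that approximation and the function names are assumptions for illustration, not how any particular shaping engine is written.

```python
import unicodedata

LAM, ALEF = "\u0644", "\u0627"

def is_transparent(ch):
    # Assumption: nonspacing and enclosing marks stand in for Joining_Type=T.
    return unicodedata.category(ch) in ("Mn", "Me")

def find_lam_alef(text):
    """Return (lam_index, alef_index, marks_between) for each ligating pair."""
    hits = []
    i = 0
    while i < len(text):
        if text[i] == LAM:
            j = i + 1
            # Skip marks sitting on the LAM; they must not block the ligature.
            while j < len(text) and is_transparent(text[j]):
                j += 1
            if j < len(text) and text[j] == ALEF:
                hits.append((i, j, text[i + 1:j]))
                i = j + 1
                continue
        i += 1
    return hits

print(find_lam_alef("\u0644\u064E\u0627"))
```

For LAM, FATHA, ALEF this finds one pair with the FATHA as the skipped mark, which matches the rendering "LAM-ALEF ligature plus FATHA".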


That's bad.

What should I do to avoid this ?  Thanks in advance


In general, you cannot expect to get an answer to a question like "Why
doesn't this sequence display correctly on my browser?" without
providing at a minimum:


- the operating system, including version
- the browser, including version
- the font

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell





Re: Joining Arabic Letters

2012-04-07 Thread Escape Landsome
 - the operating system, including version

Linux version 3.0.0-15-generic-pae (buildd@zirconium) (gcc version
4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) ) #26-Ubuntu SMP Fri Jan 20
17:07:31 UTC 2012

 - the browser, including version

Mozilla Firefox 9.0.1

 - the font

body:text { font-family:monospace; color:#A64; }
.quran-verse { font-family:Arial,Helvetica,sans-serif; }

(sorry for the mis-posting)



Re: Joining Arabic Letters

2012-04-07 Thread Khaled Hosny
On Sat, Apr 07, 2012 at 08:50:18PM +0200, Escape Landsome wrote:
  - the browser, including version
 
 Mozilla Firefox 9.0.1

There was a bug in Firefox 9 causing the behaviour you described; it
has been fixed in Firefox 10:
https://bugzilla.mozilla.org/show_bug.cgi?id=714067

Regards,
 Khaled



Re: Joining Arabic Letters

2012-04-01 Thread Christopher Fynn
On 31/03/2012, Philippe Verdy verd...@wanadoo.fr wrote:

 This means that even if there's a font change between two letters (for
 example due to a fallback for some letters or diacritics), each letter
 should continue to adopt its normative joining behavior (i.e.
 displaying their correct joining form).

Using OpenType or something similar there are several ways you can
implement an Arabic script font including several different ways you
can write the lookup tables - all of which are valid. The same goes
for any other complex script.

Unless you are going to define some rigid way Arabic fonts are
implemented - and a fixed glyph set - there is just no practical way
to get font lookups to work across font change boundaries. Even then
it would require some protocol allowing the lookups in each font to
interact.
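To make the font-boundary problem concrete, here is a small sketch of how text gets split into per-font runs before shaping. The two "fonts" and their coverage sets are invented for the example; real engines pick fonts in more elaborate ways.

```python
from itertools import groupby

# Hypothetical coverage tables: FontA lacks ALEF, FontB has only ALEF.
FONTS = [
    ("FontA", {"\u0644", "\u064E"}),  # LAM, FATHA
    ("FontB", {"\u0627"}),            # ALEF
]

def pick_font(ch):
    """Return the first font whose coverage includes ch, or None."""
    for name, coverage in FONTS:
        if ch in coverage:
            return name
    return None

def font_runs(text):
    """Group consecutive characters that resolve to the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]

print(font_runs("\u0644\u064E\u0627"))
```

Each run is normally shaped on its own with its own font's lookup tables, so a lookup in FontA never sees the ALEF that ended up in FontB's run.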



Re: Joining Arabic Letters

2012-04-01 Thread Philippe Verdy
On 1 April 2012 19:24, Christopher Fynn chris.f...@gmail.com wrote:
 Even then
 it would require some protocol  allowing the lookups in each font to
 interact.

There's a smart mechanism mentioned on this list, used by OpenOffice,
that uses ZWJ for this purpose.

I think it is a *great* suggestion that should be documented, because
it is fully part of the standard (ZWJ and ZWNJ *have* been assigned
standard joining types). It will work at least to correctly respect
the standardized joining types for *all* base letters (not just in
the Arabic script, but in other Semitic scripts that use joining
types as well).

It will of course not work for supporting the correct positioning (or
ligation) of diacritics, for which a renderer may just be able to use
a default position, which may not work very well, but that will be
still correct according to the standard.

It will not work for diacritics that have not been encoded separately
in the Arabic script (such as 1/2/3/4 dots above or below, pointing
upwards or downwards, horizontally or vertically, and Persian-Urdu
digits above...), only to avoid the non-normalization issues with
letters that are not decomposable in the basic Arabic abjad (but still
logically decomposable in some *alphabets* or abjads using the Arabic
script).

-- Philippe.




Re: Joining Arabic Letters

2012-03-31 Thread Asmus Freytag

On 3/30/2012 5:36 PM, Philippe Verdy wrote:

On 30 March 2012 20:08, Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

On 2012-03-30, Andreas Prilop prilop4...@trashmail.net wrote:

I think a better idea is to have joining glyphs always even for
different typefaces. At least the Unicode Standard should say
what should happen when Arabic characters of different typefaces
follow each other.

How can it? Unicode is about plain text. As soon as you start talking
about different typefaces, you're out of scope.

Not really. Even if there is only one typeface involved, the joining
behavior of Arabic letters is normative and in scope.




The discussion was about joining across typeface boundaries, which is
nonsense, of course.

In order to make characters join, the glyphs for each have to be
designed to allow such joining. In cases where the join results in a
ligature, it's patently obvious that you can't have a typeface boundary
in the middle of a ligature.

Now there's always something that renderers could do to provide
fall-back solutions. For example, they could see whether one or the
other typeface has the full ligature and arbitrarily move the
boundaries of the typeface runs. For a mandatory ligature like lam-alif
that might almost be reasonable. (Just as fallback rendering of
diacritics is somewhat reasonable.)

However, I would rather have layout engines that work really well in
sensible cases than ones that try to cover weird situations (ransom
notes) that don't (or shouldn't) occur in practice.

That said, some aspects of script rendering are of course in scope for
the Unicode Standard.

The natural scope for Unicode derives from character identity.

Characters are encoded to represent certain entities in text. For
characters that are members of scripts this means that there is an
understood relation between character sequences and words (or fragments
of words) in a given writing system that is supported by that script.

If the lam alif ligature is mandatory, that tells the user that the
character sequence for this is expected to be lam, alif with no joiner
character between the two characters, nor the use of any dedicated
character code for the ligature.

The same goes for general joining behavior - for Arabic the default is
described in the Standard, so that users know when to add ZWJ or ZWNJ
for override.

And so on...

However, it's out of scope for Unicode to mandate anything about how to
treat defective font bindings - Julian got that right.

A./



Re: Joining Arabic Letters

2012-03-31 Thread Philippe Verdy
I was not speaking about ligatures like lam+alef, but about the
contextual forms chosen for base letters (independently of the
diacritics applied to them, except for a few that use
different shapes in some combinations for these contextual joining
forms and that are encoded distinctly in the UCS precisely to allow
a difference of these contextual shapes in some joining contexts).

I have never said that the glyphs were mandatory. But the joining
behavior of each letter (independently of whether ligatures are applied
on top of them) must be kept. So in a combination like LAM,
diacritic, ALEF, the joining behavior of each letter must be kept,
even if there's a mapping to a single glyph for LAM, diacritic, that
has itself no ligature bound with the following ALEF. In that case it
is perfectly acceptable to use a font for LAM+diacritic and another
for ALEF. The absence of the ligature in the first font will have no
impact on the readability of the text because the ligature is only
recommended but not mandatory for the script.

I just want to say that the encoding of a separate diacritic between
base letters that would otherwise join cleanly if using only one font
should not prevent each font from using the correct contextual form
when two fonts are used for each letter, even if these joins may not
look very cleanly connected. Using the non-joining letter forms at font
boundaries is not acceptable for Arabic.





Re: Joining Arabic Letters

2012-03-31 Thread Philippe Verdy
A test table for all Arabic characters that have defined joining types
(and most characters that are not joining) can be seen on this page:

http://en.wikipedia.org/wiki/Template:Arabic_alphabet_shapes/joining

This table is sorted by joining type, then by joining group.

You'll note that, with many fonts, some characters that are normatively
dual-joining do not always exhibit the mandatory joining, notably
characters that have been added more recently. What is stranger
is that the same fonts exhibit the left joining but not the right
joining, even though these characters are normatively dual-joining (you
can ignore the letters that are not supported and are just displayed as
squares, and for which you'll see just a small non-connecting tatweel
on either side).

So far I have not seen any existing Arabic font that exhibits the
correct normative joining behavior for letters such as U+063D
(the Farsi Yeh with an inverted v above, which is dual-joining like
the Farsi Yeh at U+06CC without the inverted v above, and in the same
joining group); those fonts only map a single non-joining glyph for
U+063D, but behave correctly for U+06CC. This is true even for all
Arabic fonts shipped with Windows 7.

Note: this page is a test page, and there may remain some errors, but
the expected joinings are based directly on the normative joining
types and joining groups defined in Unicode.
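The expected forms in such a table can be derived mechanically from the joining types. Below is a minimal sketch of that derivation for text in logical order; the `JOINING` dictionary is a tiny hand-copied excerpt (an assumption for the example, not the full UCD data), and the function names are made up.

```python
# Tiny excerpt of joining types: D = dual, R = right, T = transparent,
# C = join-causing, U = non-joining. Not the complete table.
JOINING = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u0644": "D",  # LAM
    "\u063D": "D",  # FARSI YEH WITH INVERTED V
    "\u06CC": "D",  # FARSI YEH
    "\u064E": "T",  # FATHA
    "\u200C": "U",  # ZWNJ
    "\u200D": "C",  # ZWJ
}

def jt(ch):
    return JOINING.get(ch, "U")

def neighbour(text, i, step):
    """Joining type of the nearest non-transparent neighbour, 'U' at the edge."""
    j = i + step
    while 0 <= j < len(text):
        if jt(text[j]) != "T":
            return jt(text[j])
        j += step
    return "U"

def contextual_forms(text):
    """Return (letter, form) for every joining letter in text."""
    forms = []
    for i, ch in enumerate(text):
        t = jt(ch)
        if t not in ("D", "R", "L"):
            continue  # marks and joiners take no contextual form themselves
        joins_prev = t in ("D", "R") and neighbour(text, i, -1) in ("D", "L", "C")
        joins_next = t in ("D", "L") and neighbour(text, i, +1) in ("D", "R", "C")
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms

print(contextual_forms("\u0628\u063D\u0628"))
```

With U+063D between two BEH, the sketch asks for a medial form, which is what a font showing only an isolated glyph fails to provide.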

My comment was then relevant, even in the case of just one font being used.


Re: Joining Arabic Letters

2012-03-31 Thread Khaled Hosny
On Sat, Mar 31, 2012 at 08:55:28AM +0200, Philippe Verdy wrote:
 For now I've not seen any existing Arabic font that exhibit the
 correct normative joining behavior for these letters such as  U+063D
 (the Farsi Yeh with an inverted v above, which is dual-joining like
 the Farsi Yeh at U+06CC without the inverted v above, and in the same
 joining group; those fonts only map a single non-joining glyph for
 U+063D, but behave correctly for U+06CC). This is true even for all
 Arabic fonts shipped with Windows 7.

Check my free Amiri font (http://amirifont.org); it has full Unicode 6.0
Arabic coverage, with 6.1 additions on the way. But if you are using
a layout engine that predates the addition of that character into
Unicode, even a good font will not help here since the engine will be
using the older Unicode character database where the joining behaviour
of this letter is undefined.
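The same kind of version dependence can be observed in any software that bundles a copy of the character database. As an illustration only (Python's `unicodedata` module, not a layout engine), one can print the bundled UCD version and test whether a character is known to it; the helper name is made up.

```python
import unicodedata

def known_to_ucd(ch):
    """True if the bundled character database has a name for ch."""
    return unicodedata.name(ch, None) is not None

# Version of the Unicode Character Database shipped with this Python build.
print(unicodedata.unidata_version)
print(known_to_ucd("\u063D"))
```

A component built against data older than the character's addition would have no properties for it, whatever the font provides.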

Regards,
 Khaled



Re: Joining Arabic Letters

2012-03-31 Thread Khaled Hosny
On Fri, Mar 30, 2012 at 07:37:53PM +0200, Andreas Prilop wrote:
 I come back to
  http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11
 
 A similar problem of showing non-joining, isolated Arabic glyphs
 can be seen in the attached file. Both Internet Explorer 8 and
 MS Word 2010 display isolated glyphs in some cases.
 
 I think a better idea is to have joining glyphs always even for
 different typefaces. At least the Unicode Standard should say
 what should happen when Arabic characters of different typefaces
 follow each other.

OpenOffice/LibreOffice work around this by conditionally inserting ZWJ
when there is a font switch in the middle of the word and joining is
desired.
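The general idea can be sketched as follows. This is not OpenOffice/LibreOffice code; the joining sets are a small hypothetical excerpt and the function names are invented. Given per-font runs in logical order, a ZWJ is added on both sides of a boundary when the letters on either side would have joined.

```python
import unicodedata

ZWJ = "\u200D"

# Hypothetical excerpt of joining behaviour, in logical order.
JOINS_TO_NEXT = {"\u0644", "\u0628"}            # dual-joining: LAM, BEH
JOINS_TO_PREV = {"\u0644", "\u0628", "\u0627"}  # the same plus right-joining ALEF

def last_base(text):
    """Return the last character that is not a nonspacing mark, or ''."""
    for ch in reversed(text):
        if unicodedata.category(ch) != "Mn":
            return ch
    return ""

def add_zwj_at_boundaries(runs):
    """runs is a list of (font_name, text); return a copy with ZWJ added."""
    out = [[font, text] for font, text in runs]
    for left, right in zip(out, out[1:]):
        if not left[1] or not right[1]:
            continue
        if last_base(left[1]) in JOINS_TO_NEXT and right[1][0] in JOINS_TO_PREV:
            left[1] += ZWJ             # lets the left run end in a joining form
            right[1] = ZWJ + right[1]  # lets the right run start in a joining form
    return [(font, text) for font, text in out]

runs = [("FontA", "\u0644\u064E"), ("FontB", "\u0627")]
print(add_zwj_at_boundaries(runs))
```

Each run can then be shaped separately and still pick the initial or final form, provided the font and engine honour ZWJ.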

Regards,
 Khaled



Re: Joining Arabic Letters

2012-03-31 Thread Philippe Verdy
I am testing it in the latest version of Chrome, which was released
long after the latest Unicode additions to the Arabic letters (notably
the last update of Arabic joining types in the UCD). So maybe it's
the internal engine used in Chrome that still does not support these
mandatory joining types.

But then it would by default consider those characters as
*non-joining* (because this is explicitly the default value of the
joining type for all characters in the UCD that have not been assigned
joining types). This is not the case: the implementation considers
these characters as right-joining, so this is clearly an
implementation bug.
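The default described here can be written down in a few lines. In this sketch the explicit table is a deliberately incomplete excerpt (it omits U+063D to mimic outdated data), and the fallback follows the defaults as I understand them from the notes in ArabicShaping.txt: unlisted characters of category Mn, Me or Cf are transparent, everything else unlisted is non-joining.

```python
import unicodedata

# Deliberately incomplete excerpt, standing in for an outdated table.
OLD_TABLE = {"\u0644": "D", "\u0627": "R", "\u200D": "C", "\u200C": "U"}

def joining_type(ch, table=OLD_TABLE):
    """Look up the joining type, applying the defaults for unlisted characters."""
    if ch in table:
        return table[ch]
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return "T"  # transparent by default
    return "U"      # non-joining by default

print(joining_type("\u063D"))                              # outdated data
print(joining_type("\u063D", {**OLD_TABLE, "\u063D": "D"}))  # updated data
```

With the outdated table the letter comes out as U (non-joining), never as right-joining, which is why a one-sided join points to a bug rather than to missing data.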





Re: Joining Arabic Letters

2012-03-31 Thread Philippe Verdy
This is smart... provided that fonts also map the ZWJ (not all Arabic
fonts map it; they often map only ZWNJ to disable joining, assuming
that there's no reason to force joining in normal texts; some
Arabic fonts do not map ZWNJ either).

Some Arabic fonts do not even map the joining types internally but
depend on the engine to find the contextual forms by trying the
compatibility characters (so they are not suitable for anything
other than basic Arabic).
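The compatibility characters in question are the Arabic presentation forms, whose compatibility decompositions record both the contextual form and the underlying letters. A small sketch with Python's `unicodedata` (the helper name is made up):

```python
import unicodedata

def presentation_form_info(ch):
    """Split a decomposition like '<initial> 0644' into (tag, base letters)."""
    parts = unicodedata.decomposition(ch).split()
    if not parts or not parts[0].startswith("<"):
        return None  # no compatibility decomposition
    tag = parts[0].strip("<>")
    base = "".join(chr(int(cp, 16)) for cp in parts[1:])
    return tag, base

print(presentation_form_info("\uFEDF"))  # LAM initial form
print(presentation_form_info("\uFEFB"))  # LAM with ALEF ligature, isolated form
```

Only letters that have such presentation forms can be handled this way, which is why the approach does not extend beyond basic Arabic.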





Joining Arabic Letters

2012-03-30 Thread Andreas Prilop
I come back to
 http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11

A similar problem of showing non-joining, isolated Arabic glyphs
can be seen in the attached file. Both Internet Explorer 8 and
MS Word 2010 display isolated glyphs in some cases.

I think a better idea is to have joining glyphs always even for
different typefaces. At least the Unicode Standard should say
what should happen when Arabic characters of different typefaces
follow each other.


Re: Joining Arabic Letters

2012-03-30 Thread Julian Bradfield
On 2012-03-30, Andreas Prilop prilop4...@trashmail.net wrote:
 I think a better idea is to have joining glyphs always even for
 different typefaces. At least the Unicode Standard should say
 what should happen when Arabic characters of different typefaces
 follow each other.

How can it? Unicode is about plain text. As soon as you start talking
about different typefaces, you're out of scope.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




Re: Joining Arabic Letters

2012-03-30 Thread Philippe Verdy
Not really. Even if there is only one typeface involved, the joining
behavior of Arabic letters is normative and in scope.

This means that even if there's a font change between two letters (for
example due to a fallback for some letters or diacritics), each letter
should continue to adopt its normative joining behavior (i.e.
displaying its correct joining form).

Then the renderer will just make a best effort to place the
diacritics on them (even if those diacritics come from another font
than the base letter), but of course the ligatures of letters will not
be generated, and it's possible that two letters that normally
join perfectly will not completely connect their joining strokes, even
if each letter is shown in its correct form.

If one wanted to disable the normative joining forms of letters,
ZWNJ can be used between them.

I also think that the renderer should be able to use base letters
and diacritics found in a font by decomposing advanced characters that
are encoded in the UCS with a single code point, if ever that
character is not mapped in the font, using a best effort to place the
diacritics, instead of trying to find a fallback font that would map
the composite character.
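A rough sketch of that decomposition fallback, using canonical decomposition (NFD) from Python's `unicodedata`. The coverage set and the function name are hypothetical, and real renderers may order their fallbacks differently.

```python
import unicodedata

def fallback_sequence(ch, font_coverage):
    """Return what to render from this font for ch, or None if impossible."""
    if ch in font_coverage:
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    # Usable only if ch really decomposes and the font covers every piece.
    if decomposed != ch and all(c in font_coverage for c in decomposed):
        return decomposed
    return None

# Hypothetical font covering ALEF and MADDAH ABOVE but not U+0622.
coverage = {"\u0627", "\u0653"}
print(fallback_sequence("\u0622", coverage) == "\u0627\u0653")
print(fallback_sequence("\u063D", coverage))
```

U+0622 (ALEF WITH MADDA ABOVE) decomposes and can be rebuilt from the font's own glyphs, while a letter such as U+063D has no canonical decomposition, so this fallback does nothing for it.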

