RE: [l10n-dev] is it worse to backport this for Hebrew?

Maxim Iorsh Sun, 27 Feb 2005 14:00:03 -0800

Hello!

Just a small note on heuristics: it is common for Hebrew fonts to
include precomposed letters with dagesh, since the latter are part of
the Unicode specification. Therefore, the rendering algorithm might try
to check for their presence and use them as appropriate. For example, if
a string contains characters U+05D1, U+05BC (bet + dagesh), the
algorithm might check whether the character U+FB31 (bet with dagesh) is
present in the font, and substitute it instead.
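
To make the idea concrete, here is a minimal sketch of such a substitution
pass. It is only an illustration, not code from OOo or ICU. The function
names and the `font_coverage` set (the characters the font has glyphs for)
are assumptions made for the example; the table itself is derived from the
canonical decompositions that Python's unicodedata module reports for the
Hebrew presentation forms.

```python
import unicodedata

DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ


def build_dagesh_table():
    """Map (base letter, dagesh) pairs to precomposed presentation forms."""
    table = {}
    # Hebrew part of the Alphabetic Presentation Forms block.
    for cp in range(0xFB1D, 0xFB50):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith("<"):
            continue  # unassigned, or compatibility-only decomposition
        parts = [chr(int(p, 16)) for p in decomp.split()]
        if len(parts) == 2 and parts[1] == DAGESH:
            table[(parts[0], DAGESH)] = chr(cp)
    return table


PRECOMPOSED = build_dagesh_table()


def substitute_precomposed(text, font_coverage):
    """Replace letter + dagesh with the precomposed form if the font has it.

    font_coverage is a hypothetical set of characters covered by the font.
    """
    out = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        glyph = PRECOMPOSED.get(pair)
        if glyph is not None and glyph in font_coverage:
            out.append(glyph)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

The sketch only handles a dagesh that immediately follows its base letter.
A letter carrying a vowel point as well as a dagesh would need extra
handling, which is left out here.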

Maxim.

-----Original Message-----
From: Jonathan Ben Avraham [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 24, 2005 3:36 PM
To: [email protected]
Cc: Maxim Iorsh; Sivan Toledo
Subject: Re: [l10n-dev] is it worse to backport this for Hebrew?

Hi Jens, Eike, et al,
Here is some background that might be helpful. I hope that it does not 
add to the confusion.

It is not possible to do a reasonable job of rendering Hebrew diacritics
by algorithmic means. By "reasonable job" I mean rendering that you would
not be embarrassed to use in an advertisement or business letter. By
"algorithmic means" I mean by use of some heuristic based on the external
dimensions of the consonantal glyphs to which the diacritics apply. The
reason for this is that the style of the font affects the placement of the
diacritic. For example, consider the letter "i" without the dot to be a
Hebrew letter, and the dot to be a Hebrew diacritic. In order to put the
two together in a visually appealing way, "i", you need information from
the font about where to place the dot. If the dot is slightly to the right
or left or too high or low, the native reader will immediately sense the
gaffe.

Because of this, there is no way to render Hebrew diacritics "reasonably"
without OT fonts. The best that you can do with a TT font is to use only
diacritics that are placed below the base line, place them directly
under the consonant to which they apply, and hope for the best. Sometimes
it will look OK, sometimes it won't, depending on the font style and size
and the degree to which the font author might have known about your
diacritic placement heuristic. You can't use the diacritics that
are in the mid line or above the glyph, since the heuristic placement
strategy will almost always cause the diacritic to collide with part of
the glyph (e.g. SHIN DGUSH), and even some of the diacritics below the
base line will be incorrectly placed for the four "final form" letters.
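
As an illustration of the kind of fallback described above, here is a
minimal sketch that centers a below-baseline diacritic under its consonant
using nothing but the glyphs' external bounding boxes. The `Box` type, the
`place_below` function and the `gap` default are assumptions made for the
example, not part of any real font API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Glyph bounding box in font units (hypothetical, for illustration)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def place_below(base: Box, mark: Box, gap: float = 50.0):
    """Return the (dx, dy) shift that centers the mark under the base.

    Only the outer dimensions of the two glyphs are used, which is exactly
    the limitation discussed above: the shape of the letter is ignored.
    """
    base_center = (base.x_min + base.x_max) / 2
    mark_center = (mark.x_min + mark.x_max) / 2
    dx = base_center - mark_center
    # Put the top of the mark `gap` units below the lowest point of the base.
    dy = (base.y_min - gap) - mark.y_max
    return dx, dy
```

For a letter that sits on the base line this gives a plausible position.
For a glyph with a descender, the same rule pushes the mark below the
descender and centers it on the full width of the glyph, which may well be
the wrong place. That is the sort of case where information from the font
is needed.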

I expect that as time goes by, more Hebrew fonts will use the OT GSUB and
GPOS tables for diacritic positioning. Some complex cases, such as
Biblical text layout, require treating a consonant with two or more
diacritics as a ligature with its own glyph in order to get the placement
right. Some foundries might decide to position *all* diacritics in
some fonts using GSUB if the font is very fancy or it is easier for them
to do this and font file size is not an issue.

In summary, the responsibility for correct placement of diacritics 
in Hebrew should be shifted to the font. We still need some fallback 
heuristic for diacritic placement when non-OT fonts are used. For some 
non-OT fonts the heuristic will work better than for others. When OT fonts 
with GSUB and GPOS data of any sort are used, I think that it is 
reasonable to rely entirely on the GPOS and GSUB data and not apply any 
further heuristics to attempt to position the diacritics.
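
The decision summarized above could be sketched roughly as follows. This is
an assumption-laden illustration, not how OOo or ICU actually decide: it
reads the table directory at the start of a single sfnt font file (12-byte
header followed by 16-byte table records) and checks whether a GPOS or GSUB
table is listed. The function names and the returned strings are made up
for the example, and font collections (.ttc) are not handled.

```python
import struct


def sfnt_table_tags(data: bytes) -> set:
    """Return the table tags listed in the sfnt table directory."""
    _version, num_tables = struct.unpack(">IH", data[:6])
    tags = set()
    for i in range(num_tables):
        start = 12 + 16 * i  # 12-byte header, then 16-byte table records
        tags.add(data[start:start + 4].decode("latin-1"))
    return tags


def choose_positioning(data: bytes) -> str:
    """Pick who positions the diacritics: the font or the fallback."""
    if sfnt_table_tags(data) & {"GPOS", "GSUB"}:
        return "font"       # rely entirely on the font's layout tables
    return "heuristic"      # non-OT font: fall back to the heuristic
```

A real implementation would also want to check that the tables actually
contain Hebrew mark positioning, which this sketch does not attempt.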

I am CC'ing two authorities, Sivan Toledo and Maxim Iorsh, on this post as
I believe that they have a better understanding of the above issues than I
do and might want to post some corrections to the above.
Regards,

  - yba

On Wed, 23 Feb 2005, Jens Herden wrote:

> Hi Eike,
>
>>> When I apply the patch to ICU 2.6 I can actually see the problems with
>>> Khmer change but not disappear.
>>
>> Hmm.. if it doesn't solve but just shift the problem I don't think it
>> would be worth applying the patch to 2.6, or would it?
>
> Right, that's the reason why the subject is talking about Hebrew and not
> Khmer ;-)
>
>
>>> Meanwhile I saw an issue in the OOo bug-database related to Hebrew and I
>>> think this ICU bug was mentioned there as well.
>>
>> Do remember the issue's number / summary or the query you used?
>
> http://www.openoffice.org/issues/show_bug.cgi?id=14069
>
> Reading this again confuses me a bit, but I am not in favor for Hebrew so
> that I feel no wish to understand this yet.
>
>
>>> My impression was that this is a rather simple patch and I decided to
>>> post it here so that people may have a look.
>>
>> Of course. I'd like to have some more feedback however, before I apply
>> a patch that affects all RightToLeft layout without me knowing what
>> exactly it does ;-)
>
> That is the very correct way to deal with it :-)
>
>
>>> BTW. are there plans when OOo will switch to ICU 3.2 ?
>>
>> If QA resources permit I'd like to do that for the next release after
>> 2.0, hopefully 2.0.1 if possible, depends on the time frame we'll have.
>
> That would be great, because in the moment I am stuck with Khmer and
> ICU 2.6. Switching to 3.2 would solve the problems.
>
> Jens
>
>

-- 
  EE 77 7F 30 4A 64 2E C5  83 5F E7 49 A6 82 29 BA    ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
      - [EMAIL PROTECTED] - tel: +972.2.679.5364, http://www.tkos.co.il -

