I have started to investigate the matter and find that very little information
is readily available. Microsoft has a very interesting document, which I have
been trying to understand:
http://www.microsoft.com/typography/OpenTypeDev/tibetan/intro.htm
In the section "Examples of Tibetan" (bottom of the document), the first example
shows how a sequence of eight code points is strung together to form a Tibetan
"syllable". It's not really a syllable. It's called a tsheg-bar in Tibetan, and
a Tibetan word can consist of several of these. Anyway, all the code points
from the second one up to and including the fourth one are, as far as I can
tell, formed into a ligature. I have opened the MS Himalaya font in FontForge
and seen that this ligature is defined as "tibSa_Ga_Rata_Shapkyu" in a glyph
slot outside the Unicode code space.
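
To make "a sequence of code points" concrete, here is a minimal Python sketch
that lists the code points of one tsheg-bar. The string is an assumption chosen
only for illustration (eight code points including the closing tsheg); it is
not copied from Microsoft's document, and the names TSHEG_BAR and describe are
hypothetical.

```python
import unicodedata

# Assumed example string (not taken from the Microsoft document):
# BA, SA, subjoined GA, subjoined RA, vowel sign U, BA, SA, tsheg.
TSHEG_BAR = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66\u0F0B"


def describe(text):
    """Return (1-based index, 'U+XXXX', Unicode name) for each code point."""
    return [
        (index, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for index, ch in enumerate(text, start=1)
    ]


for index, code, name in describe(TSHEG_BAR):
    print(index, code, name)
```

Running it prints one line per code point, which makes it easy to see which
ones are full letters and which are subjoined letters or vowel signs.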

Now, a user who expects to work with Tibetan text the way one works with text
in a Roman-script language will be very disappointed. The reason is that it is
impossible to place the caret inside this ligature or to select individual
characters in it. As of now, you can only select the entire stack as a whole.
This is partly because the glyphs have been merged into a ligature, but perhaps
also because there seem to be no definitions of vertical ligature caret
positions anywhere.
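
The behaviour I am describing can be sketched in Python. This is only a
simplified model under my own assumptions, not how any particular editor or
shaping engine works: every combining mark (Unicode general category M*) is
attached to the preceding base letter, and the caret may stop only between the
resulting groups. The example string and the names stacks and caret_stops are
hypothetical.

```python
import unicodedata

# Assumed example string: BA, SA, subjoined GA, subjoined RA,
# vowel sign U, BA, SA, tsheg.
TSHEG_BAR = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66\u0F0B"


def stacks(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch  # subjoined letter or vowel sign joins the stack
        else:
            groups.append(ch)  # a new base character starts a new stack
    return groups


def caret_stops(text):
    """Return the code point offsets where this simple model lets the caret stop."""
    stops = [0]
    for group in stacks(text):
        stops.append(stops[-1] + len(group))
    return stops


print([len(group) for group in stacks(TSHEG_BAR)])
print(caret_stops(TSHEG_BAR))
```

In this model there is no caret stop inside the four-code-point stack, which is
the limitation I would like to get around.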

How would one go about defining such an important feature? Should it be
implemented in the font? Should it be implemented in the software that handles
the typeface? Or perhaps both? While digging through the MS Himalaya font, I
found that there is a value called Ligature Caret Count. What is this value
supposed to be used for? For the ligatures that represent stacks of multiple
glyphs, the Ligature Caret Count had values up to 4, which I hope means that
the font itself contains the information I am looking for. Is my assumption
correct?
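
To illustrate what I am hoping for: if the Ligature Caret Count reflects a list
of caret positions stored in the font (my assumption; I believe the OpenType
GDEF table has a ligature caret list for this purpose), software could use
those positions to divide a ligature into parts. Below is a minimal Python
sketch under that assumption. The numbers are made up, the function names are
hypothetical, and it treats the positions as offsets along a single axis;
whether such values can describe a vertical stack is exactly what I am unsure
about.

```python
from bisect import bisect_right


def component_spans(advance, caret_positions):
    """Split a ligature of the given advance into (start, end) spans at the carets."""
    edges = [0, *sorted(caret_positions), advance]
    return list(zip(edges, edges[1:]))


def component_at(offset, caret_positions):
    """Return the 0-based index of the ligature component containing the offset."""
    return bisect_right(sorted(caret_positions), offset)


# Hypothetical ligature: advance of 1200 font units, three caret positions.
carets = [300, 600, 900]
print(component_spans(1200, carets))
print(component_at(450, carets))
```

Three caret positions give four spans, so a click at offset 450 would land in
the second component (index 1) instead of selecting the whole ligature.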

Also, I am very eager to learn more about the inner workings of typeface
rendering. I have taken a course on computer design and understand how
everything is bits and how assembly language and C handle this. I also
understand the general idea of Unicode and how it is defined at a low level.
I've also understood that fonts are basically Bézier curves which are
rasterized to the screen buffer. A lot of this process is still very murky to
me, so if anybody knows of any in-depth reading material, I would be very happy
to start reading it. I have read State of Text Rendering by Behdad Esfahbod,
which was a great overview of the text rendering stack, but I would really like
to get a more in-depth understanding of each layer in the stack.

I would also really like to learn more about HarfBuzz and how to work on it. I
would love to spend some time working on it, if I am at a level where my code
submissions would meet the standards of the project. Is there any documentation
for HarfBuzz? I've taken a quick glance at the source code and run some scripts
and make commands, but I honestly don't know what's going on. Why are the C
files named .cc and some of the header files named .hh? I can recognize some
font lingo and have a slight understanding of what might be going on, but it
would be really helpful to have something like this for an OpenType font file:
http://imgur.com/a/JEObT#0. I must also say that I have no idea what HarfBuzz
is supposed to do or how I test or use it once it's compiled. I've tried
running some of the shell scripts and binaries that were compiled, but I really
have no clue.

Is there somewhere I can learn to get a handle on the technicalities of the
HarfBuzz project and learn what I need to start contributing?

Sincerely,
Robin Skahjem-Eriksen