Re: [HarfBuzz] dotted circle is not appearing for dependant vowel

Jonathan Kew Tue, 24 Jul 2012 06:18:44 -0700

On 24/7/12 12:51, Shriramana Sharma wrote:

On Tue, Jul 24, 2012 at 3:26 PM, Pravin Satpute <[email protected]> wrote:


    I see the dotted circle is still not appearing with dependant vowels
(U+093F). Is this intentional?
    Might be since you are removing test cases generating dotted circle
in Uniscribe before running it with harfbuzz.


May I take this opportunity to record what I have long felt on the
topic of dotted circles.

I feel that dotted circles should not be displayed except when not
doing so can cause non-canonically-equivalent encoded sequences to
appear the same. That is, they should be displayed only to distinguish
between such sequences. (This is to protect against phishing and
such.)

I don't think phishing protection is the responsibility of a shaping engine. There are far too many completely legitimate sequences (in both "complex" and "simple" scripts) that can be visually confusable.

For example, the long vowel आ does not have a decomposition to अ+ ा
whereas it would appear the same as the latter if there is no dotted
circle. There are many such "do not use" recommendations for
independent vowels in the Indic Unicode chapters because of the
absences of canonical equivalences (unfortunate IMO but well....).

Software designed for phishing protection might indeed want to guard against such sequences (among many other things); however, I don't think this is the shaping engine's job.

Reordrant vowels like ि are also likewise, because in the case of a
sequence अिक mistakenly typed (or maliciously introduced) for अकि if
there is no dotted circle the two sequences would appear the same

This isn't a particularly good example. In my email client, neither of them shows a dotted circle, but neither do they look the same. The first one displays the i-matra to the left of the full a-vowel; the second displays it between the a-vowel and the ka. This seems like a perfectly reasonable way to render the two sequences. If there are use cases (as has already been mentioned) for multiple vowel matras on a single base consonant, why shouldn't there also be use cases for vowel matras placed on a full vowel letter as their base?

A pair that could be more problematic would be कि / िक (0915,093F / 093F,0915). These do display identically here where I'm typing (although many systems doubtless insert a dotted circle in the second case).

which is not appropriate from a security viewpoint as they are not
canonically equivalent.
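
The canonical-equivalence claims in this exchange can be checked mechanically. What follows is only a minimal sketch using Python's standard `unicodedata` module; the helper name `canonically_equivalent` is made up for illustration and is not part of HarfBuzz or any shaping engine. It compares NFD forms, which is the usual test for canonical equivalence. The pair क़ (U+0958) versus क + nukta is included as a positive control, since U+0958 does have a canonical decomposition.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    if their NFD (canonical decomposition) forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


AA = "\u0906"                # आ  DEVANAGARI LETTER AA
A_PLUS_AA = "\u0905\u093E"   # अ + ा  (LETTER A + VOWEL SIGN AA)

# U+0906 has no decomposition mapping, so it is not equivalent to अ + ा.
print(unicodedata.decomposition(AA) == "")
print(canonically_equivalent(AA, A_PLUS_AA))

# अिक (0905,093F,0915) versus अकि (0905,0915,093F)
print(canonically_equivalent("\u0905\u093F\u0915", "\u0905\u0915\u093F"))

# कि (0915,093F) versus िक (093F,0915)
print(canonically_equivalent("\u0915\u093F", "\u093F\u0915"))

# Positive control: क़ (U+0958) decomposes canonically to क + nukta.
print(canonically_equivalent("\u0958", "\u0915\u093C"))
```

None of this says anything about how the sequences render; it only confirms which pairs are distinct at the encoding level.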

My point is, there may be many reasons for unexpected combinations of
characters in Indic. Vedic texts are one. Minority orthographies
(which may use rare combinations of vowel signs and diacritics) are
another. Legitimate creative use (like काााााा) for "kaaaa" (a shout)
is yet another. Imposing a limited orthography (i.e. only recognizing
a certain set of patterns of sequences and producing dotted circles
for sequences that do not fit the pattern) would preclude the
usefulness of the rendering system to users of such cases.

Of course, this usability can also be achieved by first imposing a
generic orthography (i.e. script grammar) and later adding more
recognized sequences as per user community request. (This is also much
easier to produce and deliver to the community in open source
ecosystems than in proprietary ones.)

This would be advisable since it may be difficult to predict which
sequences in Indic would be confusable, especially with non-spacing
marks. For example, तु and तुु would be confusable if there is no
dotted circle and the second ु is overlaid upon the first.

A careful font designer can address examples like this by providing mark-to-mark positioning rules that will make multiple copies of the same mark "stack" rather than simply overprint each other.

Of course, not every font designer will be so careful. But then, not every Latin-script font adequately distinguishes 'I', 'l', and '1', either. We can't expect shaping engines to somehow make up for visual ambiguities in font designs.


But these sequences are not self-obvious, so it appears creating
regexes for sequences where dotted circles should *not* be produced
might be easier than to do so where they *should* be produced and it
would be appropriate to err on the side of caution.

IMO, "to err on the side of caution" in the matter of dotted-circle insertion means that we should avoid the risk of blocking a use case that someone might someday want, even if we can't anticipate that particular need. So, for example, even though we may not be aware of any current need for a sequence such as "अिुा", there's no compelling reason for a shaping engine to insert dotted circles into it and thus make it impossible for a user to encode and render an a-vowel with these three matras placed around it.

In general, I think the Indic shaper should *not* insert dotted circles. The one exception that I think may be desirable would be the case of left-reordrant matras when no usable base character (either consonant or vowel letter, or other "placeholder" such as an explicit U+25cc or a space, no-break space, etc) can be found. In this case inserting a dotted circle (or a space?) to act as the base, and then reordering the matra to the left of it, may be the best option, so that a "visually encoded" sequence िक does not appear identical to the correctly-encoded कि.
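
To make the proposed exception concrete, here is a rough sketch of that rule in Python. It is an illustration only, not HarfBuzz's implementation: the function names are hypothetical, only U+093F is treated as a left-reordrant matra, the "usable base" test covers just a few Devanagari ranges plus the placeholders named above, and any combining mark is assumed to leave the current base in place. It works on the logical character order and only inserts the missing base; the actual leftward reordering would still be the shaper's job.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
# Assumption: only DEVANAGARI VOWEL SIGN I is handled, for illustration.
LEFT_MATRAS = {"\u093F"}
# Placeholders mentioned in the text: dotted circle, space, no-break space.
PLACEHOLDERS = {DOTTED_CIRCLE, " ", "\u00A0"}


def is_usable_base(ch: str) -> bool:
    """Simplified base test: a few Devanagari letter ranges or a placeholder."""
    cp = ord(ch)
    return (
        0x0904 <= cp <= 0x0914     # independent vowel letters (subset)
        or 0x0915 <= cp <= 0x0939  # consonants
        or 0x0958 <= cp <= 0x095F  # precomposed nukta consonants
        or ch in PLACEHOLDERS
    )


def insert_missing_bases(text: str) -> str:
    """Insert a dotted circle before a left matra that has no usable base."""
    out = []
    have_base = False
    for ch in text:
        if is_usable_base(ch):
            have_base = True
        elif not unicodedata.category(ch).startswith("M"):
            # Any other non-mark character ends the current cluster.
            have_base = False
        elif ch in LEFT_MATRAS and not have_base:
            out.append(DOTTED_CIRCLE)
            have_base = True
        out.append(ch)
    return "".join(out)


# Visually encoded िक gets a dotted circle; correctly encoded कि does not.
print(insert_missing_bases("\u093F\u0915") == "\u25CC\u093F\u0915")
print(insert_missing_bases("\u0915\u093F") == "\u0915\u093F")
# अिुा is left alone: the vowel letter counts as a usable base.
print(insert_missing_bases("\u0905\u093F\u0941\u093E") == "\u0905\u093F\u0941\u093E")
```

Under these assumptions the only sequences that change are those where a left matra has nothing to attach to, which is the single case singled out above.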


I had to say this, being a scholar of Sanskrit and Vedic, which really
puts scripts (and hence software support for them) to their limit.
Pravin (OP on this thread) and I, we have plans for developing a Lohit
Devanagari Vedic font, so we'll be coming back on this...


_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
