On 24/7/12 12:51, Shriramana Sharma wrote:
On Tue, Jul 24, 2012 at 3:26 PM, Pravin Satpute <[email protected]> wrote:

    I see the dotted circle is still not appearing with dependent vowels
(U+093F). Is this intentional?
    It might be because you are removing the test cases that generate a dotted
circle in Uniscribe before running them with HarfBuzz.

May I take this opportunity to record what I have long felt on the
topic of dotted circles.

I feel that dotted circles should not be displayed except when not
doing so can cause non-canonically-equivalent encoded sequences to
appear the same. That is, they should be displayed only to distinguish
between such sequences. (This is to protect against phishing and
such.)

I don't think phishing protection is the responsibility of a shaping engine. There are far too many completely legitimate sequences (in both "complex" and "simple" scripts) that can be visually confusable.

For example, the long vowel आ does not have a decomposition to अ+ ा
whereas it would appear the same as the latter if there is no dotted
circle. There are many such "do not use" recommendations for
independent vowels in the Indic Unicode chapters because of the
absence of canonical equivalences (unfortunate IMO but well....).
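
The claim about missing canonical equivalences can be checked mechanically. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `canonically_equivalent` and the `PAIRS` table are made up for this illustration, and the results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Pairs discussed in this thread, plus one positive control.
PAIRS = {
    "AA vs A + AA-matra": ("\u0906", "\u0905\u093e"),
    "KA + I-matra vs I-matra + KA": ("\u0915\u093f", "\u093f\u0915"),
    "TA + U vs TA + U + U": ("\u0924\u0941", "\u0924\u0941\u0941"),
    # Control: NNNA does have a canonical decomposition to NA + NUKTA.
    "NNNA vs NA + NUKTA": ("\u0929", "\u0928\u093c"),
}

for label, (a, b) in PAIRS.items():
    print(label, "->", canonically_equivalent(a, b))
```

Only the control pair should compare as equivalent; the other three are distinct encoded sequences that normalization will never merge, so whether they look alike is purely a rendering question.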

Software designed for phishing protection might indeed want to guard against such sequences (among many other things); however, I don't think this is the shaping engine's job.

Reordrant vowels like ि raise the same issue: if the sequence अिक is
mistakenly typed (or maliciously introduced) for अकि and there is no
dotted circle, the two sequences would appear the same

This isn't a particularly good example. In my email client, neither of them shows a dotted circle, but neither do they look the same. The first one displays the i-matra to the left of the full a-vowel; the second displays it between the a-vowel and the ka. This seems like a perfectly reasonable way to render the two sequences. If there are use cases (as has already been mentioned) for multiple vowel matras on a single base consonant, why shouldn't there also be use cases for vowel matras placed on a full vowel letter as their base?

A pair that could be more problematic would be कि / िक (0915,093F / 093F,0915). These do display identically here where I'm typing (although many systems doubtless insert a dotted circle in the second case).

which is not appropriate from a security viewpoint as they are not
canonically equivalent.

My point is, there may be many reasons for unexpected combinations of
characters in Indic. Vedic texts are one. Minority orthographies
(which may use rare combinations of vowel signs and diacritics) are
another. Legitimate creative use (like काााााा for "kaaaa", a shout)
is yet another. Imposing a limited orthography (i.e. only recognizing
a certain set of patterns of sequences and producing dotted circles
for sequences that do not fit the pattern) would preclude the
usefulness of the rendering system to users of such cases.

Of course, this usability can also be achieved by first imposing a
generic orthography (i.e. script grammar) and later adding more
recognized sequences as per user community request. (This is also much
easier to produce and deliver to the community in open source
ecosystems than in proprietary ones.)

This would be advisable since it may be difficult to predict which
sequences in Indic would be confusable, especially with non-spacing
marks. For example, तु and तुु would be confusable if there is no
dotted circle and the second ु is overlaid upon the first.

A careful font designer can address examples like this by providing mark-to-mark positioning rules that will make multiple copies of the same mark "stack" rather than simply overprint each other.

Of course, not every font designer will be so careful. But then, not every Latin-script font adequately distinguishes 'I', 'l', and '1', either. We can't expect shaping engines to somehow make up for visual ambiguities in font designs.


But these sequences are not obvious, so it appears that creating
regexes for sequences where dotted circles should *not* be produced
might be easier than creating them for sequences where they *should*
be produced, and it would be appropriate to err on the side of caution.
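
To make the trade-off concrete, here is a deliberately tiny Python sketch of the "recognized patterns" approach. The grammar is a hypothetical simplification written for this illustration; it is not the grammar used by Uniscribe or HarfBuzz, it covers only a few Devanagari code point ranges, and the names `SYLLABLE` and `fits_pattern` are invented here.

```python
import re

# Hypothetical, heavily simplified Devanagari character classes.
INDEP_VOWEL = "[\u0904-\u0914]"   # independent vowel letters
CONSONANT = "[\u0915-\u0939]"     # KA .. HA
VIRAMA = "\u094d"
MATRA = "[\u093e-\u094c]"         # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"      # candrabindu, anusvara, visarga

# One syllable: a vowel letter, or a consonant cluster with at most one matra,
# optionally followed by a single modifier.
SYLLABLE = re.compile(
    f"(?:{INDEP_VOWEL}|{CONSONANT}(?:{VIRAMA}{CONSONANT})*{MATRA}?){MODIFIER}?"
)

def fits_pattern(text):
    """True if the whole string is one syllable under the toy grammar above."""
    return SYLLABLE.fullmatch(text) is not None

for sample in ["\u0915\u093f", "\u0915" + "\u093e" * 6, "\u0905\u093f\u0941\u093e"]:
    print(sample, fits_pattern(sample))
```

Under this toy grammar the ordinary syllable कि is accepted, while the repeated-matra "shout" and the vowel letter carrying several matras are rejected. A shaper that inserted dotted circles for every rejected sequence would therefore block exactly the uses discussed in this thread until someone extended the pattern.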

IMO, "to err on the side of caution" in the matter of dotted-circle insertion means that we should avoid the risk of blocking a use case that someone might someday want, even if we can't anticipate that particular need. So, for example, even though we may not be aware of any current need for a sequence such as "अिुा", there's no compelling reason for a shaping engine to insert dotted circles into it and thus make it impossible for a user to encode and render an a-vowel with these three matras placed around it.

In general, I think the Indic shaper should *not* insert dotted circles. The one exception that I think may be desirable would be the case of left-reordrant matras when no usable base character (either consonant or vowel letter, or other "placeholder" such as an explicit U+25cc or a space, no-break space, etc) can be found. In this case inserting a dotted circle (or a space?) to act as the base, and then reordering the matra to the left of it, may be the best option, so that a "visually encoded" sequence िक does not appear identical to the correctly-encoded कि.
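
As a rough illustration of the one exception suggested above, here is a Python sketch that inserts U+25CC only before a left-reordrant matra that has no usable base. It is a toy under stated assumptions, not HarfBuzz code: the names `LEFT_MATRAS`, `PLACEHOLDERS`, `has_base`, and `insert_dotted_circles` are hypothetical, only the Devanagari matra U+093F is listed, and "usable base" is approximated with Unicode general categories (any letter, or an explicit placeholder, possibly followed by other marks).

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
# Assumption: a deliberately incomplete set containing only DEVANAGARI VOWEL SIGN I.
LEFT_MATRAS = {"\u093f"}
# Explicit placeholders that may act as a base: dotted circle, space, no-break space.
PLACEHOLDERS = {DOTTED_CIRCLE, " ", "\u00a0"}

def has_base(preceding):
    """Walk left over combining marks and report whether a usable base is found."""
    for ch in reversed(preceding):
        if unicodedata.category(ch).startswith("M"):
            continue  # skip other matras, nukta, virama, etc.
        return ch in PLACEHOLDERS or unicodedata.category(ch).startswith("L")
    return False  # reached the start of the text without finding a base

def insert_dotted_circles(text):
    """Insert a dotted circle before any left matra that lacks a base.

    The text stays in logical order; a later reordering step would then
    place the matra to the left of the inserted circle.
    """
    out = []
    for ch in text:
        if ch in LEFT_MATRAS and not has_base(out):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)
```

With this rule the visually encoded sequence िक gains a dotted circle before the matra, while कि, अिुा, and a matra following a space pass through unchanged, so none of the unusual but well-formed sequences discussed earlier are affected.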


I had to say this, being a scholar of Sanskrit and Vedic, which really
puts scripts (and hence software support for them) to their limit.
Pravin (OP on this thread) and I, we have plans for developing a Lohit
Devanagari Vedic font, so we'll be coming back on this...


_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
