font-encoded hacks

2016-10-06 Thread Andrew Cunningham
Considering the mess that ad hoc fonts create, what is the best way forward? Zwekabin, Mon, Zawgyi, Zawgyi-Tai and their ilk? Most government translations I am seeing in Australia for Burmese are in Zawgyi, while most of the Sgaw Karen translations are routinely in legacy 8-bit fonts. Andrew

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Oren Watson
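That application is hindered by the fact that U+1D506, U+1D50B, U+1D50C, U+1D515, U+1D51D, U+1D53A, U+1D53F, U+1D545, U+1D547, U+1D548, U+1D549, U+1D551, U+1D49D, U+1D4A0, U+1D4A1, U+1D4A3, U+1D4A4, U+1D4A7, U+1D4A8, U+1D4AD, U+1D4BA, U+1D4BC, U+1D4C4 are unallocated characters, forming gaps in the otherwise contiguous mathematical alphabets. On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham < richard.wording...@ntlworld.com> wrote: > On Thu, 6 Oct 2016 16:54:21 -0700 >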
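
To illustrate the gaps mentioned above, here is a minimal Python sketch (not taken from the thread). It maps ASCII capitals to the mathematical Fraktur alphabet by plain offset arithmetic and then checks whether the result is an assigned code point. The helper names `naive_fraktur` and `is_assigned` are made up for this example.

```python
import unicodedata

FRAKTUR_CAPITAL_A = 0x1D504  # MATHEMATICAL FRAKTUR CAPITAL A


def naive_fraktur(letter: str) -> str:
    """Map an ASCII capital to Fraktur using only an additive offset."""
    return chr(FRAKTUR_CAPITAL_A + ord(letter) - ord("A"))


def is_assigned(ch: str) -> bool:
    """True unless the code point has General_Category Cn (unassigned)."""
    return unicodedata.category(ch) != "Cn"


for letter in "ABC":
    ch = naive_fraktur(letter)
    # The offset works for A and B but lands in a gap for C (U+1D506).
    print(letter, f"U+{ord(ch):04X}", is_assigned(ch))

# The Fraktur C that does exist lives in Letterlike Symbols instead.
print(unicodedata.name("\u212D"))
```

So an offset-only mapping needs an exception table for the holes, which is the hindrance the message describes.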

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Richard Wordingham
On Thu, 6 Oct 2016 16:54:21 -0700 Ken Whistler wrote: > On 10/6/2016 4:32 PM, Richard Wordingham wrote: > > The > > problem is that manually constructed lookup tables are prone to > > human error. > > ... as are manually constructed algorithms that attempt to take >

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Ken Whistler
On 10/6/2016 4:32 PM, Richard Wordingham wrote: The problem is that manually constructed lookup tables are prone to human error. ... as are manually constructed algorithms that attempt to take advantage of sub-ranges of case pair adjacency in the Unicode code charts to do casing with bit
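
As a hedged illustration of why casing by bit arithmetic is fragile, the Python sketch below (an assumption of this listing, not code from the thread) toggles bit 0x20 for ASCII letters and then tries the analogous "adjacent pair" trick with bit 0x01 in Latin Extended-A. The function name `ascii_toggle_case` is hypothetical.

```python
def ascii_toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by toggling bit 0x20.

    Any other character is returned unchanged.
    """
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ 0x20)
    return ch


# ASCII: upper and lower case differ only in bit 0x20.
print(ascii_toggle_case("a"), ascii_toggle_case("Q"))

# Latin Extended-A: many case pairs are adjacent (even = upper, odd = lower),
# so toggling bit 0x01 works for U+0100 / U+0101 ...
print(chr(0x0100 ^ 0x01) == "\u0100".lower())

# ... but the parity flips later in the same block: U+0139 is an upper-case
# letter on an odd code point, so the same bit trick gives the wrong answer.
print(chr(0x0139 ^ 0x01) == "\u0139".lower())
```

Each sub-range needs its own rule, which is where the human error creeps in; a table generated from the Unicode Character Database avoids that.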

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Richard Wordingham
On Thu, 6 Oct 2016 12:44:05 -0700 Garth Wallace wrote: > Other than converting between UTFs, is bit arithmetic commonly > performed on Unicode characters? I was under the impression that it's > a rarity if it is done at all. It's possible to use it for the bulk of case

Re: IJ with accent

2016-10-06 Thread Michael Everson
On 6 Oct 2016, at 23:09, Lorna Evans wrote: > > Has it been mentioned that U+0133 is not listed in the Soft_Dotted > properties? So, that would indicate it shouldn't have the dot removed when > you do put an acute over U+0133. It ought to have that property. Michael

Re: IJ with accent

2016-10-06 Thread Lorna Evans
Has it been mentioned that U+0133 is not listed in the Soft_Dotted properties? So, that would indicate it shouldn't have the dot removed when you do put an acute over U+0133. Lorna On 9/28/2016 2:59 AM, a.lukyanov wrote: Dutch language writing uses the ligature ij (U+0132, U+0133). When

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Ken Whistler
On 10/6/2016 12:44 PM, Garth Wallace wrote: Some representatives of the WFCC have proposed alternate arrangements that assume there will be a need for bitwise operations to convert between the existing chess symbols in the Miscellaneous Symbols block and related symbols in the new block. I
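
For context, a small Python sketch (illustrative only; the names `to_black` and `xor_masks` are invented here) of how the existing chess symbols in Miscellaneous Symbols relate to one another. White pieces occupy U+2654..U+2659 and black pieces U+265A..U+265F, so the existing layout is served by an additive offset rather than by a single bit mask.

```python
WHITE_KING = 0x2654  # WHITE CHESS KING
BLACK_KING = 0x265A  # BLACK CHESS KING
COLOUR_OFFSET = BLACK_KING - WHITE_KING  # 6


def to_black(piece: str) -> str:
    """Return the black counterpart of a white chess symbol."""
    cp = ord(piece)
    if not WHITE_KING <= cp < BLACK_KING:
        raise ValueError("not a white chess symbol")
    return chr(cp + COLOUR_OFFSET)


# XOR masks that would turn each white piece into its black counterpart.
xor_masks = {cp ^ (cp + COLOUR_OFFSET) for cp in range(WHITE_KING, BLACK_KING)}

print(to_black("\u2654"))                   # white king -> black king
print(sorted(hex(m) for m in xor_masks))    # more than one mask is needed
```

Since no single mask covers all six pieces, code that handles the existing symbols already has to use addition or a lookup table.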

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Christoph Päper
Philippe Verdy : > > But if semantic is your issue, we could insert an invisible Unicode mark of > abbreviation (notably the invisible abbreviation dot, which may be rendered > as a dot in some contexts where distinctions by styles cannot be used, or > could be rendered by

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Philippe Verdy
As far as we know, arithmetic is performed only in - subsets of decimal digits in ASCII and for a dozen scripts and converting automatically between them using a single additive constant for the 10 digits. - Basic Latin/ASCII for mapping lettercases and mapping non-decimal digits (adding 6
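
The digit case can be sketched in a few lines of Python. This is only an illustration under the assumption that each script's ten decimal digits are contiguous, which holds for the three digit-zero code points listed here; `convert_digits` and the `DIGIT_ZERO` table are names invented for the example.

```python
# Code point of DIGIT ZERO for a few scripts with contiguous decimal digits.
DIGIT_ZERO = {
    "ascii": 0x0030,
    "arabic-indic": 0x0660,
    "devanagari": 0x0966,
    "thai": 0x0E50,
}


def convert_digits(text: str, target: str) -> str:
    """Replace ASCII digits with the target script's digits.

    Uses a single additive constant per script; non-digits pass through.
    """
    delta = DIGIT_ZERO[target] - DIGIT_ZERO["ascii"]
    return "".join(
        chr(ord(ch) + delta) if "0" <= ch <= "9" else ch for ch in text
    )


print(convert_digits("2016-10-06", "devanagari"))
```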

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Asmus Freytag (c)
On 10/6/2016 12:44 PM, Garth Wallace wrote: Other than converting between UTFs, is bit arithmetic commonly performed on Unicode characters? I was under the impression that it's a rarity if it is done at all. I've been

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Philippe Verdy
2016-10-06 21:48 GMT+02:00 Christoph Päper : > > For ordinal numbers, it’s relatively simple to code language-dependent > glyph substitution in Opentype which would not require any additional > effort from the author, “3ème” would just work, “3e” → “3ᵉ” would require

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Marcel Schneider
On Wed, 5 Oct 2016 06:35:52 +, Martin Mueller wrote: […] > That said, given that alphabets have fixed numbers, it’s weird > that bits of super and subscripted letters appear in this or > that limited range but that you can’t cobble a whole alphabet > together in a consistent manner. Indeed

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Marcel Schneider
On Thu, 6 Oct 2016 21:20:22 +0300, Jukka K. Korpela wrote: > In a sense, superscript code points make this easier: the rendering can > simply pick up the corresponding glyph for the font – if it has one (a > big “if”). But this is not a good argument in favor of adding such > points en masse. It

RE: Dealing with Unencodeable Characters

2016-10-06 Thread Doug Ewell
Philippe Verdy wrote: > The 3 glyphs for the Earth globe (centered on Americas, or > Europe+Africa or South/East Asia+Australia) are not distinguished at > all in Unicode (I've not seen any sequence with variants selectors to > help distinguishing them, 0xFC through 0xFE in Webdings are:

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Christoph Päper
Jukka K. Korpela : > > … the solution is to use just “3ème”, perhaps with some method (“above” the > character level) used to format the letters as superscript, when not limited > to plain text … For ordinal numbers, it’s relatively simple to code language-dependent glyph

Bit arithmetic on Unicode characters?

2016-10-06 Thread Garth Wallace
Other than converting between UTFs, is bit arithmetic commonly performed on Unicode characters? I was under the impression that it's a rarity if it is done at all. I've been working on a proposal for additional chess symbols used in chess problems and variant games, and I've been in communication
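
The UTF conversion case mentioned in the question is the well-known place where bit arithmetic on code points is routine. As a hedged sketch (function names are made up for this listing), here is the UTF-16 surrogate pair calculation in Python:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                  # 20 significant bits remain
    high = 0xD800 | (v >> 10)         # top 10 bits
    low = 0xDC00 | (v & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into a code point."""
    return 0x10000 + (((high & 0x3FF) << 10) | (low & 0x3FF))


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")
# Cross-check against Python's own UTF-16 encoder.
print("\U0001F600".encode("utf-16-be").hex().upper())
```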

Re: Dealing with Unencodeable Characters

2016-10-06 Thread Philippe Verdy
2016-10-06 21:03 GMT+02:00 Doug Ewell : > > * "Wingdings", "Wingdings 2", are here again mapping various forms of > > arrows and arrow heads, plus some emojis or enclosed characters, or > > decorative characters. "Wingdings" also includes another Windows logo > > at position

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Philippe Verdy
2016-10-06 21:02 GMT+02:00 Doug Ewell : > >> Like «3ᵉ̀ᵐᵉ» ? It already works on my laptop (Thunderbird in Ubuntu > >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there > >> is nothing to add. > > > > It does not render very well, the accent is not correctly
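
For readers who want to inspect the sequence quoted above (1D49 + 0300 + 1D50 + 1D49), a short Python sketch follows. It only builds the string and prints the character names; it says nothing about how well a given font positions the accent. The variable names are hypothetical.

```python
import unicodedata

# The superscripted part as given in the message, preceded by the digit 3.
SUPERSCRIPT_PART = "\u1D49\u0300\u1D50\u1D49"
ordinal = "3" + SUPERSCRIPT_PART

for ch in ordinal:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The accent is a combining mark, so it attaches to the preceding
# modifier letter rather than standing on its own.
print(unicodedata.combining("\u0300") != 0)
```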

Re: Dealing with Unencodeable Characters

2016-10-06 Thread Doug Ewell
Charlotte Buff wrote: > Private use characters are an obvious choice but of course their > meaning is user-defined, so while all other emoji in my Shift JIS > document would receive an unambiguous Unicode mapping, Shibuya 109 > would remain vague and very limited in interchange options. But

Re: Dealing with Unencodeable Characters

2016-10-06 Thread Doug Ewell
> * "Wingdings", "Wingdings 2", are here again mapping various forms of > arrows and arrow heads, plus some emojis or enclosed characters, or > decorative characters. "Wingdings" also includes another Windows logo > at position 0xFF; these fonts are not mapped to Unicode but to 8-bit > code

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Doug Ewell
>> Like «3ᵉ̀ᵐᵉ» ? It already works on my laptop (Thunderbird in Ubuntu >> 16.04) The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there >> is nothing to add. > > It does not render very well, the accent is not correctly positioned > vertically (far too high) above the superscript e and

Re: Dealing with Unencodeable Characters

2016-10-06 Thread Ken Whistler
On 10/6/2016 7:54 AM, Charlotte Buff wrote: If theoretically I wanted to convert an old Shift JIS document containing emoji to Unicode, how should I ideally handle Shibuya 109? And the general answer to that is convert to U+FFFD, unless you are doing something specific and know what you are
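
The "convert to U+FFFD" answer corresponds to what a decoder's replacement mode does. The Python sketch below is only an illustration: the byte pair `F9 40` is a hypothetical stand-in for a carrier-specific emoji code (the real bytes for Shibuya 109 are not given in the thread), chosen because Python's plain `shift_jis` codec has no mapping for that lead byte.

```python
REPLACEMENT_CHARACTER = "\uFFFD"


def decode_legacy(data: bytes) -> str:
    """Decode Shift JIS, turning anything unmappable into U+FFFD."""
    return data.decode("shift_jis", errors="replace")


# 82 A0 is HIRAGANA LETTER A in Shift JIS; F9 40 is the hypothetical
# unmappable code standing in for a carrier emoji.
sample = b"\x82\xa0" + b"\xf9\x40"
decoded = decode_legacy(sample)

print([f"U+{ord(ch):04X}" for ch in decoded])
print(REPLACEMENT_CHARACTER in decoded)
```

A converter that knows more about the source (for example a private agreement on PUA code points) could register its own error handler instead of `"replace"`.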

Re: Fwd: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Jukka K. Korpela
6.10.2016, 19:27, Ken Whistler wrote: Their functions have been completely overtaken by markup conventions such as ... and ..., which *are* widely supported already, even in most email clients, ri^ght out of the b_ox . They are widely supported, but very widely in a typographically inferior

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Philippe Verdy
It does not render very well, the accent is not correctly positioned vertically (far too high) above the superscript e and colliding with the previous line of text at normal line-height, because fonts do not support this pair with proper positioning. The combination is just rendered in some "best

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Marcel Schneider
On Thu, 6 Oct 2016 09:27:13 -0700, Ken Whistler wrote: […] > Their functions have been completely overtaken by markup conventions > such as ... and ..., which *are* widely supported > already, even in most email clients, ri^ght out of the b_ox . > > And I suspect that Yucca's statement "so it

Re: Dealing with Unencodeable Characters

2016-10-06 Thread Philippe Verdy
PUA characters are still used when mapping corporate logos (from Windows and Apple/MacOS) in fonts for the relevant systems. Microsoft then opted to include these corporate logos (and specific UI icons) in a separate font, also with PUA mappings, and then added new PUA fonts as needed. E.g.: *

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Ken Whistler
On 10/6/2016 9:32 AM, Oren Watson wrote: I meant, petition say the devs of Konsole, iTerm, xterm etc, and other programs which deal purely in plain text to support 8b and 8c characters for formatting. Markup doesn't exist everywhere. Fair enough. But most actual terminals didn't support

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Marcel Schneider
On Thu, 6 Oct 2016 16:55:32 +0200, Frédéric Grosshans wrote: […] >> Anyway, combining diacritics should be placeable on superscripts as well. > Like «3ᵉ̀ᵐᵉ» ? It already works on my laptop (Thunderbird in Ubuntu 16.04) > The superscripted part is 1D49 + 0300 + 1D50 + 1D49, and there is > nothing

Fwd: Fwd: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Oren Watson
I meant, petition say the devs of Konsole, iTerm, xterm etc, and other programs which deal purely in plain text to support 8b and 8c characters for formatting. Markup doesn't exist everywhere. On Thu, Oct 6, 2016 at 12:27 PM, Ken Whistler wrote: > > > On 10/6/2016 9:04 AM,

Re: Fwd: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Ken Whistler
On 10/6/2016 9:04 AM, Oren Watson wrote: If this is a real need, why not petition more software to allow the use of the U+8C partial line up and U+8B partial line down characters for this purpose? Because U+008C and U+008B are relics from the days when control codes were used in
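
To make the two control codes concrete, here is a small Python sketch (the helper names are invented, and no claim is made that any terminal honours these codes). It wraps text in U+008C / U+008B as the proposal suggests and shows that both are C1 control characters, which most software ignores or strips.

```python
import unicodedata

PLD = "\u008B"  # PARTIAL LINE DOWN (C1 control)
PLU = "\u008C"  # PARTIAL LINE UP (C1 control)


def superscript(text: str) -> str:
    """Wrap text so a device honouring PLU/PLD would raise it."""
    return PLU + text + PLD


def strip_partial_line_controls(text: str) -> str:
    """Remove PLU/PLD, roughly what software ignoring them displays."""
    return text.replace(PLU, "").replace(PLD, "")


marked = "x" + superscript("2")
print([f"U+{ord(ch):04X}" for ch in marked])
print(unicodedata.category(PLU), unicodedata.category(PLD))  # both Cc
print(strip_partial_line_controls(marked))
```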

Fwd: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Oren Watson
-- Forwarded message -- From: Oren Watson Date: Thu, Oct 6, 2016 at 12:03 PM Subject: Re: Why incomplete subscript/superscript alphabet ? To: "Jukka K. Korpela" If this is a real need, why not petition more software to allow the use of

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Jukka K. Korpela
6.10.2016, 17:55, Frédéric Grosshans wrote: Le 06/10/2016 à 09:21, Marcel Schneider a écrit : I did never see that. Would you show us some examples to look up? Iʼm curious whether they could be managed without accented superscripts. Anyway, combining diacritics should be placeable on

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Frédéric Grosshans
Le 06/10/2016 à 09:21, Marcel Schneider a écrit : I did never see that. Would you show us some examples to look up? Iʼm curious whether they could be managed without accented superscripts. Anyway, combining diacritics should be placeable on superscripts as well. Like «3ᵉ̀ᵐᵉ» ? It already works

Dealing with Unencodeable Characters

2016-10-06 Thread Charlotte Buff
One of Unicode's goals is round-trip compatibility with old legacy character sets, which is why we gathered many compatibility characters over time that would normally have been out of scope for the standard. It's why Zapf Dingbats and arabic presentation forms are in Unicode for example. However,

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Philippe Verdy
2016-10-06 9:21 GMT+02:00 Marcel Schneider : > > Almost nobody uses the preencoded superscript letters for this (notably > not > > for "1er", or its recommended feminine form "1re", > > still frequently written "1ère") > > They donʼt because these are not on the keyboard.

Re: Why incomplete subscript/superscript alphabet ?

2016-10-06 Thread Marcel Schneider
On Wed, 5 Oct 2016 17:34:02 +0200, Philippe Verdy wrote: […] > > I agree, French allows abbreviating many words by appending the last new > letters in superscripts. 3e is recommended but > 3ème > is still very frequent. As well you'll see abbreviations using é > (a frequent termination for past