
Re: Encoding colour (from Re: Encoding italic)

2019-02-13 Thread Asmus Freytag via Unicode


  
  
On 2/13/2019 5:19 PM, Mark E. Shoulson
  via Unicode wrote:

 And
  again, all this is before we even consider other issues; I can't
  shake the feeling that there are security nightmares lurking inside
  this idea.

Default ignorables are bad juju.
A./

Re: Encoding colour (from Re: Encoding italic)

2019-02-13 Thread Mark E. Shoulson via Unicode


On 2/12/19 12:05 PM, Kent Karlsson via Unicode wrote:

On 2019-02-12 03:20, "Mark E. Shoulson via Unicode" wrote:


On 2/11/19 5:46 PM, Kent Karlsson via Unicode wrote:

Continuing to look deep into the crystal ball, doing some more
hand swirls...

...

...

The scheme quoted (far) below (from wjgo_10009), or anything like it,
will NEVER be part of Unicode!

Not in Unicode, but I have to say I'm intrigued by the idea of writing
HTML with tag characters (not even necessarily "restricted" HTML: the
whole deal).  This does NOT make it possible to write "italics in plain
text," since you aren't writing plain text.  But what you can do is
write rich text (HTML) that Just So Happens to look like plain text when
rendered with a plain-text-renderer (and maybe there could be
plain-text-renderers that straddle the line, maybe supporting some
limited subset of HTML and doing boldface and italics or something).

And so would ESC/command sequences as such, if properly skipped for display.
If some are interpreted, those would affect the display of other characters.
Just like "HTML in tag characters" would. A show invisibles mode would
display both ESC/command sequences as well as "HTML in tag characters"
characters.
Very true.  Maybe the explicitness of HTML appealed to me; escape 
sequences feel more like... you know, computer "codes" and all. (which 
of course is what all this is anyway!  So what's wrong with that?)

BUT, this would NOT be a Unicode feature/catastrophe at all.  This would
be purely the decision of the committee in charge of HTML/XML and
related standards, to decide to accept Unicode tag characters as if they
were ASCII for the purposes of writing XML tags/attributes.  It's

I have no say on HTML/CSS, but I would venture to predict that those
who do have a say, would not be keen on that idea. And XML tags in
general need not be in ASCII. And... identifiers in CSS need not
be in pure ASCII either... And attribute values, like filenames
including those that refer to CSS files (CSS is preferably stored
separately from the HTML/XML), certainly need not be pure ASCII.

So, no, I'd say that that idea is completely dead.


You're probably right, and CSS is practically a different animal, and I 
guess at best one would have to settle for a stripped-down version of 
HTML (in which case, why bother?)  And again, all this is before we even 
consider other issues; I can't shake the feeling that there are security 
nightmares lurking inside this idea.


~mark

Re: Encoding colour (from Re: Encoding italic)

2019-02-13 Thread wjgo_10...@btinternet.com via Unicode


Philippe Verdy replied to my post, including quoting me.

WJGO >>  Thinking about this further, for this application copies of the 
glyphs could be redesigned so as to be square and could be emoji-style 
and the meanings of the characters specifying which colour component is 
to be set could be changed so that they refer to the number previously 
entered using one or more of the special  digit characters. Thus the 
setting of colour components could be done in the same reverse notation 
way that the FORTH computer language works.


PV > FORTH is not relevant to this discussion.

I just mentioned FORTH because of the way that numbers are entered 
before the operators that act upon them. I have no intention to use a 
stack-based system: what I have in mind at present is much simpler than 
such a format.


Suppose that there are sixteen new characters, which are in plane 1 or 
maybe plane 14, but which for this mailing list post I will express 
using the digits 0 .. 9, Z, R, G, B, A, F.


There would be a virtual machine to set the colour, that would have 
registers h, r, g, b, a and a system service 
Set_Foreground_Colour(r,g,b,a).


Then the sixteen new characters would each have a default glyph, which 
could be displayed emoji-style, and, in an application environment that 
has the virtual machine available and switched on, would have the 
following effects in the virtual machine and their glyphs would not then 
be displayed. The virtual machine would be sandboxed.


Z h:=0;
0 h:=10*h ;
1 h:=10*h + 1;
2 h:=10*h + 2;
3 h:=10*h + 3;
4 h:=10*h + 4;
5 h:=10*h + 5;
6 h:=10*h + 6;
7 h:=10*h + 7;
8 h:=10*h + 8;
9 h:=10*h + 9;
R r:=h; h:=0;
G g:=h; h:=0;
B b:=h; h:=0;
A a:=h; h:=0;
F Set_Foreground_Colour(r,g,b,a);

Thus for example, remembering that these ordinary characters are just 
being used here for explanation in this post, and that the actual 
characters if encoded would probably be in plane 1 or plane 14:


So the sequence Z128R160G248B255AF could be used to set the foreground 
colour to an opaque blue colour.
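
As a rough illustration of the table above, here is a minimal Python sketch of the register machine. The names ColourVM and run are made up for this example, ordinary ASCII characters stand in for the sixteen proposed characters (as they do in this post), and the system service Set_Foreground_Colour is modelled only by recording the four values; none of this is part of any encoded or proposed standard.

```python
class ColourVM:
    """Hypothetical model of the registers h, r, g, b, a described in the post."""

    def __init__(self):
        self.h = self.r = self.g = self.b = self.a = 0
        # Stand-in for Set_Foreground_Colour: remembers the last colour set.
        self.foreground = None

    def feed(self, ch):
        if ch == "Z":
            self.h = 0
        elif ch in "0123456789":
            # Each digit character shifts h one decimal place and adds the digit.
            self.h = 10 * self.h + int(ch)
        elif ch in "RGBA":
            # Copy h into the named register, then clear h.
            setattr(self, ch.lower(), self.h)
            self.h = 0
        elif ch == "F":
            self.foreground = (self.r, self.g, self.b, self.a)
        else:
            raise ValueError(f"not one of the sixteen characters: {ch!r}")


def run(sequence):
    """Feed every character of the sequence to a fresh machine."""
    vm = ColourVM()
    for ch in sequence:
        vm.feed(ch)
    return vm.foreground


print(run("Z128R160G248B255AF"))
```

With the example sequence, the machine ends with r=128, g=160, b=248, a=255 and passes those four values to the stand-in for the system service.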


It may be that upon investigation there could be specified a feature of 
the system service Set_Foreground_Colour(r,g,b,a) such that "if a=0 then 
a:=255;" so that total opacity of the colour is presumed unless 
otherwise set.


PV > You may create your "proof of concept" (tested on limited 
configurations) but it will just be private


Yes.

PV > [And so it should use PUA for full compatibility ...

Yes, I have in mind to use U+EA60 through to U+EA69 for the digits, as 
U+EA60 is Alt 60000 so it makes it easier if some of the people who want 
to experiment want to enter characters using the Alt method.


William Overington
Monday 11 February 2019

Re: Encoding italic

2019-02-12 Thread Kent Karlsson via Unicode



Oh, the crystal ball is pure solid state, no moving or hot parts.
A magic 8-ball on the other hand can easily get jammed...

(Now, enough of that...)

/K


On 2019-02-12 02:57, "James Kass via Unicode" wrote:

> 
> On 2019-02-11 6:42 PM, Kent Karlsson wrote:
> 
>> Using a VS to get italics, or anything like that approach, will
>> NEVER be a part of Unicode!
> 
> Maybe the crystal ball is jammed.  This can happen, especially on the
> older models which use vacuum tubes.
> 
> Wanting a second opinion, I asked the magic 8 ball:
> “Will VS14 italic be part of Unicode?”
> The answer was:
> “It is decidedly so.”
>

Re: Encoding colour (from Re: Encoding italic)

2019-02-12 Thread Kent Karlsson via Unicode

On 2019-02-12 03:20, "Mark E. Shoulson via Unicode" wrote:

> On 2/11/19 5:46 PM, Kent Karlsson via Unicode wrote:
>> Continuing to look deep into the crystal ball, doing some more
>> hand swirls...
>> 
>> ...
>> 
>> ...
>> 
>> The scheme quoted (far) below (from wjgo_10009), or anything like it,
>> will NEVER be part of Unicode!
> 
> Not in Unicode, but I have to say I'm intrigued by the idea of writing
> HTML with tag characters (not even necessarily "restricted" HTML: the
> whole deal).  This does NOT make it possible to write "italics in plain
> text," since you aren't writing plain text.  But what you can do is
> write rich text (HTML) that Just So Happens to look like plain text when
> rendered with a plain-text-renderer (and maybe there could be
> plain-text-renderers that straddle the line, maybe supporting some
> limited subset of HTML and doing boldface and italics or something).

And so would ESC/command sequences as such, if properly skipped for display.
If some are interpreted, those would affect the display of other characters.
Just like "HTML in tag characters" would. A show invisibles mode would
display both ESC/command sequences as well as "HTML in tag characters"
characters.

> BUT, this would NOT be a Unicode feature/catastrophe at all.  This would
> be purely the decision of the committee in charge of HTML/XML and
> related standards, to decide to accept Unicode tag characters as if they
> were ASCII for the purposes of writing XML tags/attributes.  It's

I have no say on HTML/CSS, but I would venture to predict that those
who do have a say, would not be keen on that idea. And XML tags in
general need not be in ASCII. And... identifiers in CSS need not
be in pure ASCII either... And attribute values, like filenames
including those that refer to CSS files (CSS is preferably stored
separately from the HTML/XML), certainly need not be pure ASCII.

So, no, I'd say that that idea is completely dead.

/Kent K

> totally nothing to do with Unicode, unless the XML folks want Unicode to
> change some properties on the tag chars or something.  I think it's a...
> fascinating idea, and probably has *disastrous* consequences lurking
> that I haven't tried to think of yet, but it's not a Unicode idea.
> 
> ~mark
>

Re: Encoding italic

2019-02-11 Thread James Kass via Unicode




Philippe Verdy wrote,

>>> case mappings,
>>
>> Adjust them as needed.
>
> Not so easy: case mappings cannot be fixed. They are stabilized in 
Unicode.

> You would need special casing rules under a specific "locale" for maths.

In BabelPad, I can select a string of text and convert it to math 
italics.  If upper case italics is desired, it would be necessary to 
select the text, convert it back to ASCII, convert it to upper case, and 
convert that upper case to math italics.  Casing the math alphanumerics 
doesn’t seem to present any problem.  Any program could make those 
interim steps invisible to the end user.
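
The interim steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BabelPad works internally: the function names are invented here, only the plain mathematical italic letters are handled, and NFKC normalization is used as a convenient way back to ASCII (it would also fold other compatibility characters).

```python
import unicodedata


def to_math_italic(text):
    """Map ASCII letters to MATHEMATICAL ITALIC letters; leave the rest alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # Italic small h is U+210E PLANCK CONSTANT; U+1D455 is reserved.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def to_ascii(text):
    """Fold math alphanumerics back to their compatibility equivalents."""
    return unicodedata.normalize("NFKC", text)


def upper_math_italic(text):
    """Convert back to ASCII, upper-case, then convert to math italics again."""
    return to_math_italic(to_ascii(text).upper())


sample = to_math_italic("math")
print(upper_math_italic(sample) == to_math_italic("MATH"))
```

The user only ever calls upper_math_italic; the round trip through ASCII stays hidden, which is the point made above.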


(With VS14, BabelTags mark-up, or new control character(s)—casing isn’t 
even an issue.)

Re: Encoding colour (from Re: Encoding italic)

2019-02-11 Thread Mark E. Shoulson via Unicode


On 2/11/19 5:46 PM, Kent Karlsson via Unicode wrote:

Continuing to look deep into the crystal ball, doing some more
hand swirls...

...

...

The scheme quoted (far) below (from wjgo_10009), or anything like it,
will NEVER be part of Unicode!


Not in Unicode, but I have to say I'm intrigued by the idea of writing 
HTML with tag characters (not even necessarily "restricted" HTML: the 
whole deal).  This does NOT make it possible to write "italics in plain 
text," since you aren't writing plain text.  But what you can do is 
write rich text (HTML) that Just So Happens to look like plain text when 
rendered with a plain-text-renderer  (and maybe there could be 
plain-text-renderers that straddle the line, maybe supporting some 
limited subset of HTML and doing boldface and italics or something).  
BUT, this would NOT be a Unicode feature/catastrophe at all.  This would 
be purely the decision of the committee in charge of HTML/XML and 
related standards, to decide to accept Unicode tag characters as if they 
were ASCII for the purposes of writing XML tags/attributes.  It's 
totally nothing to do with Unicode, unless the XML folks want Unicode to 
change some properties on the tag chars or something.  I think it's a... 
fascinating idea, and probably has *disastrous* consequences lurking 
that I haven't tried to think of yet, but it's not a Unicode idea.
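
To make the idea concrete, here is a small Python sketch. It relies on one fact, that the tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E at an offset of 0xE0000; everything else (the function names, and the notion of a renderer that simply skips tag characters) is an assumption made for illustration, not behaviour defined by Unicode or by any HTML/XML standard.

```python
TAG_OFFSET = 0xE0000  # tag characters mirror printable ASCII at this offset


def to_tag_chars(ascii_text):
    """Spell printable ASCII (e.g. an HTML tag) with Unicode tag characters."""
    out = []
    for ch in ascii_text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
        out.append(chr(cp + TAG_OFFSET))
    return "".join(out)


def is_tag_char(ch):
    return 0xE0020 <= ord(ch) <= 0xE007E


def plain_text_view(text):
    """What a renderer that ignores tag characters would show."""
    return "".join(ch for ch in text if not is_tag_char(ch))


def show_invisibles(text):
    """Map tag characters back to ASCII so the hidden markup becomes visible."""
    return "".join(chr(ord(ch) - TAG_OFFSET) if is_tag_char(ch) else ch
                   for ch in text)


marked = to_tag_chars("<i>") + "italic" + to_tag_chars("</i>")
print(plain_text_view(marked))
print(show_invisibles(marked))
```

The same string yields "italic" in the plain-text view and "<i>italic</i>" once the invisibles are shown.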


~mark

Re: Encoding italic

2019-02-11 Thread James Kass via Unicode

On 2019-02-11 6:42 PM, Kent Karlsson wrote:

> Using a VS to get italics, or anything like that approach, will
> NEVER be a part of Unicode!

Maybe the crystal ball is jammed.  This can happen, especially on the 
older models which use vacuum tubes.

Wanting a second opinion, I asked the magic 8 ball:
“Will VS14 italic be part of Unicode?”
The answer was:
“It is decidedly so.”

Re: Encoding italic

2019-02-11 Thread Kent Karlsson via Unicode

On 2019-02-11 10:55, "wjgo_10...@btinternet.com via Unicode" wrote:

> Doug Ewell wrote:
> 
>> , just as next to nobody is using the proposed VS14 mechanism 
> 
> Well, of course not because use of VS14 in a plain text document to
> record a request for an italic glyph version is not at the present time
> an official part of Unicode.

Looking deeply into the crystal ball, swirling my hands over it...

...

...

Using a VS to get italics, or anything like that approach, will
NEVER be a part of Unicode!

/Kent K

Re: Encoding italic

2019-02-11 Thread wjgo_10...@btinternet.com via Unicode


Doug Ewell wrote:


…, just as next to nobody is using the proposed VS14 mechanism …


Well, of course not because use of VS14 in a plain text document to 
record a request for an italic glyph version is not at the present time 
an official part of Unicode. The next scheduled Unicode Technical 
Committee meeting is due to start on 30 April 2019.


Here is a link to the proposal document.

https://www.unicode.org/L2/L2019/19063-italic-vs.pdf

VS14 is used to indicate a request for an italic glyph version in my 
VS14 Maquette font but that is clearly just a maquette font for 
experimental use to test the concept and show that it works. An 
application program that supports OpenType and that has the liga table 
switched on is needed in order to use the VS14 Maquette font to 
demonstrate that the use of VS14 in this way works.


https://forum.high-logic.com/viewtopic.php?f=10=7831

William Overington

Monday 11 February 2019

Re: Encoding colour (from Re: Encoding italic)

2019-02-11 Thread Philippe Verdy via Unicode

On Sun, 10 Feb 2019 at 02:33, wjgo_10...@btinternet.com via Unicode <
unicode@unicode.org> wrote:

> Previously I wrote:
>
> > A stateful method, though which might be useful for plain text streams
> > in some applications, would be to encode as characters some of the
> > glyphs for indicating colours and the digit characters to go with them
> > from page 5 and from page 3 of the following publication.
>
> > http://www.users.globalnet.co.uk/~ngo/locse027.pdf
>
> Thinking about this further, for this application copies of the glyphs
> could be redesigned so as to be square and could be emoji-style and the
> meanings of the characters specifying which colour component is to be
> set could be changed so that they refer to the number previously entered
> using one or more  of the special  digit characters. Thus the setting of
> colour components could be done in the same reverse notation way that
> the FORTH computer language works.
>

FORTH is not relevant to this discussion. Anyway, the usual order for Forth
operators (Forth is a stack-based language, similar to PostScript, and
works like calculators that use reverse Polish notation) is to push the
operands from left to right and then apply the operator, which pops them
in reverse order, from right to left, before pushing the result on the stack
(so "a/b/c" becomes "/a get /b get div /c get div"). But a color is just an
operator like "rgb(r,g,b)", and the natural order in stack-based languages
should also be "/r get /g get /b get rgb".
Note that C/C++ (with C calling conventions) usually uses another order for
its stack, pushing parameters from right to left (unless they are passed
via dedicated registers in a fixed order, the first parameter from the right
that fits a register being passed not on the stack but in the "main"
accumulator register, possibly a pair of registers for long integers or long
pointers, or a different register for floating-point values if floating-point
registers are used).

There's no standard for the order of parameters in stack based languages.
It is arbitrary and specific to each language or specific implementations
of them. So if you want to create your own scripting language to support
your non-standard extension, you can choose any order you want, but this
will still not define a standard related to other languages that have never
been bound to a specific evaluation/encoding order. Then don't pretend it
will be part of the Unicode standard, which is not a scripting language and
that does not offer an "ABI" for stateful encodings with arbitrarily long
contexts (Unicode has placed very low limits on the maximum length of
lookahead needed to process text, your extension would not work under these
reasonable limits, so it will have limited private use and cannot be part
of TUS).

You may create your "proof of concept" (tested on limited configurations)
but it will just be private

[And so it should use PUA for full compatibility and not abuse the other
standardized code points, as your extension would not be
compatible/conforming to the existing rules and limits, without amending
them and discussing a lot how existing conforming applications can be
adapted, and analyzing the effects if they are not updated. Approving this
extension is another thing, and it will need to pass the standard process
to be added to the proposals schedule, pass through the two technical
committees, pass the alpha and beta phases, and then the prepublication.
You'll also need to work on documentation and fix many quirks found in
it, then you'll need supporters to pass the vote (and if you're not a
UTC member or an ISO member, you will never be able to vote for it: you
need then to convince the voters by listening to what they remark and refine
your specifications to match their desires, and probably to split your
proposal in several parts or limit your initial goals, leaving the other
problematic points for later; if what remains "stable" in your proposal may
not be usable in practice without the additional extensions still in
discussion, and in fact this subset may still remain in the encoding queue
for years, until it reaches a point where it starts being usable for
practical problems; before that, you'll have to experiment with private-use
and should be ready to accept competing proposals, not compatible with your
proposal, and learn from them to reach an acceptable consensus; reaching
that consensus is the longest step but initially most voters will not
decide for or against your proposal if they are not confident enough about
the merit of each proposal, because they want to preserve a reasonable
compatibility across TUS versions and with existing applications without
adding further problems, notably in terms of confusability/security. But
don't ask them to break the existing stability rules which were even harder
to formalize: these rules are the foundation that allowed TUS/ISO 10646 to
become a successful worldwide standard with a lot of applications using

Re: Encoding italic

2019-02-11 Thread Philippe Verdy via Unicode

On Sun, 10 Feb 2019 at 16:42, James Kass via Unicode wrote:

>
> Philippe Verdy wrote,
>
>  >> ...[one font file having both italic and roman]...
>  > The only case where it happens in real fonts is for the mapping of
>  > Mathematical Symbols which have a distinct encoding for some
>  > variants ...
>
> William Overington made a proof-of-concept font using the VS14 character
> to access the italic glyphs which were, of course, in the same real
> font.  Which means that the developer of a font such as Deja Vu Math TeX
> Gyre could set up an OpenType table mapping the Basic Latin in the font
> to the italic math letter glyphs in the same font using the VS14
> characters.  Such a font would work interoperably on modern systems.
> Such a font would display italic letters both if encoded as math
> alphanumerics or if encoded as ASCII plus VS14.  Significantly, the
> display would be identical.
>
>  > ...[math alphanumerics]...
>  > These were allowed in Unicode because of their specific contextual
>  > use as distinctive symbols from known standards, and not for general
>  > use in human languages
>
> They were encoded for interoperability and round-tripping because they
> existed in character sets such as STIX.  They remain Latin letter form
> variants.  If they had been encoded as the variant forms which
> constitute their essential identity it would have broken the character
> vs. glyph encoding model of that era.  Arguing that they must not be
> used other than for scientific purposes is just so much semantic
> quibbling in order to justify their encoding.
>
> Suppose we started using the double struck ASCII variants on this list
> in order to note Unicode character numbers such as 핌+픽피픽픽 or
> 핌+ퟚퟘퟞퟘ?  Hexadecimal notation is certainly math and Unicode can be
> considered a science.  Would that be “math abuse” if we did it?  (Is
> linguistics not a science?)
>
>  > (because these encodings are defective and don't have the necessary
>  > coverage, notably for the many diacritics,
>
> The combining diacritics would be used.
>
Not for the many precombined characters that are in Latin: do you intend to
propose them to be reencoded with all the same variants encoded for maths?
Or allow the maths symbols to have diacritics added on them (hint: this
does not work correctly with the specific mathematical conventions on
diacritics and their specific stacking rules: they are NOT reorderable
through canonical equivalence, the order is significant in maths, so you
would also need to use CGJ to fix the expected logical semantic and visual
stacking order).

>
>  > case mappings,
>
> Adjust them as needed.
>

Not so easy: case mappings cannot be fixed. They are stabilized in Unicode.
You would need special casing rules under a specific "locale" for maths.

Really maths is a specific script even if it borrows some symbols from
Latin, Greek or Hebrew but only in specific glyph variants. These symbols
should not be even considered as part of the script they originate from
(just like Latin A is not the same as Cyrillic A or Greek Alpha, that all
have the same forms and the same origin).

I can argue the same thing about IPA notations: they are NOT the Latin
script and also borrow some letter forms from Latin and Greek, but without
any case mappings (only lowercase is used), and also with specific glyph
variants.

Both examples are technical notations which do not obey the linguistic
rules and normal processing of the script they originate from. They are
specific "writing systems", unfortunately confused within "Unicode
scripts", and then abused.

Note that some Latin letters have been borrowed from IPA too, for use in
African languages, then case mappings were needed: these should have been
reencoded as a plain letter pair with a basic case mapping (not the special
case mapping rules now needed for African languages, such as open o which
looks much like the mirrored c from Latin Roman digits, and open e which
was borrowed from Greek epsilon in lowercase but does not use the uppercase
Greek Epsilon and uses instead another shape, meaning that the Latin open e
should have been encoded as a plain letter pair, distinct from the Greek
epsilon; but IPA already used the epsilon-like symbol...).

In the end, these exceptions just cause many inconsistencies and complexities.
Applications and libraries cannot adapt easily and are not downward
compatible because stable properties are immutable and specific tailorings
are needed each time in applications: the more we add these exceptions, the
less the standard is easy to adapt and compatibility is much more difficult
to preserve. In summary I don't like at all the dual encodings or encodings
of additional letters that cannot use the normal stable properties (and
this remark is also true for emojis: what a mess! Full of exceptions and
different incoherent encoding models!)

Re: Encoding italic

2019-02-10 Thread Kent Karlsson via Unicode





On 2019-02-10 16:31, "James Kass via Unicode" wrote:

> 
> Philippe Verdy wrote,
> 
>>> ...[one font file having both italic and roman]...

For OpenType fonts, there is a "design axis" called "ital". Value 0 on that
axis would be roman (upright, normally), and value 1 on that axis would be
italic. I don't know to what extent that is available in OpenType fonts in
common use... (Instead of using two separate font files.)

[math chars]
> They were encoded for interoperability and round-tripping because they
> existed in character sets such as STIX. 

They were basically requested "by" STIX, yes. Not sure about the
round-tripping bit.

> They remain Latin letter form
> variants.  If they had been encoded as the variant forms which
> constitute their essential identity it would have broken the character
> vs. glyph encoding model of that era.  Arguing that they must not be
> used other than for scientific purposes

I don't think that particular argument was made, IIUC.

> is just so much semantic
> quibbling in order to justify their encoding.
> 
> Suppose we started using the double struck ASCII variants on this list
> in order to note Unicode character numbers such as 핌+픽피픽픽 or
> 핌+ퟚퟘퟞퟘ? 

That particular example would be ok (even though outside of a
conventional math formula). But we were talking about natural
languages in their conventional orthography, using italics/bold.

/Kent K

Re: Encoding italic

2019-02-10 Thread Doug Ewell via Unicode

Egmont Koblinger wrote:

> There are a lot of problems with these escape sequences, and if you go
> for a potentially new standard, you might not want to carry these
> problems.

As others have pointed out, I am suggesting the use of some profile of ISO 6429 
within plain text to implement these features about which there is disagreement 
whether they belong in plain text or not.

I am very definitely NOT proposing that anything be added to Unicode or 10646, 
nor that an all-new standard be created.

> There is not a well-defined framework for escape sequences.

I thought ISO 6429 defined things rather clearly, if verbosely.

> In this particular case you might say it starts with ESC [ and ends
> with the letter 'm', but how do you know where to end the sequence if
> that letter 'm' just doesn't arrive?

Well, what do you do in HTML if the closing '>' never arrives?

If it's simply a matter of the text coming to an end before the 'm' arrives, 
then it doesn't matter. If the 'm' (or other final code unit for other 
commands) is dropped but the sequence goes on, like ESC[3This is 
italicized ESC[m, then gosh, I don't know offhand what the standard says. It 
might be worthwhile to try looking it up, or seeing what implementations do, or 
defining it clearly in the profile.

> Terminal emulators have extremely complex tables for parsing (and
> still many of them get plenty of things wrong). It's unreasonable for
> any random small utility processing Unicode text to go into this
> business of recognizing all the well-known escape sequences, not even
> to the extent to know where they end.

Perhaps interestingly, I wrote a random small utility many years ago that 
displayed ISO 6429 sequences on a Windows console, back in the dark ages 
between ANSI.SYS and Windows 10 support for 6429. It didn't cover the entire 
standard, nor could it, but a decent subset. It understood where sequences 
ended, even unknown ones, because that is all laid out in the standard.
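
For readers who want to see why the end of a sequence is always findable, here is a minimal Python sketch (not the utility mentioned above, whose source is not shown here). It assumes the control sequence layout from ECMA-48 / ISO 6429: CSI, then parameter bytes 0x30-0x3F, then intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E, with CSI written either as ESC [ or as the single C1 character U+009B. The names CSI_SEQUENCE and split_sequences are invented for the example.

```python
import re

# CSI (ESC [ or U+009B), parameter bytes, intermediate bytes, one final byte.
CSI_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])"
)


def split_sequences(text):
    """Return the text with control sequences removed, plus the sequences found.

    Each sequence is reported as (parameters, intermediates, final), whether or
    not the final byte names a function this sketch knows anything about.
    """
    found = [match.groups() for match in CSI_SEQUENCE.finditer(text)]
    return CSI_SEQUENCE.sub("", text), found


plain, found = split_sequences("\x1b[3mitalic\x1b[0m and \x9b1mbold\x9b22m")
print(plain)
print(found)
```

One consequence of the byte ranges is worth noting: if the final 'm' is dropped after ESC[3 and the text continues with "This", this sketch takes the 'T' as the final byte, since it falls in the final-byte range. Whether a profile should accept that or treat it as an error is exactly the kind of thing the profile would have to spell out.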

> Whatever is designed should be much more easily parseable. Should you
> say "everything from ESC[ to m", you'll cause a whole bunch of
> problems when a different kind of escape sequence gets interpreted as
> Unicode.

I'm afraid I don't understand this statement.

> A parser, by the way, would also have to interpret combined sequences
> like ESC[3;0;1m or alike, for which I don't see a good reason as
> opposed to having separate sequences for each.

That's easy:

3 = turn on italics
0 = turn off all special styling, including italics
1 = turn on bold (or intense, whichever the output device supports)

It's a silly sequence, because why would you turn on an attribute and then 
immediately turn it off before using it? But silly though it may be, it's 
well-formed and very easy to parse. My random small utility had no problem with 
it.
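
A sketch of how little it takes, in Python. Only the three parameter values listed above are handled, and the function name apply_sgr and the dictionary used for the styling state are inventions for this example; a real profile would cover more values.

```python
def apply_sgr(params, state=None):
    """Apply a semicolon-separated SGR parameter string, left to right."""
    state = dict(state) if state else {"italic": False, "bold": False}
    for field in params.split(";"):
        # An empty parameter is treated as 0, so ESC[m resets like ESC[0m.
        code = int(field) if field else 0
        if code == 0:
            state = {name: False for name in state}  # turn off all styling
        elif code == 1:
            state["bold"] = True
        elif code == 3:
            state["italic"] = True
        # Any other value is ignored in this sketch.
    return state


print(apply_sgr("3;0;1"))
```

For "3;0;1" the italic attribute is switched on, everything is reset, and then bold is switched on, so the text that follows is bold only.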

> Also, it should be carefully evaluated what to do with C1 (U+009B)
> instead of the C0 ESC[ opening for an escape sequence – here terminal
> emulators vary. These just make everything even more cumbersome.

Why would they vary? CSI encoded as <1B 5B> or as <9B> is exactly the same. 
Again, this is very clear in the standard.

> ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".
> It's only nowadays that most terminal emulators support 256 colors and
> some even support 16M true colors that some emulators try to push for
> this bit unambiguously meaning "bold" only, whereas in most emulators
> it means "both bold and increased intensity". [...]

Why would we expect every displayed and printed page to look identical? That's 
not going to happen no matter what encoding mechanism you use for "bold" and 
"intense" and the rest. Not all HTML pages look identical either.

> Should this scheme be extended for colors, too? What to do with the
> legacy 8/16 as well as the 256-color extensions wrt. the color
> palette?

Why not?

> Should Unicode go into the business

Nope. Unicode should do nothing about this.

> For 256-colors and truecolors, there are two or three syntaxes out
> there regarding whether the separator is a colon or a semicolon.
> ECMA-48 doesn't say anything about it, ITU T.416 does, although it's
> absolutely not clear. See e.g. the discussion at the comment section
> of https://gist.github.com/XVilka/8346728 , in Dec 2018, we just
> couldn't figure out which syntax exactly ITU T.416 wants to say.

That sounds like someone should send a question to ITU-T. Exegesis would
perhaps be more productive than despair.

> Moreover, due to a common misinterpretation of the spec, one of the
> positional parameters is often omitted.

That's a decision designers and implementers are sometimes faced with: should 
we remain bug-compatible with other implementations, or follow the straight and 
narrow path? I remember browsers going through that era too.

> Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
> for curly underline. What to do with them?

Should we be extension-compatible with other

Re: Encoding italic

2019-02-10 Thread James Kass via Unicode




Philippe Verdy wrote,

>> ...[one font file having both italic and roman]...
> The only case where it happens in real fonts is for the mapping of
> Mathematical Symbols which have a distinct encoding for some
> variants ...

William Overington made a proof-of-concept font using the VS14 character 
to access the italic glyphs which were, of course, in the same real 
font.  Which means that the developer of a font such as Deja Vu Math TeX 
Gyre could set up an OpenType table mapping the Basic Latin in the font 
to the italic math letter glyphs in the same font using the VS14 
characters.  Such a font would work interoperably on modern systems.  
Such a font would display italic letters both if encoded as math 
alphanumerics or if encoded as ASCII plus VS14.  Significantly, the 
display would be identical.


> ...[math alphanumerics]...
> These were allowed in Unicode because of their specific contextual
> use as distinctive symbols from known standards, and not for general
> use in human languages

They were encoded for interoperability and round-tripping because they 
existed in character sets such as STIX.  They remain Latin letter form 
variants.  If they had been encoded as the variant forms which 
constitute their essential identity it would have broken the character 
vs. glyph encoding model of that era.  Arguing that they must not be 
used other than for scientific purposes is just so much semantic 
quibbling in order to justify their encoding.


Suppose we started using the double struck ASCII variants on this list 
in order to note Unicode character numbers such as 핌+픽피픽픽 or 
핌+ퟚퟘퟞퟘ?  Hexadecimal notation is certainly math and Unicode can be 
considered a science.  Would that be “math abuse” if we did it?  (Is 
linguistics not a science?)


> (because these encodings are defective and don't have the necessary
> coverage, notably for the many diacritics,

The combining diacritics would be used.

> case mappings,

Adjust them as needed.

> and other linguistic, segmentation and layout properties).
>
> The same can be said about superscript/subscript variants,
> ... : they have specific use and not made for general purpose texts ...

So people who used ISO-8859-1 were not allowed to use the superscript 
digits therein for marking footnotes?  Those superscript digits were 
reserved by ISO-8859-1 only for use by math and science?


MATHEMATICAL ITALIC CAPITAL A
Decomposition mapping:  U+0041
Binary properties:  Math, Alphabetic, Uppercase, Grapheme Base, ...

SUPERSCRIPT TWO
Decomposition mapping:  U+0032
Binary properties:  Grapheme Base

MODIFIER LETTER SMALL C
Decomposition mapping:  U+0063
Binary properties:  Alphabetic, Lowercase, Grapheme Base, ...
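
The decomposition mappings listed above can be checked with Python's
standard unicodedata module. This is only a small sketch; the helper name
describe is invented here, and the results depend on the Unicode version
bundled with the Python build in use.

```python
import unicodedata


def describe(ch: str):
    """Return (name, decomposition mapping, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


# MATHEMATICAL ITALIC CAPITAL A, SUPERSCRIPT TWO, MODIFIER LETTER SMALL C
for ch in ("\U0001D434", "\u00B2", "\u1D9C"):
    name, decomposition, nfkc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {decomposition} -> {nfkc}")
```

All three characters carry compatibility decompositions, so NFKC folds
each of them back to the plain Basic Latin letter or digit.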

Re: Encoding italic

2019-02-10 Thread Philippe Verdy via Unicode

On Sun, 10 Feb 2019 at 05:34, James Kass via Unicode wrote:

>
> Martin J. Dürst wrote,
>
>  >> Isn't that already the case if one uses variation sequences to choose
>  >> between Chinese and Japanese glyphs?
>  >
>  > Well, not necessarily. There's nothing prohibiting a font that includes
>  > both Chinese and Japanese glyph variants.
>
> Just as there’s nothing prohibiting a single font file from including
> both roman and italic variants of Latin characters.
>

Maybe, but such a font would not work as intended to display both styles
distinctly with the common use of the italic style: it would have to make a
default choice and you would then need either a special text encoding, or
enabling an OpenType feature (if using OpenType font format) to select the
other style in a non-standard custom way.

The only case where it happens in real fonts is for the mapping of
Mathematical Symbols which have a distinct encoding for some variants (only
for a basic subset of the Latin alphabet, as well as some basic Greek and a
few other letters from other scripts), and this is typically done only in
symbol fonts containing other mathematical symbols, but because of the
specific encoding for such mathematical use. As well we have the variants
registered in Unicode for IPA usage (only lowercase letters, treated as
symbols and not case-paired).

These were allowed in Unicode because of their specific contextual use as
distinctive symbols from known standards, and not for general use in human
languages (because these encodings are defective and don't have the
necessary coverage, notably for the many diacritics, case mappings, and
other linguistic, segmentation and layout properties).

The same can be said about superscript/subscript variants, bold variants,
monospace variants: they have specific use and not made for general purpose
texts in human languages with their common orthographic conventions: Latin
is a large script and one of the most complex, and it's quite normal that
there are some deviating usages for specific purposes, provided they are
bounded in scope and use.

But what you would like is to extend the whole Latin script (and why not
Greek, Cyrillic, and others) with multiple reencodings for lots of stylistic
variants, and each time a new character or diacritic is encoded it would
have to be encoded multiple times (so you'd break the character encoding
model, and would just complicate the implementation even more, and would
also create new security issues with lots of new confusables, that every
user of Unicode would then have to take into account, and every application
or library would then need to be updated, and have to include large
datatables to handle them).

As well it would create many conflicts if we used the "VARIATION SELECTOR
n" characters, or would need to permanently assign specific ones for
specific styles; and then rapidly we would no longer have enough "VARIATION
SELECTOR n" selectors in Unicode : we only have 256 of them, only one is
more or less permanently dedicated.

[VS16 is almost completely reserved now for distinction between
normal/linguistic and emoji/colorful variants. The emoji subset in Unicode
is an open set which could expand in the future to tens of thousands of
symbols, and will likely cause large work overhead in the CLDR project just to
describe them, one reason for which I think that Emoji character data in
CLDR should be separated in a distinct translation project, with its own
versioning and milestones, and not maintained in sync with the rest of CLDR
data, if we consider how emojis have flooded the CLDR survey discussions,
when this subset has many known issues and inconsistencies and still no
viable encoding model like the "character encoding model" to make it more
consistent, and updatable separately from the rest of the Unicode UCD
releases; in my opinion the emojis in Unicode are still an alpha project in
development and it's too soon to describe them as a "standard" when there
are many other possible ways to handle them; these emojis are just there
now to remain as "legacy" mappings but won't resist an expected coming new
formal standard about them instead of the current mess they create now.]

Re: Encoding italic

2019-02-10 Thread Rebecca Bettencourt via Unicode

On Sat, Feb 9, 2019 at 6:23 AM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Sat, 9 Feb 2019 04:52:30 -0800
> David Starner via Unicode  wrote:
>
> > Note that this is actually the only thing that stands out to me in
> > Unicode not supporting older character sets; in PETSCII (Commodore
> > 64), the high-bit character characters were the reverse (in this
> > sense) of the low-bit characters.
>
> Later ISCII has some styling codes, bold and italic amongst them.
>

Interesting.

I found the 1991 ISCII spec: http://varamozhi.sourceforge.net/iscii91.pdf

The styling codes are:

EF 30 - Bold
EF 31 - Italic
EF 32 - Underline
EF 33 - Double Width
EF 34 - Highlight
EF 35 - Outline
EF 36 - Shadow
EF 37 - Double Height, Top Half
EF 38 - Double Height, Bottom Half
EF 39 - Double Height & Double Width

There are also codes for switching scripts (Roman, Devanagari, Bengali,
Tamil, Arabic, Persian, etc.) but these are not necessary since Unicode
encodes these separately.

These take effect "till the end of a line, or till the same attribute [code
is encountered]." In other words, these just toggle the attribute, and all
the attributes are reset when a newline is encountered.
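
To make the toggle-and-reset behaviour concrete, here is a minimal Python
sketch of a scanner for the styling codes listed above. It assumes the
reading given in this message (EF followed by 30..39 toggles an attribute,
and a newline clears everything); it does not decode ISCII characters or
handle script switching, and the function name is invented for the example.

```python
ATR = 0xEF  # byte that introduces an attribute code, per the list above

STYLES = {
    0x30: "bold",
    0x31: "italic",
    0x32: "underline",
    0x33: "double width",
    0x34: "highlight",
    0x35: "outline",
    0x36: "shadow",
    0x37: "double height, top half",
    0x38: "double height, bottom half",
    0x39: "double height & double width",
}


def iscii_style_runs(data: bytes):
    """Return (byte, active styles) pairs for every non-attribute byte."""
    active = set()
    runs = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR and i + 1 < len(data) and data[i + 1] in STYLES:
            active ^= {STYLES[data[i + 1]]}  # same code again switches it off
            i += 2
            continue
        if b == 0x0A:
            active.clear()  # all attributes reset at end of line
        runs.append((b, frozenset(active)))
        i += 1
    return runs
```

For example, in the bytes `a EF 30 b EF 30 c` only the `b` comes out bold,
because the second EF 30 toggles the attribute off again.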

Re: Encoding italic

2019-02-09 Thread James Kass via Unicode




Martin J. Dürst wrote,

>> Isn't that already the case if one uses variation sequences to choose
>> between Chinese and Japanese glyphs?
>
> Well, not necessarily. There's nothing prohibiting a font that includes
> both Chinese and Japanese glyph variants.

Just as there’s nothing prohibiting a single font file from including 
both roman and italic variants of Latin characters.

Re: Encoding italic

2019-02-09 Thread Martin J . Dürst via Unicode

On 2019/02/09 19:58, Richard Wordingham via Unicode wrote:
> On Fri, 8 Feb 2019 18:08:34 -0800
> Asmus Freytag via Unicode  wrote:

>> Under the implicit assumptions bandied about here, the VS approach
>> thus reveals itself as a true rich-text solution (font switching)
>> albeit realized with pseudo coding rather than markup, markdown or
>> escape sequences.
> 
> Isn't that already the case if one uses variation sequences to choose
> between Chinese and Japanese glyphs?

Well, not necessarily. There's nothing prohibiting a font that includes 
both Chinese and Japanese glyph variants.

Regards,   Martin.

Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode


Egmont Koblinger wrote:


> Should this scheme be extended for colors, too? What to do with the
> legacy 8/16 as well as the 256-color extensions wrt. the color
> palette? Should Unicode go into the business of defining a fixed set
> of colors, or allow to alter the palette colors using the OSC 4 and
> friends escape sequences which are supported by about half of the terminal
> emulators out there?

Encoding colour is already a topic in relation to emoji and maybe could 
be extended to other characters.


A stateful method, though one which might be useful for plain text streams 
in some applications, would be to encode as characters some of the 
glyphs for indicating colours and the digit characters to go with them 
from page 5 and from page 3 of the following publication.


http://www.users.globalnet.co.uk/~ngo/locse027.pdf

> What to do with things that Unicode might also want to have, but
> doesn't exist in terminal emulators due to their nature, such as
> switching to a different font size?

Well, if people were to want to do it, there could be a character 
encoded in the Specials section and then use that character as a base 
character and follow it with a sequence of tag characters.


William Overington

Saturday 9 February 2019

Re: Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode


Previously I wrote:

A stateful method, though one which might be useful for plain text streams 
in some applications, would be to encode as characters some of the 
glyphs for indicating colours and the digit characters to go with them 
from page 5 and from page 3 of the following publication.



http://www.users.globalnet.co.uk/~ngo/locse027.pdf


Thinking about this further, for this application copies of the glyphs 
could be redesigned so as to be square and could be emoji-style and the 
meanings of the characters specifying which colour component is to be 
set could be changed so that they refer to the number previously entered 
using one or more of the special digit characters. Thus the setting of 
colour components could be done in the same reverse notation way that 
the FORTH computer language works. Yet although the colour components 
thus set would be stateful until changed there would be no Escape 
sequence and if an application did not support interpretation of the 
characters as setting colours, they would just be displayed as glyphs, 
each either as a particular glyph or as a .notdef glyph.


William Overington
Saturday 9 February 2019

Re: Encoding italic

2019-02-09 Thread Richard Wordingham via Unicode

On Sat, 9 Feb 2019 04:52:30 -0800
David Starner via Unicode  wrote:

> Note that this is actually the only thing that stands out to me in
> Unicode not supporting older character sets; in PETSCII (Commodore
> 64), the high-bit character characters were the reverse (in this
> sense) of the low-bit characters.

Later ISCII has some styling codes, bold and italic amongst them.

Richard.

Re: Encoding italic

2019-02-09 Thread Rebecca Bettencourt via Unicode

On Sat, Feb 9, 2019 at 4:58 AM David Starner via Unicode <
unicode@unicode.org> wrote:

>
> On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode <
> unicode@unicode.org> wrote:
>
>>
>> On 2019-02-08 21:53, "Doug Ewell via Unicode" wrote:
>> > • Reverse on: ESC [7m
>> > • Reverse off: ESC [27m
>>
>> "Reverse" = "switch background and foreground colours".
>>
>> This is an (odd) colour thing. If you want to go with (full!) colour
>> (foreground and background), fine, but the "reverse" is oddball (and
>> based on what really old terminals were limited to when it comes to
>> colour).
>>
>
> Note that this is actually the only thing that stands out to me in Unicode
> not supporting older character sets; in PETSCII (Commodore 64), the
> high-bit character characters were the reverse (in this sense) of the
> low-bit characters.
>

This is true: many legacy character sets encoded reverse-video characters
as wholly-separate characters, and even allowed them in contexts widely
considered plain-text such as file names. This makes reverse-video possibly
the one text attribute best argued to be worthy of encoding in Unicode. But
I can already tell you it won't work, because we made such an argument in
an early version of L2/19-025, and even proposed using VS14, the very same
VS William Overington has since swiped from us for italics. That proposal
was shot down rather quickly. Bold, italics, etc. don't even stand a chance.

Re: Encoding italic

2019-02-09 Thread David Starner via Unicode

On Sat, Feb 9, 2019 at 3:59 AM Kent Karlsson via Unicode <
unicode@unicode.org> wrote:

>
> On 2019-02-08 21:53, "Doug Ewell via Unicode" wrote:
> > • Reverse on: ESC [7m
> > • Reverse off: ESC [27m
>
> "Reverse" = "switch background and foreground colours".
>
> This is an (odd) colour thing. If you want to go with (full!) colour
> (foreground and background), fine, but the "reverse" is oddball (and
> based on what really old terminals were limited to when it comes to
> colour).
>

Note that this is actually the only thing that stands out to me in Unicode
not supporting older character sets; in PETSCII (Commodore 64), the
high-bit character characters were the reverse (in this sense) of the
low-bit characters.

Re: Encoding italic

2019-02-09 Thread Kent Karlsson via Unicode


On 2019-02-08 21:53, "Doug Ewell via Unicode" wrote:

> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:

Note that these do NOT nest (no stack...), just state changes for the
relevant PART of the "graphic" (i.e. style) state. So the approach in
that regard is quite different from the approach done in HTML/CSS.

>  Italics on: ESC [3m
>  Italics off: ESC [23m
>  Bold on: ESC [1m
>  Bold off: ESC [22m
>  Underline on: ESC [4m
(implies turning double underline off)

   Underline, double: ESC [21m
(implies turning single underline off)

>  Underline off: ESC [24m
>  Strikethrough on: ESC [9m
>  Strikethrough off: ESC [29m
>  Reverse on: ESC [7m
>  Reverse off: ESC [27m

"Reverse" = "switch background and foreground colours".

This is an (odd) colour thing. If you want to go with (full!) colour
(foreground and background), fine, but the "reverse" is oddball (and
based on what really old terminals were limited to when it comes to colour).

I'd rather include 'ESC [50m' (not variable spacing, i.e. "monospace" font)
and 'ESC [26m' (variable spacing, i.e. "proportional" font). Recall that
this is NOT for terminal emulators but for styling applied to text
outside of terminal emulators. (Terminal emulators already implement
much of this and more; albeit sometimes wrongly). This would be handy
for including (say) programming code or computer commands (or for that
matter, "ASCII art", or more generally "Unicode art") in otherwise
"ordinary" text... (The "ordinary" text preferably set in a proportional font.)

>  Reset all attributes: ESC [m

(Actually 'ESC [0m', with the 0 default-able.) Handy, agreed, but not 100%
necessary.

These ESC-sequences should not normally be inserted "manually" but by a text
editor program, using the conventional means of "making bold" etc. (ctrl-b,
cmd-b, "bold" in a menu); only "hackers" (in the positive sense) would
actually bother about the command sequences as such.
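
Since these are state changes rather than nested spans, the bookkeeping
can be sketched as a small state record that each command sequence
updates. The following Python is only an illustration of the codes listed
above; the Style class and apply_sgr function are names invented here, and
colon-separated sub-parameters are not handled.

```python
from dataclasses import dataclass, replace


@dataclass
class Style:
    bold: bool = False
    italic: bool = False
    underline: int = 0  # 0 = none, 1 = single, 2 = double
    strikethrough: bool = False
    reverse: bool = False


def apply_sgr(style: Style, params: str) -> Style:
    """Apply the parameter string of one 'ESC [ ... m' sequence to a style."""
    new = replace(style)  # work on a copy; the caller's object is untouched
    for field in params.split(";"):
        code = int(field) if field else 0  # an empty parameter defaults to 0
        if code == 0:
            new = Style()  # reset all attributes
        elif code == 1:
            new.bold = True
        elif code == 22:
            new.bold = False
        elif code == 3:
            new.italic = True
        elif code == 23:
            new.italic = False
        elif code == 4:
            new.underline = 1  # single underline replaces double
        elif code == 21:
            new.underline = 2  # double underline replaces single
        elif code == 24:
            new.underline = 0
        elif code == 9:
            new.strikethrough = True
        elif code == 29:
            new.strikethrough = False
        elif code == 7:
            new.reverse = True
        elif code == 27:
            new.reverse = False
        # any other code is left uninterpreted in this sketch
    return new
```

Because nothing is stacked, "italics off" only clears the italic part of
the state and leaves bold, underline and the rest as they were.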

/Kent K


> where ESC is U+001B.
>  
> This mechanism has existed for around 40 years and is already supported
> as widely as any new Unicode-only convention will ever be.
>  
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>  
>

Re: Encoding italic

2019-02-09 Thread Kent Karlsson via Unicode



On 2019-02-08 22:29, "Egmont Koblinger via Unicode" wrote:

> (Mind you, I don't find it a good idea to add italic and whatnot
> formatting support to Unicode at all... but let's put aside that now.)

I don't think Doug meant to "add it to the Unicode standard", just to
have a summary of "handy esc-sequences (actually command-sequences)
for simple styling of text" picked from long-standing (text level...)
standards.

> There are a lot of problems with these escape sequences, and if you go
> for a potentially new standard, you might not want to carry these
> problems.
> 
> There is not a well-defined framework for escape sequences. In this
> particular case you might say it starts with ESC [ and ends with the
> letter 'm', but how do you know where to end the sequence if that
> letter 'm' just doesn't arrive? Terminal emulators have extremely

There is an overriding "basic (overall) syntax" for esc-seq/
command-sequences that do not include a string argument (like OSC,
APC, ...). IIUC it is (originally as byte sequences, but here as
character sequences):

\u001B[\u0020-\u002F]*[\u0030-\u007E]|
(\u001B'['|\u009B)[\u0030-\u003F]*[\u0020-\u002F]*[\u0040-\u007E]

(no newline or carriage return in there). True, that has no direct
limit, but it would not be unreasonable to set a limit of (say)
max 30 characters. Potential (i.e. starting with ESC) esc-"sequences"
that do not match the overall syntax or are too long can simply be
rendered as is (except for the ESC itself). The esc/command sequences
(that match) but are not interpreted should be ignored in "normal"
(not "show invisibles" mode) display.
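
The overall syntax and the suggested length limit can be sketched in
Python roughly as follows. Two details are adjustments made for this
example rather than part of the syntax as written: the command-sequence
alternative is tried first, and the plain escape-sequence alternative
refuses a '[' immediately after ESC, so that an unterminated 'ESC [' is
not swallowed as a two-character escape sequence. The names and the limit
of 30 are illustrative only.

```python
import re

# Command sequence: (ESC '[' or U+009B), parameters, intermediates, final.
CSI = r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
# Other escape sequence: ESC, intermediates, final (but not ESC '[').
ESC_SEQ = r"\x1b(?!\[)[\x20-\x2f]*[\x30-\x7e]"
PATTERN = re.compile(f"{CSI}|{ESC_SEQ}")

MAX_LEN = 30  # arbitrary upper bound, as suggested in the text


def visible_text(text: str) -> str:
    """Return the text as a 'normal' (not show-invisibles) display would."""

    def repl(match):
        seq = match.group(0)
        if len(seq) <= MAX_LEN:
            return ""  # well-formed sequence: ignored for display
        return seq[1:]  # too long: shown as is, minus the introducer

    # Any ESC that did not start a well-formed sequence is dropped,
    # and the characters after it are rendered as ordinary text.
    return PATTERN.sub(repl, text).replace("\x1b", "")
```

A searching or sorting routine could run the same substitution first, so
that matching is done on the text without the command sequences.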

They are unlikely to be "default ignored" by such things as sorting
(and should preferably be filtered out beforehand, if possible). But
if we compare to other rich text editors, the command sequences should
be ignored by (interactive) searching, just like HTML tags are ignored
in interactive searching (the internal representation "skipping" the
HTML tags in one way or another). HTML tags should also (when text is
known to be HTML) be filtered out before doing such things as sorting.

> complex tables for parsing (and still many of them get plenty of
> things wrong). It's unreasonable for any random small utility
> processing Unicode text to go into this business of recognizing all
> the well-known escape sequences, not even to the extent to know where
> they end. Whatever is designed should be much more easily parseable.
> Should you say "everything from ESC[ to m", you'll cause a whole bunch
> of problems when a different kind of escape sequence gets interpreted
> as Unicode.

The escape/command sequences would not be part of Unicode (standard).

> A parser, by the way, would also have to interpret combined sequences
> like ESC[3;0;1m or alike, for which I don't see a good reason as
> opposed to having separate sequences for each. Also, it should be

Formally covered by the (non-Unicode) standards, but optional (IIUC).

> carefully evaluated what to do with C1 (U+009B) instead of the C0 ESC[
> opening for an escape sequence – here terminal emulators vary. These
> just make everything even more cumbersome.
> 
> ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".

I think one should interpret these in a "modern" way, not looking
too much at what old terminals were limited to. (Colour ("increased
intensity") should be handled completely separately from bold.)

> Should this scheme be extended for colors, too? What to do with the
> legacy 8/16 as well as the 256-color extensions wrt. the color
> palette? Should Unicode go into the business of defining a fixed set
> of colors, or allow to alter the palette colors using the OSC 4 and
> friends escape sequences which are supported by about half of the terminal
> emulators out there?

IF extending to colour, only refer to "true colour" (RGB) command-sequence.
The colour palette versions are for the limitations of (semi-)old terminals.

> For 256-colors and truecolors, there are two or three syntaxes out
> there regarding whether the separator is a colon or a semicolon.

It can only be colon. Using semicolon would interfere with the syntax
for multiple style specifications in one command sequence. (I by mistake
wrote a semicolon there in an earlier post; sorry.)

> Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
> for curly underline. What to do with them? Where to draw the line what

(Note colon, not semicolon, as separator.) Possible, partially matching
the capabilities for underlining via CSS (solid, dotted, dashed, wavy,
double). Depends on how much styling options one wants to pick up.

> to add to Unicode and what not to? Will Unicode possibly be a

I don't think anyone wants to make this part of the Unicode standard.
(At most a Unicode technical note...; from Unicode's point of view.)

[...] 
> What to do with things that Unicode might also want to have, but
> doesn't exist in terminal emulators due to their nature, such as
> switching

Re: Encoding italic

2019-02-09 Thread Richard Wordingham via Unicode

On Fri, 8 Feb 2019 18:08:34 -0800
Asmus Freytag via Unicode  wrote:

> On 2/8/2019 5:42 PM, James Kass via Unicode wrote:

> You are still making the assumption that selecting a different glyph
> for the base character would automatically lead to the selection of a
> different glyph for the combining mark that follows. That's an iffy
> assumption because "italics" can be realized by choosing a separate
> font (typographically, italics is realized as a separate typeface).

The usual practice is to look for a font that supports both base
character and mark.

> Under the implicit assumptions bandied about here, the VS approach
> thus reveals itself as a true rich-text solution (font switching)
> albeit realized with pseudo coding rather than markup, markdown or
> escape sequences.

Isn't that already the case if one uses variation sequences to choose
between Chinese and Japanese glyphs?

>> Of course, the user might insert VS14s without application
>> assistance.  In which case hopefully the user knows the rules.  The
>> worst case scenario is where the user might insert a VS14 after a
>> non-base character, in which case it should simply be ignored by any
>> application.  It should never “break” the display or the processing;
>> it simply makes the text for that document non-conformant.  (Of
>> course putting a VS14 after “ê” should not result in an italicized
>> “ê”.)

Is there any obligation on applications to ignore it?  In plain text,
the Unicode rules allow the application to choose to render every third
'ê' as italic.  Possibly it comes down to the mens rea of the
application (or of its coder or specifier), but without mentalism an
application could opt to treat <ê, VS14> as .

A relevant concern would be 'voracious' with the first 'o'
italicised by VS14.  How would current typeface selection logic work?
I can envisage  only being in the cmap of an italic font.

Richard.

Re: Encoding italic

2019-02-08 Thread James Kass via Unicode




Asmus Freytag wrote,

> You are still making the assumption that selecting a different glyph for
> the base character would automatically lead to the selection of a different
> glyph for the combining mark that follows. That's an iffy assumption
> because "italics" can be realized by choosing a separate font (typographically,
> italics is realized as a separate typeface).
>
> There's no such assumption built into the definition of a VS. At best, inside
> the same font, there may be an implied ligature, but that does not work if
> there's an underlying font switch.

Midstream font switching isn’t a user option in most plain-text 
applications, although there can be some font substitution happening at 
the OS level.  Any combining mark must apply to its base letter glyph, 
even after a base letter glyph has been modified.


More sophisticated editors, like BabelPad, allow users to select 
different fonts for different ranges of Unicode.  If a user selects font 
X for ASCII and font Y for combining marks, then mark positioning is 
already broken.


If the user selects Times New Roman for both ASCII and combining marks, 
then no font switching is involved.  The Times New Roman type face 
includes italic letter form variants.  Any application sharp enough to 
know that the italic letter form variants are stored in a different 
computer *file* should be clever enough to apply mark positioning 
accordingly.  And any single font file which includes italic letters and 
maps them with VS14 would avoid any such issues altogether.

Re: Encoding italic

2019-02-08 Thread Asmus Freytag via Unicode


  
  
On 2/8/2019 5:42 PM, James Kass via
  Unicode wrote:


  
  William,
  
  
  Rather than having the user insert the VS14 after every character,
  the editor might allow the user to select a span of text for
  italicization.  Then it would be up to the editor/app to insert
  the VS14s where appropriate.
  
  
  For Andrew’s example of “fête”, the user would either type the
  string:
  
  “f” + “ê” + “t” + “e”
  
  or the string:
  
  “f” + “e” +  + “t” +
  “e”.
  
  
  If the latter, the application would insert VS14 characters after
  the “f”, “e”, “t”, and “e”.  The application would not insert a
  VS14 after the combining circumflex — because the specification
  does not allow VS characters after combining marks, they may only
  be used on base characters.
  
  
  In the first ‘spelling’, since the specifications forbid VS
  characters after any character which is not a base character (in
  other words, not after any character which has a decomposition,
  such as “ê”) — the application would first need to convert the
  string to the second ‘spelling’, and proceed as above.  This is
  known as converting to NFD.
  
  
  So in order for VS14 to be a viable approach, any application
  would ① need to convert any selected span to NFD, and ② only
  insert VS14 after each base character.  And those are two
  operations which are quite possible, although they do add slightly
  to the programmer’s burden.  I don’t think it’s a “deal-killer”.
  



You are still making the assumption that selecting a different
glyph for the base character would automatically lead to the
selection of a different glyph for the combining mark that
follows. That's an iffy assumption because "italics" can be
realized by choosing a separate font (typographically, italics is
realized as a separate typeface).

There's no such assumption built into the definition of a VS. At
best, inside the same font, there may be an implied ligature, but
that does not work if there's an underlying font switch.

Under the implicit assumptions bandied about here, the VS
approach thus reveals itself as a true rich-text solution (font
switching) albeit realized with pseudo coding rather than markup,
markdown or escape sequences.

It's definitely no more "plain text" than HTML source code.

A./


  
  Of course, the user might insert VS14s without application
  assistance.  In which case hopefully the user knows the rules. 
  The worst case scenario is where the user might insert a VS14
  after a non-base character, in which case it should simply be
  ignored by any application.  It should never “break” the display
  or the processing; it simply makes the text for that document
  non-conformant.  (Of course putting a VS14 after “ê” should not
  result in an italicized “ê”.)
  
  
  Cheers,
  
  
  James

Re: Encoding italic

2019-02-08 Thread James Kass via Unicode




William,

Rather than having the user insert the VS14 after every character, the 
editor might allow the user to select a span of text for italicization.  
Then it would be up to the editor/app to insert the VS14s where appropriate.


For Andrew’s example of “fête”, the user would either type the string:
“f” + “ê” + “t” + “e”
or the string:
“f” + “e” +  + “t” + “e”.

If the latter, the application would insert VS14 characters after the 
“f”, “e”, “t”, and “e”.  The application would not insert a VS14 after 
the combining circumflex — because the specification does not allow VS 
characters after combining marks, they may only be used on base characters.


In the first ‘spelling’, since the specifications forbid VS characters 
after any character which is not a base character (in other words, not 
after any character which has a decomposition, such as “ê”) — the 
application would first need to convert the string to the second 
‘spelling’, and proceed as above.  This is known as converting to NFD.


So in order for VS14 to be a viable approach, any application would ① 
need to convert any selected span to NFD, and ② only insert VS14 after 
each base character.  And those are two operations which are quite 
possible, although they do add slightly to the programmer’s burden.  I 
don’t think it’s a “deal-killer”.
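
The two steps can be sketched in a few lines of Python using the standard
unicodedata module. This is only an illustration of the proposal under
discussion: VS14 (U+FE0D) has no such italic meaning in Unicode, the
function name is invented, and "base character" is approximated here as
any character whose general category is a letter.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14; "italic" meaning is hypothetical


def italicize_with_vs14(span: str) -> str:
    """Step 1: convert to NFD.  Step 2: add VS14 after each base letter."""
    decomposed = unicodedata.normalize("NFD", span)
    out = []
    for ch in decomposed:
        out.append(ch)
        # Letters (general category L*) get a selector; combining marks,
        # digits, punctuation and spaces are left alone in this sketch.
        if unicodedata.category(ch).startswith("L"):
            out.append(VS14)
    return "".join(out)
```

Either spelling of “fête” gives the same result: VS14 follows “f”, “e”,
“t” and “e”, and the combining circumflex comes after the selector that
belongs to the first “e”.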


Of course, the user might insert VS14s without application assistance.  
In which case hopefully the user knows the rules.  The worst case 
scenario is where the user might insert a VS14 after a non-base 
character, in which case it should simply be ignored by any 
application.  It should never “break” the display or the processing; it 
simply makes the text for that document non-conformant.  (Of course 
putting a VS14 after “ê” should not result in an italicized “ê”.)


Cheers,

James

Re: Encoding italic

2019-02-08 Thread Richard Wordingham via Unicode

On Fri, 8 Feb 2019 14:26:28 -0800
Asmus Freytag via Unicode  wrote:

> On 2/8/2019 2:08 PM, Richard Wordingham via Unicode wrote:
> On Fri, 8 Feb 2019 17:16:09 + (GMT)
> "wjgo_10...@btinternet.com via Unicode"  wrote:
> 
> Andrew West wrote:
> 
> Just reminding you that "The initial character in a variation
> sequence  
> is never a nonspacing combining mark (gc=Mn) or a canonical
> decomposable character" (The Unicode Standard 11.0 §23.4).
> 
> Hopefully the issue that Andrew mentions can be resolved in some way.
> 
> This is not a problem.  Instead of writing <ê, VS14>, one just writes
> .
> 
> And  introducing yet another convention, which is that combining
> marks inherit the font of the base character.
> 
> Remember, italics, even though presented as a boolean attribute in
> most UIs is in fact typographically a font selection.

Wouldn't  be the base character for the selection of the
font?

Richard.

Re: Encoding italic

2019-02-08 Thread Asmus Freytag via Unicode


  
  
On 2/8/2019 2:08 PM, Richard Wordingham via Unicode wrote:

> On Fri, 8 Feb 2019 17:16:09 + (GMT)
> "wjgo_10...@btinternet.com via Unicode"  wrote:
>
>> Andrew West wrote:
>>
>>> Just reminding you that "The initial character in a variation
>>> sequence is never a nonspacing combining mark (gc=Mn) or a canonical
>>> decomposable character" (The Unicode Standard 11.0 §23.4).
>>
>> Hopefully the issue that Andrew mentions can be resolved in some way.
>
> This is not a problem.  Instead of writing <ê, VS14>, one just writes
> .

And  introducing yet another convention, which is that
combining marks inherit the font of the base character.

Remember, italics, even though presented as a boolean attribute
in most UIs is in fact typographically a font selection.

A./

> Richard.

Re: Encoding italic

2019-02-08 Thread Richard Wordingham via Unicode

On Fri, 8 Feb 2019 22:29:57 +0100
Egmont Koblinger via Unicode  wrote:

> Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
> for curly underline. What to do with them? Where to draw the line what
> to add to Unicode and what not to? Will Unicode possibly be a
> bottleneck of further improvements in terminal emulators, because from
> now on every new mode we figure out we'd like to have in terminals
> should go through some Unicode committee? And what if Unicode wants to
> have a mode that terminal emulators aren't interested in, who will
> assign numbers to them that don't clash with terminals? Who will
> somehow keep the two worlds in sync?

Escape sequences are outside the scope of Unicode.  They are part of a
higher level protocol (TUS 23.1 'Control codes').

Richard.

Re: Encoding italic

2019-02-08 Thread Richard Wordingham via Unicode

On Fri, 8 Feb 2019 17:16:09 + (GMT)
"wjgo_10...@btinternet.com via Unicode"  wrote:

> Andrew West wrote:

>> Just reminding you that "The initial character in a variation
>> sequence  
>> is never a nonspacing combining mark (gc=Mn) or a canonical
>> decomposable character" (The Unicode Standard 11.0 §23.4).

> Hopefully the issue that Andrew mentions can be resolved in some way.

This is not a problem.  Instead of writing <ê, VS14>, one just writes
.

Richard.

Re: Encoding italic

2019-02-08 Thread Egmont Koblinger via Unicode

Hi guys,

Having been a terminal emulator developer for some years now, I have
to say – perhaps surprisingly – that I don't fancy the idea of reusing
escape sequences of the terminal world.

(Mind you, I don't find it a good idea to add italic and whatnot
formatting support to Unicode at all... but let's put aside that now.)

There are a lot of problems with these escape sequences, and if you go
for a potentially new standard, you might not want to carry these
problems.

There is not a well-defined framework for escape sequences. In this
particular case you might say it starts with ESC [ and ends with the
letter 'm', but how do you know where to end the sequence if that
letter 'm' just doesn't arrive? Terminal emulators have extremely
complex tables for parsing (and still many of them get plenty of
things wrong). It's unreasonable for any random small utility
processing Unicode text to go into this business of recognizing all
the well-known escape sequences, not even to the extent to know where
they end. Whatever is designed should be much more easily parseable.
Should you say "everything from ESC[ to m", you'll cause a whole bunch
of problems when a different kind of escape sequence gets interpreted
as Unicode.
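
The parsing concern above can be illustrated with a minimal Python sketch. It is not from the original message: the function names are hypothetical, and it assumes that only the simple C0 form "ESC [ parameters m" counts as a well-formed SGR sequence.

```python
import re

# Assumption: a well-formed SGR sequence is ESC [ followed by digits,
# ';' or ':' and a final 'm'. Real terminals accept far more than this.
STRICT_SGR = re.compile(r"\x1b\[([0-9;:]*)m")

# The naive rule "everything from ESC[ to m".
NAIVE_SGR = re.compile(r"\x1b\[.*?m")


def strip_sgr(text: str) -> str:
    """Remove well-formed SGR sequences and leave everything else alone."""
    return STRICT_SGR.sub("", text)


def naive_strip(text: str) -> str:
    """Remove everything from ESC[ up to the next letter 'm'."""
    return NAIVE_SGR.sub("", text)


def has_unterminated_csi(text: str) -> bool:
    """True if an ESC[ introducer is left over after strict stripping."""
    return "\x1b[" in strip_sgr(text)


styled = "plain \x1b[3mitalic\x1b[23m plain"
broken = "oops \x1b[3 no terminator"

print(strip_sgr(styled))             # plain italic plain
print(naive_strip(broken))           # oops inator
print(has_unterminated_csi(broken))  # True
```

The naive rule silently eats ordinary text up to the first 'm' it finds, while the strict rule leaves a dangling introducer that the caller still has to deal with somehow.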

A parser, by the way, would also have to interpret combined sequences
like ESC[3;0;1m or the like, for which I don't see a good reason as
opposed to having separate sequences for each. Also, it should be
carefully evaluated what to do with C1 (U+009B) instead of the C0 ESC[
opening for an escape sequence – here terminal emulators vary. These
just make everything even more cumbersome.

ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".
It's only nowadays, when most terminal emulators support 256 colors and
some even support 16M true colors, that some emulators try to push for
this bit unambiguously meaning "bold" only, whereas in most emulators
it means "both bold and increased intensity". For compatibility
reasons, it won't be a smooth switch. Note that "bold" and "increased
intensity" only go in the same direction with a white-on-black color
scheme; with black-on-white, bold stands out more while increased
intensity (a lighter shade of gray instead of black) stands out less.
(We could also start nitpicking that the spec doesn't even say that
increased intensity is just for the foreground and not for the
background too.)

Should this scheme be extended for colors, too? What to do with the
legacy 8/16 as well as the 256-color extensions wrt. the color
palette? Should Unicode go into the business of defining a fixed set
of colors, or allow altering the palette colors using the OSC 4 and
friends escape sequences, which are supported by about half of the
terminal emulators out there?

For 256-colors and truecolors, there are two or three syntaxes out
there regarding whether the separator is a colon or a semicolon.
ECMA-48 doesn't say anything about it; ITU T.416 does, although it's
absolutely not clear. See e.g. the discussion in the comment section
of https://gist.github.com/XVilka/8346728 , where in Dec 2018 we just
couldn't figure out which syntax exactly ITU T.416 wants to say.
Moreover, due to a common misinterpretation of the spec, one of the
positional parameters is often omitted.

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
for curly underline. What to do with them? Where to draw the line what
to add to Unicode and what not to? Will Unicode possibly be a
bottleneck of further improvements in terminal emulators, because from
now on every new mode we figure out we'd like to have in terminals
should go through some Unicode committee? And what if Unicode wants to
have a mode that terminal emulators aren't interested in, who will
assign numbers to them that don't clash with terminals? Who will
somehow keep the two worlds in sync?

What to do with things that Unicode might also want to have, but
doesn't exist in terminal emulators due to their nature, such as
switching to a different font size?

> This mechanism [...] is already supported
> as widely as any new Unicode-only convention will ever be.

I truly doubt this, these escape sequences are specific to terminal
emulation, an extremely narrow subset of where Unicode is used and
rich text might be desired.

I see it a much more viable approach if Unicode goes for something
brand new, something clean, easily parseable, and it remains the job
of specific applications to serve as a bridge between the two worlds.
Or, if it wants to adopt some already existing technology, I find
HTML/CSS a much better starting point.

regards,
egmont

On Fri, Feb 8, 2019 at 9:55 PM Doug Ewell via Unicode
 wrote:
>
> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:
>
> •   Italics on: ESC [3m
> •   Italics off: ESC [23m
> •   Bold on: ESC [1m
> •   Bold off: ESC [22m
> •   Underline on: ESC [4m
> •   Underline off: ESC [24m
> •

Re: Encoding italic

2019-02-08 Thread Rebecca Bettencourt via Unicode

+∞

-- Rebecca Bettencourt


On Fri, Feb 8, 2019 at 12:55 PM Doug Ewell via Unicode 
wrote:

> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:
>
> •   Italics on: ESC [3m
> •   Italics off: ESC [23m
> •   Bold on: ESC [1m
> •   Bold off: ESC [22m
> •   Underline on: ESC [4m
> •   Underline off: ESC [24m
> •   Strikethrough on: ESC [9m
> •   Strikethrough off: ESC [29m
> •   Reverse on: ESC [7m
> •   Reverse off: ESC [27m
> •   Reset all attributes: ESC [m
>
> where ESC is U+001B.
>
> This mechanism has existed for around 40 years and is already supported
> as widely as any new Unicode-only convention will ever be.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>
>

Re: Encoding italic

2019-02-08 Thread Doug Ewell via Unicode

I'd like to propose encoding italics and similar display attributes in
plain text using the following stateful mechanism:
 
•   Italics on: ESC [3m
•   Italics off: ESC [23m
•   Bold on: ESC [1m
•   Bold off: ESC [22m
•   Underline on: ESC [4m
•   Underline off: ESC [24m
•   Strikethrough on: ESC [9m
•   Strikethrough off: ESC [29m
•   Reverse on: ESC [7m
•   Reverse off: ESC [27m
•   Reset all attributes: ESC [m
 
where ESC is U+001B.
 
This mechanism has existed for around 40 years and is already supported
as widely as any new Unicode-only convention will ever be.
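
As a rough illustration of the stateful mechanism listed above, here is a minimal Python sketch. The table and the helper name `styled` are hypothetical and only mirror the on/off pairs in the list; nothing here is a Unicode-defined interface.

```python
ESC = "\x1b"  # U+001B, as stated in the proposal

# On/off SGR parameter pairs, copied from the list in the proposal.
SGR_PAIRS = {
    "italic": (3, 23),
    "bold": (1, 22),
    "underline": (4, 24),
    "strikethrough": (9, 29),
    "reverse": (7, 27),
}

RESET_ALL = f"{ESC}[m"


def styled(text: str, attribute: str) -> str:
    """Wrap text in the 'on' and 'off' sequences for one attribute."""
    on, off = SGR_PAIRS[attribute]
    return f"{ESC}[{on}m{text}{ESC}[{off}m"


print(repr(styled("fête", "italic")))  # '\x1b[3mfête\x1b[23m'
```

Because the mechanism is stateful, a receiver that drops or truncates the "off" sequence leaves the attribute switched on for all following text until a reset arrives.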
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-02-08 Thread wjgo_10...@btinternet.com via Unicode


Andrew West wrote:

> Just reminding you that "The initial character in a variation sequence
> is never a nonspacing combining mark (gc=Mn) or a canonical
> decomposable character" (The Unicode Standard 11.0 §23.4). This means
> that a variation sequence cannot be defined for any precomposed
> letters and diacritics, so for example you could not italicize the
> word "fête" by simply adding VS14 after each letter because "ê" (in
> NFC form) cannot act as the base for a variation sequence. You would
> have to first convert any text to be italicized to NFD, then apply
> VS14 to each non-combining character. This alone would make a VS
> solution unacceptable in my opinion.

As it happens I was not aware of that before, and in fact I had already 
produced a PDF document for submission to the Unicode Technical 
Committee when I read your post.


https://www.unicode.org/L2/L2019/19063-italic-vs.pdf

So, it is an issue that needs to be resolved.

I am a researcher and I am looking for the best way to do this so as to 
get a good result that people can use; I am not trying to assert that my 
suggestion is necessarily the best way to do it. For example, I accepted 
the suggestion that James made.  The meeting of the Unicode Technical 
Committee is not due until April and hopefully some other people will 
send in documents and comments on the topic.


Hopefully the issue that Andrew mentions can be resolved in some way.

William Overington
Friday 8 February 2019

Re: Encoding italic

2019-02-05 Thread Richard Wordingham via Unicode

On Tue, 5 Feb 2019 16:01:41 +
Andrew West via Unicode  wrote:

> You would
> have to first convert any text to be italicized to NFD, then apply
> VS14 to each non-combining character. This alone would make a VS
> solution unacceptable in my opinion.

What is so unacceptable about having to do this?

Richard.

Re: Encoding italic

2019-02-05 Thread Andrew West via Unicode

On Tue, 5 Feb 2019 at 15:34, wjgo_10...@btinternet.com via Unicode
 wrote:
>
> italic version of a glyph in plain text, including a suggestion of to
> which characters it could apply, would test whether such a proposal
> would be accepted to go into the Document Register for the Unicode
> Technical Committee to consider or just be deemed out of scope and
> rejected and not considered by the Unicode Technical Committee.

Just reminding you that "The initial character in a variation sequence
is never a nonspacing combining mark (gc=Mn) or a canonical
decomposable character" (The Unicode Standard 11.0 §23.4). This means
that a variation sequence cannot be defined for any precomposed
letters and diacritics, so for example you could not italicize the
word "fête" by simply adding VS14 after each letter because "ê" (in
NFC form) cannot act as the base for a variation sequence. You would
have to first convert any text to be italicized to NFD, then apply
VS14 to each non-combining character. This alone would make a VS
solution unacceptable in my opinion.
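
The two-step procedure described above (convert to NFD, then add VS14 after each non-combining character) can be sketched in Python as follows. This is only an illustration of the hypothetical scheme under discussion: no such variation sequences are registered, the function name is invented, and treating "non-combining" as "general category does not start with M" is an assumption.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14


def italicize_with_vs14(text: str) -> str:
    """Decompose to NFD, then append VS14 after every non-mark character."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        out.append(ch)
        # Assumption: any character whose category starts with "M"
        # (Mn, Mc, Me) is a combining mark and gets no selector.
        if not unicodedata.category(ch).startswith("M"):
            out.append(VS14)
    return "".join(out)


result = italicize_with_vs14("f\u00EAte")  # "fête" with precomposed ê
print([f"U+{ord(c):04X}" for c in result])
# ['U+0066', 'U+FE0D', 'U+0065', 'U+FE0D', 'U+0302',
#  'U+0074', 'U+FE0D', 'U+0065', 'U+FE0D']
```

The selector ends up between the base letter and its combining circumflex, which is why the text can no longer be stored in precomposed form.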

Andrew

Re: Encoding italic

2019-02-05 Thread wjgo_10...@btinternet.com via Unicode


James Kass wrote:

> William’s suggestion of floating a proposal for handling italics with
> VS14 might be an example of the old saying about “putting the cart
> before the horse”.


Well, a proposal just about using VS14 to indicate a request for an 
italic version of a glyph in plain text, including a suggestion of to 
which characters it could apply, would test whether such a proposal 
would be accepted to go into the Document Register for the Unicode 
Technical Committee to consider or just be deemed out of scope and 
rejected and not considered by the Unicode Technical Committee.


If the proposal were allowed to become included in the Document Register 
of the Unicode Technical Committee then if other people wish to submit 
comments and other proposals then that would be possible as it would 
have become established that such a topic is deemed acceptable for 
placing into the Document Register of the Unicode Technical Committee.


William Overington
Tuesday 5 February 2019

Re: Encoding italic

2019-02-05 Thread James Kass via Unicode




William Overington wrote,

> Well, a proposal just about using VS14 to indicate a request for an
> italic version of a glyph in plain text, including a suggestion of to
> which characters it could apply, would test whether such a proposal
> would be accepted to go into the Document Register for the Unicode
> Technical Committee to consider or just be deemed out of scope and
> rejected and not considered by the Unicode Technical Committee.

As long as “italics in plain-text” is considered out-of-scope by 
Unicode, any proposal for handling italics in plain-text would probably 
be considered out-of-scope, as well.  But I could be wrong and wouldn’t 
mind seeing a proposal.

Re: Encoding italic

2019-02-04 Thread James Kass via Unicode




Philippe Verdy responded to William Overington,

> the proposal would contradict the goals of variation selectors and would
> pollute the variation sequences registry (possibly even creating
> conflicts).
> And if we admit it for italics, then another VSn will be dedicated to
> bold, and another for monospace, and finally many would follow for various
> style modifiers.
> Finally we would no longer have enough variation selectors for all
> requests.


There are 256 variation selector characters.  Any use of variation 
sequences not registered by Unicode would be non-conformant.


William’s suggestion of floating a proposal for handling italics with 
VS14 might be an example of the old saying about “putting the cart 
before the horse”.  Any preliminary proposal would first have to clear 
the hurdle of the propriety of handling italic information at the 
plain-text level.  Such a proposal might list various approaches for 
accomplishing that, if that hurdle can be surmounted.

Re: Encoding italic

2019-02-01 Thread Philippe Verdy via Unicode

The proposal would contradict the goals of variation selectors and would
pollute the variation sequences registry (possibly even creating
conflicts). And if we admit it for italics, then another VSn will be
dedicated to bold, and another to monospace, and finally many would follow
for various style modifiers.
Finally we would no longer have enough variation selectors for all
requests.
And all we would have done is try to reproduce another existing
styling standard, but very inefficiently. This use would also be "abused"
for all usages, creating new implementation constraints and goals that
contradict existing styling languages, which would then decide to make
these characters incompatible for use in conforming applications. The
Unicode encoding would have lost all its interest.
I do not support the idea of encoding generic styles (applicable to more
than 100k existing characters) using variation selectors. Their goal is
only to allow semantic distinctions where two glyphs that were unified in
one language may occasionally (not always) have some significance in
specific languages. But what you propose would apply to all languages, all
scripts, and would definitely reserve some of the few existing VSn for this
styling use, blocking further registration of needed distinctions (VSn
characters are notably needed for sinographic scripts to properly represent
toponyms or personal names, or to solve some problems existing with generic
character properties in Unicode that cannot be changed because of stability
rules).


Le jeu. 31 janv. 2019 à 16:32, wjgo_10...@btinternet.com via Unicode <
unicode@unicode.org> a écrit :

> Is the way to try to resolve this for a proposal document to be produced
> for using Variation Selector 14 in order to produce italics and for the
> proposal document to be submitted to the Unicode Technical Committee?
>
> If the proposal is allowed to go to the committee rather than being
> ruled out of scope, then we can know whether the Unicode Technical
> Committee will allow the encoding.
>
> William Overington
>
> Thursday 31 January 2019
>
>

Re: Encoding italic

2019-02-01 Thread James Kass via Unicode




On 2019-01-31 3:18 PM, Adam Borowski via Unicode wrote:

> They're only from a spammer's point of view.

Spammers need love, too.  They’re just not entitled to any.

Re: Encoding italic

2019-01-31 Thread Asmus Freytag via Unicode


  
  
On 1/31/2019 12:55 AM, Tex via Unicode wrote:

> As with the many problems with walls not being effective, you choose to
> ignore the legitimate issues pointed out on the list with the lack of
> italic standardization for Chinese braille, text to voice readers, etc.
> The choice of plain text isn't always voluntary. And the existing
> alternatives, like math italic characters, are problematic.

The underlying issue is the lack of rich text support in places where
users expect rich text.

The solution is to find ways to enable rich text layers that are not full
documents and make them interoperable.

The solution is not to push this into plain text - which then becomes
lowest common denominator rich text instead.

A./

RE: Encoding italic

2019-01-31 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
> sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
> is implemented in Cygwin (sorry for mentioning a product name).)

Fair enough. This thread is mostly about italics and bold and such, not
colors, but the point is well taken that one of these leads invariably
to the others, especially if the standard or flavor in question
implements them.

> ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
> traditionally does not use bold or italic.

But that's OK. For low-level mechanisms like these, it should be
incumbent on the user to say, "Yes, I can use this styling with that
script, but I shouldn't; it would look terrible and would fly in the
face of convention." ISO 6429 also allows green text on a cyan
background, which is about as good an idea as CJK italics.

> Compare those specified for CSS
> (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
> https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
> These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
> be of interest for the generalised subject of this thread.

I'm hoping we can continue to restrict this thread to plain text.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-31 Thread Adam Borowski via Unicode

On Thu, Jan 31, 2019 at 02:21:40PM +, James Kass via Unicode wrote:
> David Starner wrote,
> > The choice of using single-byte character sets isn't always voluntary.
> > That's why we should use ISO-2022, not Unicode. Or we can expect
> > people to fix their systems. What systems are we talking about, that
> > support Unicode but compel you to use plain text? The use of Twitter
> > is surely voluntary.
> 
> This marketing-related web page,
> 
> https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important
> 
> ...lists various reasons for using plain-text e-mail.

They're only from a spammer's point of view.

> Besides marketing, there’s also newsletters and e-mail discussion groups. 
> Some of those discussion groups are probably scholarly. Anyone involved in
> that would likely embrace ‘super cool Unicode text magic’ and it’s
> surprising if none of them have stumbled across the math alphanumerics yet.

Then there are technical mailing lists.  In particular, on every single list
other than Unicode I'm subscribed to, an HTML-only mail would get you flamed
by several list members; even a plain+HTML alternative can get you an
earful.

Then there's LKML and other lists hosted at vger, where a mail that so much
as has an HTML version attached will get outright rejected at the mail
software level.

After 2½ decades of participating in mailing lists, I have had an aversion
to HTML mails burned in as a kind of involuntary reflex.  Upon seeing Asmus'
mails, the ingrained reflex kicks in, I start getting upset, only to realize
what list I'm reading and that it's him who's a regular here, not me.

So even when in principle adding such features would be possible, many
communities decide to prefer interoperability over newest types of bling.
Some prefer top-posted HTML mails, some prefer Twitter, some Unicode plain
text, some perhaps want plain ASCII only.

> It’s true that people don’t have to use Twitter.  People don’t have to turn
> on their computers, either.

And sometimes they use a Braille reader or a text console.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄

Re: Encoding italic

2019-01-31 Thread wjgo_10...@btinternet.com via Unicode

Is the way to try to resolve this for a proposal document to be produced 
for using Variation Selector 14 in order to produce italics and for the 
proposal document to be submitted to the Unicode Technical Committee?


If the proposal is allowed to go to the committee rather than being 
ruled out of scope, then we can know whether the Unicode Technical 
Committee will allow the encoding.


William Overington

Thursday 31 January 2019

Re: Encoding italic

2019-01-31 Thread James Kass via Unicode




David Starner wrote,

> The choice of using single-byte character sets isn't always voluntary.
> That's why we should use ISO-2022, not Unicode. Or we can expect
> people to fix their systems. What systems are we talking about, that
> support Unicode but compel you to use plain text? The use of Twitter
> is surely voluntary.

This marketing-related web page,

https://litmus.com/blog/best-practices-for-plain-text-emails-a-look-at-why-theyre-important

...lists various reasons for using plain-text e-mail.  Here’s an excerpt.

“Some people simply prefer it. Plain and simple—some people prefer text 
emails. ... Some users may also see HTML emails as a security and 
privacy risk, and choose not to load any images and have visibility over 
all links that are included in an email. In addition, the increased 
bandwidth that image-heavy emails tend to consume is another driver of 
why users simply prefer plain-text emails.”


Besides marketing, there’s also newsletters and e-mail discussion 
groups.  Some of those discussion groups are probably scholarly. Anyone 
involved in that would likely embrace ‘super cool Unicode text magic’ 
and it’s surprising if none of them have stumbled across the math 
alphanumerics yet.


A web search for the string “plain text only” leads to all manner of 
applications for which searchers are trying to control their 
environments.  There’s all kinds of reasons why some people prefer to 
use plain-text, it’s often an informed choice and it isn’t limited to 
e-mail.


It’s true that people don’t have to use Twitter.  People don’t have to 
turn on their computers, either.

Re: Encoding italic

2019-01-31 Thread James Kass via Unicode




David Starner wrote,

> Emoji, as has been pointed out several times, were in the original
> Unicode standard and date back to the 1980s; the first DOS character
> page has smileys at 0x01 and 0x02.

That's disingenuous.

Re: Encoding italic

2019-01-31 Thread David Starner via Unicode

On Thu, Jan 31, 2019 at 12:56 AM Tex  wrote:
>
> David,
>
> "italics has never been considered part of plain text and has always been 
> considered outside of plain text. "
>
> Time to change the definition if that is what is holding you back.

That's not a definition; that's a fact. Again, it's like the 8-bit
byte; there are systems with other sizes of byte, but you usually
shouldn't worry about it. Building systems that don't have 8-bit bytes
is possible, but it's likely to cost more than it's worth.

> As has been said before, interlinear annotation, emoji and other features of 
> Unicode which  are now considered plain text were not in the original 
> definition.

https://www.w3.org/TR/unicode-xml/#Interlinear (which used to be
Unicode Technical Report #20) says "The interlinear annotation
characters were included in Unicode only in order to reserve code
points for very frequent application-internal use. ... Including
interlinear annotation characters in marked-up text does not work
because the additional formatting information (how to position the
annotation,...) is not available. ... The interlinear annotation
characters are also problematic when used in plain text, and are not
intended for that purpose."

Emoji, as has been pointed out several times, were in the original
Unicode standard and date back to the 1980s; the first DOS character
page has smileys at 0x01 and 0x02.

> If Unicode encoded an italic mechanism it would be part of plain text, just 
> as the many other styled spaces, dashes and other characters have become 
> plain text despite being typographic.

If Unicode encoded an italic mechanism, then some "plain text" would
include italics. Maybe it would be successful, and maybe it would join
the interlinear annotation characters as another discouraged poorly
supported feature.

> As with the many problems with walls not being effective, you choose to 
> ignore the legitimate issues pointed out on the list with the lack of italic 
> standardization for Chinese braille, text to voice readers, etc.

Text to voice readers don't have problems with the lack of italic
standardization; they have problems with people using mathematical
characters instead of actual letters.
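
To make the point about mathematical characters concrete, here is a minimal Python sketch of what third-party "italic text" tools are assumed to do, namely remap ASCII letters into the Mathematical Italic block. The function name is hypothetical and the sketch is not taken from any particular tool.

```python
import unicodedata


def to_math_italic(text: str) -> str:
    """Remap ASCII letters to Mathematical Italic code points."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # U+1D455 is reserved; italic small h is U+210E PLANCK CONSTANT.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # Anything else, including precomposed letters such as "ê",
            # has no counterpart in the block and is left unchanged.
            out.append(ch)
    return "".join(out)


fake_italic = to_math_italic("f\u00EAte")
print([f"U+{ord(c):04X}" for c in fake_italic])
# ['U+1D453', 'U+00EA', 'U+1D461', 'U+1D452']
print(unicodedata.normalize("NFKC", fake_italic) == "f\u00EAte")  # True
```

The remapped string no longer contains the letters f, t and e at all, so software that does not apply compatibility normalization cannot search it, spell-check it or read it aloud as a word, and the accented letter stays upright.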

> The choice of plain text isn't always voluntary.

The choice of using single-byte character sets isn't always voluntary.
That's why we should use ISO-2022, not Unicode. Or we can expect
people to fix their systems. What systems are we talking about, that
support Unicode but compel you to use plain text? The use of Twitter
is surely voluntary.

-- 
Kie ekzistas vivo, ekzistas espero.

RE: Encoding italic

2019-01-31 Thread Tex via Unicode

David,

"italics has never been considered part of plain text and has always been 
considered outside of plain text. "

Time to change the definition if that is what is holding you back. As has been 
said before, interlinear annotation, emoji and other features of Unicode which  
are now considered plain text were not in the original definition. If Unicode 
encoded an italic mechanism it would be part of plain text, just as the many 
other styled spaces, dashes and other characters have become plain text despite 
being typographic.

"The fact that italics can be handled elsewhere very much weighs against the 
value of your change. Everything you want to do can be done and is being done, 
except when someone chooses not to do it."

I heard a recent similar argument that goes: walls have been around since 
medieval times and they work really well... (Except they provably don't.)

As with the many problems with walls not being effective, you choose to ignore 
the legitimate issues pointed out on the list with the lack of italic 
standardization for Chinese braille, text to voice readers, etc.
The choice of plain text isn't always voluntary. And the existing alternatives, 
like math italic characters, are problematic.

tex

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of David Starner 
via Unicode
Sent: Wednesday, January 30, 2019 11:59 PM
To: Unicode Mailing List
Subject: Re: Encoding italic

On Wed, Jan 30, 2019 at 11:37 PM James Kass via Unicode
 wrote:
> As Tex Texin observed, differences of opinion as to where we draw the
> line between text and mark-up are somewhat ideological.  If a compelling
> case for handling italics at the plain-text level can be made, then the
> fact that italics can already be handled elsewhere doesn’t matter.  If a
> compelling case cannot be made, there are always alternatives.

To the extent I'd have ideology here, it's that that line is arbitrary
and needs to fit practical demands. Should we have eight-bit bytes?
I'm not sure that was the best solution, and other systems worked just
fine, but we've got a computing environment that makes anything else
unpractical. Unlike that question, italics has never been considered
part of plain text and has always been considered outside of plain
text. The fact that italics can be handled elsewhere very much weighs
against the value of your change. Everything you want to do can be
done and is being done, except when someone chooses not to do it.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Encoding italic

2019-01-31 Thread Andrew Cunningham via Unicode

On Thursday, 31 January 2019, James Kass via Unicode 
wrote:
>
>
> As for use of other variant letter forms enabled by the math
> alphanumerics, the situation exists.  It’s an interesting phenomenon which
> is sometimes worthy of comment and relates to this thread because the math
> alphanumerics include italics.  One of the web pages referring to
> third-party input tools calls the practice “super cool Unicode text magic”.
>
>
Although not all devices can render such text. Many Android handsets on the
market do not have a sufficiently recent version of Android to have system
fonts that can render such existing usage.




-- 
Andrew Cunningham
lang.supp...@gmail.com

Re: Encoding italic

2019-01-31 Thread David Starner via Unicode

On Wed, Jan 30, 2019 at 11:37 PM James Kass via Unicode
 wrote:
> As Tex Texin observed, differences of opinion as to where we draw the
> line between text and mark-up are somewhat ideological.  If a compelling
> case for handling italics at the plain-text level can be made, then the
> fact that italics can already be handled elsewhere doesn’t matter.  If a
> compelling case cannot be made, there are always alternatives.

To the extent I'd have ideology here, it's that that line is arbitrary
and needs to fit practical demands. Should we have eight-bit bytes?
I'm not sure that was the best solution, and other systems worked just
fine, but we've got a computing environment that makes anything else
unpractical. Unlike that question, italics has never been considered
part of plain text and has always been considered outside of plain
text. The fact that italics can be handled elsewhere very much weighs
against the value of your change. Everything you want to do can be
done and is being done, except when someone chooses not to do it.

-- 
Kie ekzistas vivo, ekzistas espero.

RE: Encoding italic

2019-01-30 Thread Tex via Unicode

David, Asmus,
 
·   “without external standards, then it's simply impossible.”

·   “And without external standard, not interoperable.“

As you both know there are de jure as well as de facto standards. So for years 
people typed : - ) as a smiley without a de facto standard and at some point 
long before emoji, systems began converting these to smiley faces.

Even the utf-8 BOM began as one company’s non-interoperable convention for 
encoding identifier which later became part of the de facto standard.

Ideally interoperability means supported everywhere but we have many useful 
mechanisms that simply don’t do harm without being interpreted.

For example, Unicode relies on this for backward compatibility when it 
introduces new characters, properties, algorithms, et al that are not 
understood by all systems but are tolerated by older ones.

=

While I am at it, I am amused by the arguments earlier in this thread as well 
as other threads, that go:

·   If the feature was needed developers would have implemented it by now. 
It isn’t implemented so the standard doesn’t need it.

·   The feature was implemented without the standard, so we don’t need it 
in the standard.

If men were meant to fly they would have wings…

Apparently, for some, it is only when there are many conflicting 
implementations that a feature demonstrates both that it is a requirement and 
also that it should be standardized.

In fact, this is sometimes not a bad view as it prevents adding features to the 
standard that go unused yet add complexity. 

But, it can also set too high a bar. And often it isn’t a true criterion but 
just resistance to change.

You  don’t need italics. When I went to school we just tilted the terminal a 
few degrees and voila.

(You don’t need a car. When I went to school we walked 6 miles to get there. 
Uphill both ways. :-) )

tex

 

 

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
via Unicode
Sent: Wednesday, January 30, 2019 10:20 PM
To: unicode@unicode.org
Subject: Re: Encoding italic

 

On 1/30/2019 7:46 PM, David Starner via Unicode wrote:

On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 <mailto:unicode@unicode.org>  wrote:

A new beta of BabelPad has been released which enables input, storing,
and display of italics, bold, strikethrough, and underline in plain-text

 
Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
without external standards, then it's simply impossible.
 

It's either "markdown" or control/tag sequences. Both are out of band 
information.

And without external standard, not interoperable.

A./

Re: Encoding italic

2019-01-30 Thread James Kass via Unicode




David Starner wrote,

>> ... italics, bold, strikethrough, and underline in plain-text
>
> Okay? Ed can do that too, along with nano and notepad. It's called
> HTML (TeX, Troff). If by plain-text, you mean self-interpeting,
> without external standards, then it's simply impossible.

HTML source files are in plain-text.  Hopefully everyone on this list 
understands that and has already explored the marvelous benefits offered 
by granting users the ability to make exciting and effective page 
layouts via any plain-text editor.  HTML is standard and interchangeable.


As Tex Texin observed, differences of opinion as to where we draw the 
line between text and mark-up are somewhat ideological.  If a compelling 
case for handling italics at the plain-text level can be made, then the 
fact that italics can already be handled elsewhere doesn’t matter.  If a 
compelling case cannot be made, there are always alternatives.


As for use of other variant letter forms enabled by the math 
alphanumerics, the situation exists.  It’s an interesting phenomenon 
which is sometimes worthy of comment and relates to this thread because 
the math alphanumerics include italics.  One of the web pages referring 
to third-party input tools calls the practice “super cool Unicode text 
magic”.

Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode


  
  
On 1/30/2019 7:46 PM, David Starner via Unicode wrote:

> On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode wrote:
>
>> A new beta of BabelPad has been released which enables input, storing,
>> and display of italics, bold, strikethrough, and underline in plain-text
>
> Okay? Ed can do that too, along with nano and notepad. It's called
> HTML (TeX, Troff). If by plain-text, you mean self-interpreting,
> without external standards, then it's simply impossible.

It's either "markdown" or control/tag sequences. Both are out-of-band
information.

And without an external standard, not interoperable.

A./

Re: Encoding italic

2019-01-30 Thread David Starner via Unicode

On Sun, Jan 27, 2019 at 12:04 PM James Kass via Unicode
 wrote:
> A new beta of BabelPad has been released which enables input, storing,
> and display of italics, bold, strikethrough, and underline in plain-text

Okay? Ed can do that too, along with nano and notepad. It's called
HTML (TeX, Troff). If by plain-text, you mean self-interpreting,
without external standards, then it's simply impossible.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Encoding italic

2019-01-30 Thread Asmus Freytag via Unicode


  
  
On 1/30/2019 4:38 PM, Kent Karlsson via Unicode wrote:

> I did say "multiple" and "for instance". But since you ask:
>
> ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
> sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
> is implemented in Cygwin (sorry for mentioning a product name).)

No need to be sorry; we understand that the motivation is not so
much advertising as giving a concrete example. It would be
interesting to know whether anything out there implements CMY(K). My
expectation would be that this would be limited to interfaces for
printers or their emulators.

> (The "named" ones, though very popular in terminal emulators, are
> all much too stark, I think, and the exact colour for them are
> implementation defined.)

Muted colors have become more popular as display
hardware has improved. Modern displays are able to reproduce these
both more predictably and with the necessary degree of
contrast (although some users' and designers' fetish for low-contrast
text design is almost as bad as people randomly mixing "stark"
FG/BG colors in the '90s).

> ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
> traditionally does not use bold or italic. Compare those specified for CSS
> (https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
> https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
> These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
> be of interest for the generalised subject of this thread.

Mapping all of these to CSS would be essential if you want this
stuff to be interoperable.

> There are some other differences as well, but those are the major ones
> with regard to text styling. (I don't know those standards to a tee.
> I've just looked at the "m" control sequences for text styling. And yes,
> I looked at the free copies...)
>
> /Kent Karlsson
>
> PS
> If people insist on that EACH character in "plain text" italic/bold/etc
> "controls" be default ignorable: one could just take the control sequences
> as specified, but map the printable characters part to the corresponding
> tag characters... Not that I think that that is really necessary.

Systems that support "markdown", i.e. simplified markup that
provides the most mainstream features of rich text, tend to do that
with printable characters, for a reason. Perhaps two reasons.

Users find it preferable to have a visible fallback when
"markdown" is not interpreted by a receiving system, and users
generally like the ability to edit the markdown directly (even if,
for convenience, there's some direct UI support for adding text
styling).

Loading up the text with lots of invisible characters that may be
deleted or copied out of order by someone working on a system that
neither interprets nor displays these code points is an
interoperability nightmare, in my opinion.

> On 2019-01-30 22:24, "Doug Ewell via Unicode" wrote:
>
>> Kent Karlsson wrote:
>>
>>> Yes, great. But as I've said, we've ALREADY got a
>>> default-ignorable-in-display (if implemented right)
>>> way of doing such things.
>>>
>>> And not only do we already have one, but it is also
>>> standardised in multiple standards from different
>>> standards institutions. See for instance "ISO/IEC 8613-6,
>>> Information technology --- Open Document Architecture (ODA)
>>> and Interchange Format: Character content architecture".
>>
>> I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
>> has the advantage of not costing me USD 179, and it looks very similar
>> to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
>> are talking about: setting text display properties such as bold and
>> italics by means of escape sequences.
>>
>> Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
>> doing, and if it does not, why we should not simply refer to the more
>> familiar 6429?
>>
>> --
>> Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-30 Thread Kent Karlsson via Unicode

I did say "multiple" and "for instance". But since you ask:

ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control
sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one
is implemented in Cygwin (sorry for mentioning a product name).)
(The "named" ones, though very popular in terminal emulators, are
all much too stark, I think, and the exact colour for them are
implementation defined.)
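
[Editorial sketch, not part of the original message.] For readers who have not seen the RGB colour control sequences mentioned above, the following Python fragment shows roughly what they look like. It assumes the "select graphic rendition" parameter 38 with sub-parameter 2 for direct RGB colour; the semicolon form is the one many terminal emulators accept, while the colon form with an empty colour-space field is how the T.416-style sub-parameter syntax is usually written. The function names are made up for illustration, and actual terminal support varies.

```python
ESC = "\x1b"
CSI = ESC + "["        # Control Sequence Introducer
RESET = CSI + "0m"     # SGR 0: back to the default rendition


def fg_rgb(r: int, g: int, b: int, colon_form: bool = False) -> str:
    """Return an SGR sequence selecting an RGB foreground colour."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("colour components must be in 0..255")
    if colon_form:
        # Colon-separated sub-parameters; the empty field is the
        # (unspecified) colour-space identifier.
        return f"{CSI}38:2::{r}:{g}:{b}m"
    # Semicolon-separated form, widely accepted by terminal emulators.
    return f"{CSI}38;2;{r};{g};{b}m"


def coloured(text: str, r: int, g: int, b: int) -> str:
    """Wrap text in an RGB foreground colour and reset afterwards."""
    return fg_rgb(r, g, b) + text + RESET


if __name__ == "__main__":
    # A deliberately muted colour rather than one of the stark named ones.
    print(coloured("muted teal", 95, 158, 160))
```

A receiver that does not interpret the sequence has to skip it for display, which is the "if implemented right" caveat raised elsewhere in this thread.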

ECMA-48/ISO 6429 defines control sequences for CJK emphasising, which
traditionally does not use bold or italic. Compare those specified for CSS
(https://www.w3.org/TR/css-text-decor-3/#propdef-text-decoration-style and
https://www.w3.org/TR/css-text-decor-3/#propdef-text-emphasis-style).
These are not at all mentioned in ITU T.416/ISO/IEC 8613-6, but should
be of interest for the generalised subject of this thread.

There are some other differences as well, but those are the major ones
with regard to text styling. (I don't know those standards to a tee.
I've just looked at the "m" control sequences for text styling. And yes,
I looked at the free copies...)

/Kent Karlsson

PS
If people insist on that EACH character in "plain text" italic/bold/etc
"controls" be default ignorable: one could just take the control sequences
as specified, but map the printable characters part to the corresponding
tag characters... Not that I think that that is really necessary.

On 2019-01-30 22:24, "Doug Ewell via Unicode" wrote:

> Kent Karlsson wrote:
>  
>> Yes, great. But as I've said, we've ALREADY got a
>> default-ignorable-in-display (if implemented right)
>> way of doing such things.
>> 
>> And not only do we already have one, but it is also
>> standardised in multiple standards from different
>> standards institutions. See for instance "ISO/IEC 8613-6,
>> Information technology --- Open Document Architecture (ODA)
>> and Interchange Format: Character content architecture".
>  
> I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
> has the advantage of not costing me USD 179, and it looks very similar
> to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
> are talking about: setting text display properties such as bold and
> italics by means of escape sequences.
>  
> Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
> doing, and if it does not, why we should not simply refer to the more
> familiar 6429?
>  
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> Yes, great. But as I've said, we've ALREADY got a
> default-ignorable-in-display (if implemented right)
> way of doing such things.
>
> And not only do we already have one, but it is also
> standardised in multiple standards from different
> standards institutions. See for instance "ISO/IEC 8613-6,
> Information technology --- Open Document Architecture (ODA)
> and Interchange Format: Character content architecture".

I looked at ITU T.416, which I believe is equivalent to ISO 8613-6 but
has the advantage of not costing me USD 179, and it looks very similar
to ISO 6429 (ECMA-48, formerly ANSI X3.64) with regard to the things we
are talking about: setting text display properties such as bold and
italics by means of escape sequences.

Can you explain how ISO 8613-6 differs from ISO 6429 for what we are
doing, and if it does not, why we should not simply refer to the more
familiar 6429?
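
[Editorial sketch, not part of the original message.] As a concrete reminder of what "setting text display properties by means of escape sequences" means in ISO 6429 / ECMA-48 terms, here is a small Python fragment. It uses the SGR parameters 1/22 (bold on/off), 3/23 (italic on/off), 4/24 (underline on/off) and 9/29 (crossed-out on/off); the helper names and the regular expression are illustrative assumptions, not taken from any of the standards' texts.

```python
import re

CSI = "\x1b["  # ESC [ : Control Sequence Introducer

# SGR parameter pairs: (switch on, switch off)
SGR = {
    "bold": (1, 22),
    "italic": (3, 23),
    "underline": (4, 24),
    "strike": (9, 29),
}


def styled(text: str, style: str) -> str:
    """Wrap text in the SGR on/off sequences for one style."""
    on, off = SGR[style]
    return f"{CSI}{on}m{text}{CSI}{off}m"


# Matches SGR sequences only: CSI, parameters made of digits, ';' or ':', final 'm'.
SGR_RE = re.compile(r"\x1b\[[0-9;:]*m")


def strip_sgr(text: str) -> str:
    """What a non-interpreting receiver would need to do before display."""
    return SGR_RE.sub("", text)


if __name__ == "__main__":
    message = "plain, " + styled("italic", "italic") + " and " + styled("bold", "bold")
    print(message)             # styled on terminals that interpret SGR
    print(strip_sgr(message))  # fallback with the sequences removed
```

The second line printed is the fallback a receiver gets only if it knows to skip the sequences; a receiver that does neither will show stray "[3m" fragments.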

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode

Martin J. Dürst wrote:

> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags.

Aha. That explains why SCSU had to be banished to the hut, right around
the same time the Plane 14 language tags were deprecated. In SCSU,
astral characters can be 1 byte just like BMP characters.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Kent Karlsson via Unicode

Yes, great. But as I've said, we've ALREADY got a
default-ignorable-in-display (if implemented right)
way of doing such things.

And not only do we already have one, but it is also
standardised in multiple standards from different
standards institutions. See for instance "ISO/IEC 8613-6,
Information technology --- Open Document Architecture (ODA)
and Interchange Format: Character content architecture".
(In a little experiment I found that it seems that
Cygwin is one of the better implementations of this;
B.t.w. I have no relation to Cygwin other than using it.)

To boot, it's been around for decades and is still
alive and well. I see absolutely no need for a "bold"
new concept here; the one below is not better in any
significant way.

/Kent Karlsson

On 2019-01-29 23:35, "Andrew West via Unicode" wrote:

> On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode
>  wrote:
>> 
>> This bold new concept was not mine.  When I tested it
>> here, I was using the tag encoding recommended by the developer.
> 
> Congratulations James, you've successfully interchanged tag-styled
> plain text over the internet with no adverse side effects. I copied
> your email into BabelPad and your "bold" is shown bold (see attached
> screenshot).
> 
> Andrew

Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode

On Mon, 28 Jan 2019 at 01:55, James Kass via Unicode
 wrote:
>
> This bold new concept was not mine.  When I tested it
> here, I was using the tag encoding recommended by the developer.

Congratulations James, you've successfully interchanged tag-styled
plain text over the internet with no adverse side effects. I copied
your email into BabelPad and your "bold" is shown bold (see attached
screenshot).

Andrew

Re: Encoding italic

2019-01-29 Thread James Kass via Unicode




Doug Ewell wrote,

> I can't speak for Andrew, but I strongly suspect he implemented this as
> a proof of concept, not to declare himself the Maker of Standards.

BabelPad also offers plain-text styling via math-alpha conversion, 
although this feature isn’t newly added.  Users interested in seeing how 
plain-text italics might work can try out the stateful approach using 
tags contrasted with the character-by-character approach using 
math-range italic letters.  (Of course, the math-range stuff is already 
being interchanged on the WWW, whilst the tagging method does not yet 
appear to be widely supported.)
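
[Editorial sketch, not part of the original message.] The character-by-character approach can be illustrated in a few lines of Python. The mapping below assumes the Mathematical Italic ranges starting at U+1D434 (capitals) and U+1D44E (small letters), with small "h" taken from U+210E PLANCK CONSTANT because its slot in the block is reserved. The function name is invented here, and this is not a description of how BabelPad implements its conversion.

```python
def to_math_italic(text: str) -> str:
    """Map ASCII letters to Mathematical Italic letters; leave the rest alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif ch == "h":
            # The italic small h slot is reserved; U+210E is used instead.
            out.append("\u210E")
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # Digits, punctuation, accented and non-Latin letters are untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_math_italic("plain-text italics"))
    print(to_math_italic("Märchen"))  # the ä stays upright
```

Unlike the tag method, no state is carried: each letter is styled on its own, and anything outside A-Z and a-z simply passes through unstyled.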


A few miles upthread, ‘where are the third-party developers’ was asked.  
‘Everywhere’ is the answer.  Since third-party developers have to 
subsist on the crumbs dropped by the large corps, they tend to be 
responsive to user needs and requests.

Re: Encoding italic

2019-01-29 Thread James Kass via Unicode




On 2019-01-29 5:10 PM, Doug Ewell via Unicode wrote:

> I thought we had established that someone had mentioned it on this list,
> at some time during the past three weeks. Can someone look up what post
> that was? I don't have time to go through scores of messages, and there
> is no search facility.

http://www.unicode.org/mail-arch/unicode-ml/y2019-m01/0209.html

Re: Encoding italic

2019-01-29 Thread Andrew West via Unicode

On Tue, 29 Jan 2019 at 10:25, Martin J. Dürst via Unicode wrote:
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags themselves
> were not actually used and therefore could be deprecated.

And the tag characters (all except E0001) are now no longer
deprecated. As flag tag sequences are now a thing
(http://www.unicode.org/reports/tr51/#valid-emoji-tag-sequences), and
are widely supported (including on Twitter), your and PV's objections
to using tag characters for a plain text font styling protocol simply
because they are tag characters carry zero weight.
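
[Editorial sketch, not part of the original message.] For reference, an emoji flag tag sequence is U+1F3F4 WAVING BLACK FLAG followed by tag characters spelling a region code and then U+E007F CANCEL TAG. The Python fragment below builds the sequence for "gbeng" (England); the function name is made up, and whether a flag glyph appears depends entirely on the font and platform.

```python
TAG_BASE = 0xE0000            # tag character = ASCII code point + 0xE0000
CANCEL_TAG = "\U000E007F"
BLACK_FLAG = "\U0001F3F4"     # WAVING BLACK FLAG, the base of the sequence


def flag_tag_sequence(region: str) -> str:
    """Build base + tag letters/digits + CANCEL TAG for a subdivision code."""
    region = region.lower()
    if not (region.isascii() and region.isalnum()):
        raise ValueError("region code must be ASCII letters and digits")
    tags = "".join(chr(TAG_BASE + ord(c)) for c in region)
    return BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    england = flag_tag_sequence("gbeng")
    print(" ".join(f"U+{ord(c):04X}" for c in england))
```

Note that here the tag characters modify a visible base character, whereas the styling experiment in this thread uses them as free-standing markup around ordinary text.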

Andrew

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Martin J. Dürst wrote:

> Here's a little dirty secret about these tag characters: They were
> placed in one of the astral planes explicitly to make sure they'd use
> 4 bytes per tag character, and thus quite a few bytes for any actual
> complete tags. See https://tools.ietf.org/html/rfc2482 for details.
> Note that RFC 2482 has been obsoleted by
> https://tools.ietf.org/html/rfc6082, in parallel with a similar motion
> on the Unicode side.

I don't recall anyone mentioning Plane 14 language tags per se in this
thread. The tag characters themselves were un-deprecated to support
emoji flag sequences. But more on language tags in a moment.

> These tag characters were born only to shoot down an even worse
> proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For
> some additional background, please see
> https://tools.ietf.org/html/draft-ietf-acap-langtag-00.
>
> The overall tag proposal had the desired effect: The original proposal
> to hijack some unused bytes in UTF-8 was defeated, and the tags themselves
> were not actually used and therefore could be deprecated.

I agree that the ACAP proposal was awful, for many reasons and on many
levels. But in general, introducing a new standardized mechanism SO THAT
it can be deprecated is a crummy idea. It engenders bad feelings and
distrust among loyal users of the standard. Major software vendors, one
in particular starting with M, have been castigated for decades for
employing tactics similar to this.

> Bad ideas turn up once every 10 or 20 years. It usually takes some
> time for some of the people to realize that they are bad ideas. But
> that doesn't make them any better when they turn up again.

The suggestions over the past three weeks to encode basic styling in
plain text (I'm not saying I'm for or against that) have some
similarities with Plane 14 language tags: many people consider both
types of information to be meta-information, unsuitable for plain text,
and many of the suggested mechanisms are stateful, which is an anti-goal
of Unicode. But these are NOT the same idea, and the fact that they both
use Plane 14 tag characters doesn't make them so.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Kent Karlsson wrote:

> We already have a well-established standard for doing this kind of
> things...

I thought we were having this discussion because none of the existing
methods, no matter how well documented, has been accepted on a
widespread basis as "the" standard.

Some people dislike markdown because it looks like lightweight markup
(which it is), not like actual italics and boldface. Some dislike ISO
6429 because escape characters are invisible and might interfere with
other protocols (though they really shouldn't). Some dislike math
alphanumerics abuse because it's abuse, doesn't cover other writing
systems, etc.

I'd be happy to work with Kent to campaign for ISO 6429 as "the"
well-established standard for applying simple styling to plain text, but
we would have to acknowledge the significant challenges.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode

Philippe Verdy replied to James Kass:
 
> You're not very explicit about the Tag encoding you use for these
> styles.
 
Of course, it was Andrew West who implemented the styling mechanism in a
beta release of BabelPad. James was just reporting on it.
 
> And what is then the interest compared to standard HTML
 
This entire discussion, for more than three weeks now, has been about
how to implement styling (e.g. italics) in plain text. Everyone knows it
can be done, and how to do it, in rich text.
 
> So you used "<U+E003C><U+E0062><U+E003E>bold<U+E003C><U+E002F><U+E0062><U+E003E>".
> I.e., you converted from ASCII to tag characters the full HTML
> sequences "<b>" and "</b>", including the HTML element name. I see
> little interest for that approach.
 
I thought we had established that someone had mentioned it on this list,
at some time during the past three weeks. Can someone look up what post
that was? I don't have time to go through scores of messages, and there
is no search facility.
 
I can't speak for Andrew, but I strongly suspect he implemented this as
a proof of concept, not to declare himself the Maker of Standards.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-01-29 Thread Martin J . Dürst via Unicode

On 2019/01/28 05:03, James Kass via Unicode wrote:
> 
> A new beta of BabelPad has been released which enables input, storing, 
> and display of italics, bold, strikethrough, and underline in plain-text 
> using the tag characters method described earlier in this thread.  This 
> enhancement is described in the release notes linked on this download page:
> 
> http://www.babelstone.co.uk/Software/index.html
>

I didn't say anything at the time this idea first came up, because I 
hoped people would understand that it was a bad idea.

Here's a little dirty secret about these tag characters: They were 
placed in one of the astral planes explicitly to make sure they'd use 4 
bytes per tag character, and thus quite a few bytes for any actual 
complete tags. See https://tools.ietf.org/html/rfc2482 for details. Note 
that RFC 2482 has been obsoleted by https://tools.ietf.org/html/rfc6082, 
in parallel with a similar motion on the Unicode side.
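
[Editorial sketch, not part of the original message.] The size cost described above is easy to check. The Python fragment below compares the encoded length of a three-character ASCII tag with the same tag spelled in Plane 14 tag characters (ASCII code point plus 0xE0000); the variable and function names are illustrative only.

```python
def utf8_len(s: str) -> int:
    """Number of bytes the string occupies in UTF-8."""
    return len(s.encode("utf-8"))


def utf16_len(s: str) -> int:
    """Number of bytes the string occupies in UTF-16 (no BOM)."""
    return len(s.encode("utf-16-le"))


ascii_tag = "<b>"
# The same three characters shifted into the Plane 14 tag block.
plane14_tag = "".join(chr(0xE0000 + ord(c)) for c in ascii_tag)

if __name__ == "__main__":
    print(utf8_len(ascii_tag), utf8_len(plane14_tag))    # 3 12
    print(utf16_len(ascii_tag), utf16_len(plane14_tag))  # 6 12
```

Every supplementary-plane character costs four bytes in UTF-8 and a surrogate pair in UTF-16, so a tag-character spelling of markup is four times the size of its ASCII equivalent in UTF-8.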

These tag characters were born only to shoot down an even worse 
proposal, https://tools.ietf.org/html/draft-ietf-acap-mlsf-01. For some 
additional background, please see 
https://tools.ietf.org/html/draft-ietf-acap-langtag-00.

The overall tag proposal had the desired effect: The original proposal 
to hijack some unused bytes in UTF-8 was defeated, and the tags themselves 
were not actually used and therefore could be deprecated.

Bad ideas turn up once every 10 or 20 years. It usually takes some time 
for some of the people to realize that they are bad ideas. But that 
doesn't make them any better when they turn up again.

Regards,   Martin.

Re: Encoding italic

2019-01-29 Thread Martin J . Dürst via Unicode

On 2019/01/24 23:49, Andrew West via Unicode wrote:
> On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode
>  wrote:

> We were told time and time again when emoji were first proposed that
> they were required for encoding for interoperability with Japanese
> telecoms whose usage had spilled over to the internet. At that time
> there was no suggestion that encoding emoji was anything other than a
> one-off solution to a specific problem with PUA usage by different
> vendors, and I at least had no idea that emoji encoding would become a
> constant stream with an annual quota of 60+ fast-tracked
> user-suggested novelties. Maybe that was the hidden agenda, and I was
> just naïve.

I don't think this was a hidden agenda. Nobody in the US or Europe 
thought that emoji would catch on like they did, with ordinary people 
and the press. Of course they had been popular in Japan; that's why they 
got into Unicode.

> The ESC and UTC do an appallingly bad job at regulating emoji, and I
> would like to see the Emoji Subcommittee disbanded, and decisions on
> new emoji taken away from the UTC, and handed over to a consortium or
> committee of vendors who would be given a dedicated vendor-use emoji
> plane to play with (kinda like a PUA plane with pre-assigned
> characters with algorithmic names [VENDOR-ASSIGNED EMOJI X] which
> the vendors can then associate with glyphs as they see fit; and as
> emoji seem to evolve over time they would be free to modify and
> reassign glyphs as they like because the Unicode Standard would not
> define the meaning or glyph for any characters in this plane).

To a small extent, that already happens. The example I'm thinking about 
is the transition from a (potentially bullet-carrying) pistol to a 
water pistol. The Unicode Consortium doesn't define the meaning of any of 
its characters, and doesn't define standard glyphs for characters, just 
example glyphs. Another example is a presenter at a conference who was 
using lots of emoji saying that he will need to redo his presentation 
because the vendor of his notebook's OS was in the process of changing 
their emoji designs.

Regards,Martin.

Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode

2019-1-25 13:46, Garth Wallace via Unicode  wrote:

>
> On Wed, Jan 23, 2019 at 1:27 AM James Kass via Unicode <
> unicode@unicode.org> wrote:
>
>>
>> Nobody has really addressed Andrew West's suggestion about using the tag
>> characters.
>>
>> It seems conformant, unobtrusive, requiring no official sanction, and
>> could be supported by third-partiers in the absence of corporate
>> interest if deemed desirable.
>>
>> One argument against it might be:  Whoa, that's just HTML.  Why not just
>> use HTML?  SMH
>>
>> One argument for it might be:  Whoa, that's just HTML!  Most everybody
>> already knows about HTML, so a simple subset of HTML would be
>> recognizable.
>>
>> After revisiting the concept, it does seem elegant and workable. It
>> would provide support for elements of writing in plain-text for anyone
>> desiring it, enabling essential (or frivolous) preservation of
>> editorial/authorial intentions in plain-text.
>>
>> Am I missing something?  (Please be kind if replying.)
>>
>
> There is also RFC 1896 "enriched text", which is an attempt at a
> lightweight HTML substitute for styling in email. But these, and the ANSI
> escape code suggestion, seem like they're trying to solve the wrong problem
> here.
>
> Here's how I understand the situation:
> * Some people using forms of text or mostly-text communication that do not
> provide styling features want to use styling, for emphasis or personal flair
> * Some of these people caught on to the existence of the "styled"
> mathematical alphanumerics and, not caring that this is "wrong", started
> using them as a workaround
> * The use of these symbols, which are not technically equivalent to basic
> Latin, make posts inaccessible to screen readers, among other problems
>
> These are suggestions for Unicode to provide a different, more
> "acceptable" workaround for a lack of functionality in these social media
> systems (this mostly seems to be an issue with Twitter; IME this shows up
> much less on Facebook). But the root problem isn't the kludge, it's the
> lack of functionality in these systems: if Twitter etc. simply implemented
> some styling on their own, the whole thing would be a moot point.
> Essentially, this is trying to add features to Twitter without waiting for
> their development team.
>
> Interoperability is not an issue, since in modern computers copying and
> pasting styled text between apps works just fine.
>

How about outside social media systems? For example, Chinese Braille has
symbols that indicate the start and end of the proper name mark and the
book title mark. When such text is converted to plain text, however, these
marks cannot be represented in Unicode, because of the mindset that rendering
them should be the task of styling software, simply because in ordinary
Chinese text the two marks are a straight underline and a wavy underline
beneath the text.

>

Re: Encoding italic

2019-01-28 Thread Phake Nick via Unicode

Gmail can do *Märchen*, although I am not sure how it transmits
such formatting or how interoperable it is.

On Tue, 22 Jan 2019 at 14:43, Adam Borowski via Unicode wrote:

> On Mon, Jan 21, 2019 at 12:29:42AM -0800, David Starner via Unicode wrote:
> > On Sun, Jan 20, 2019 at 11:53 PM James Kass via Unicode wrote:
> > >  Even though /we/ know how to do
> > > it and have software installed to help us do it.
> >
> > You're emailing from Gmail, which has support for italics in email.
>
> ... and how exactly can they send italics in an e-mail?  All they can do is
> to bundle a web page as an attachment, which some clients display instead of
> the main text.
>
> The e-mail's body text supports anything Unicode does, including 𝑖𝑡𝑎𝑙𝑖𝑐 and
> even 𐌏𐌋𐌃 𐌉𐌕𐌀𐌋𐌉𐌂, but, remarkably, not italic umlauted characters, Thai nor
> Han.

Re: Encoding italic

2019-01-28 Thread Kent Karlsson via Unicode



On 2019-01-28 02:53, "James Kass via Unicode" wrote:

> plain-text and are uncomfortable using the math alphanumerics for this,
> although the math alphanumerics seem well qualified for the purpose. 

It "works" basically only for English. Note that any diacritics would be
placed as is suitable for math, not for words; there are Latin letters
that do not have a decomposition (like ø); and then there is of course
Cyrillic, and a whole slew of other non-Latin scripts. So, no, they do NOT AT
ALL "seem well qualified". And... we already have a well-established
standard for doing this kind of thing...
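
[Editorial sketch, not part of the original message.] Two of the points above can be checked with Python's standard unicodedata module: "é" has a canonical decomposition into a base letter plus a combining mark (so an italic base letter could at least be substituted), whereas "ø" has none; and the mathematical italic letters carry a compatibility mapping, so NFKC normalization folds them back to plain letters. The helper name is invented for illustration.

```python
import unicodedata


def fold_math_letters(text: str) -> str:
    """Fold compatibility characters (math alphanumerics included) via NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in "\u00e9\u00f8":  # é and ø
        # é reports "0065 0301"; ø reports an empty string (no decomposition).
        print(ch, repr(unicodedata.decomposition(ch)))

    math_italic = "\U0001D456\U0001D461\U0001D44E\U0001D459\U0001D456\U0001D450"
    print(fold_math_letters(math_italic))  # the styling is lost
```

A search index or screen reader that applies NFKC recovers the letters but discards the "italic", which is one reason the styling does not survive as text.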

/Kent K

Re: Encoding italic

2019-01-28 Thread Philippe Verdy via Unicode

So you used
"<U+E003C><U+E0062><U+E003E>bold<U+E003C><U+E002F><U+E0062><U+E003E>".
I.e., you converted from ASCII to tag characters the full HTML sequences
"<b>" and "</b>", including the HTML element name. I see little interest
for that approach.

Additionally this means that U+E003C is the tag identifier and its scope
does not end for the rest of the text (the HTML close tag is closing the
previous Unicode tag but opens a new one, as the second sequence is not
U+E007F, i.e. the Unicode tag-cancel).

I bet that Unicode-conforming code that handles some tag characters could
choose to remove everything in a Unicode tag that it does not understand
(e.g. U+E003C is not an understood identifier, only U+E0001 is understood
as a language tag) or does not want to parse. Without the tag-cancel,
all the rest of your email could then have been truncated, instead of just the
tagged text "bold".
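
[Editorial sketch, not part of the original message.] The difference between the two receiver behaviours discussed here can be made concrete. In the Python fragment below, strip_tags drops only the tag characters themselves (treating them as ignorable), while drop_until_cancel is a purely hypothetical strict filter that discards everything from the first tag character up to the next U+E007F CANCEL TAG, or to the end of the text if there is none. Both function names and the "<b>" element name are assumptions made for illustration; no existing software is being described.

```python
import re

CANCEL_TAG = "\U000E007F"
TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")


def as_tags(markup: str) -> str:
    """Spell ASCII markup in Plane 14 tag characters."""
    return "".join(chr(0xE0000 + ord(c)) for c in markup)


def strip_tags(text: str) -> str:
    """Lenient receiver: remove tag characters, keep all visible text."""
    return TAG_CHARS.sub("", text)


def drop_until_cancel(text: str) -> str:
    """Hypothetical strict receiver: discard from a tag character through CANCEL TAG."""
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if 0xE0000 <= cp <= 0xE007E:
            end = text.find(CANCEL_TAG, i)
            if end == -1:
                break          # no cancel: the rest of the text is lost
            i = end + 1
        elif cp == 0xE007F:
            i += 1             # stray cancel tag
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    mail = "This " + as_tags("<b>") + "bold" + as_tags("</b>") + " new concept"
    print(strip_tags(mail))         # This bold new concept
    print(drop_until_cancel(mail))  # This
```

With the lenient filter the message degrades to ordinary plain text; with the strict one, a missing cancel tag costs the reader everything after the first tag.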

Given how HTML tags nest (or not), I don't think this approach
is desirable.

And I'm not sure that everyone on this list actually received your mail with
this tag; your mail may have been truncated, or all U+E00nn
characters silently removed by an intermediate agent not wanting to
support any Unicode tag character.

On Mon, 28 Jan 2019 at 03:03, James Kass via Unicode wrote:

> On 2019-01-27 11:44 PM, Philippe Verdy wrote:
>
>> You're not very explicit about the Tag encoding you use for these styles.
>
> This bold new concept was not mine.  When I tested it
> here, I was using the tag encoding recommended by the developer.
>
>> Of course it must not be a language tag so the introducer is not U+E0001,
>> or a cancel-all tag so it is not prefixed by U+E007F.  It cannot also use
>> letter-like, digit-like and hyphen-like tag characters for its introduction.
>> So probably you use some prefix in U+E0002..U+E001F and some additional tag
>> (tag "I" for italic, tag "B" for bold, tag "U" for underline, tag "S" for
>> strikethrough?) and the cancel tag to return to normal text (terminate the
>> tagged sequence).
>
> Yes, U+E0001 remains deprecated and its use is strongly discouraged.
>
>> Or maybe you just use standard HTML encoding by adding U+E0000 to each
>> character of the HTML tag syntax (including attributes and close tags,
>> allowing embedding?)  So you use the "<" and ">" tag characters (possibly
>> also the space tag U+E0020, or TAB tag U+E0009 for separating attributes and
>> the quotation tags for attribute values)?  Is your proposal also allowing the
>> embedding of other HTML objects (such as SVG)?
>
> AFAICT, this beta release supports the tag sequences , ,
> , &  expressed here in ASCII.  I don’t know if the
> software developer has plans to expand the enhancements in the future.
>
>> And what is then the interest compared to standard HTML (it is not more
>> compact, ...
>
> This was one of the ideas which surfaced earlier in this thread. Some
> users have expressed an interest in preserving, for example, italics in
> plain-text and are uncomfortable using the math alphanumerics for this,
> although the math alphanumerics seem well qualified for the purpose.
> One of the advantages given for this approach earlier is that it can be
> made to work without any official sanction and with no action necessary
> by the Consortium.
>
>> I bet in fact that all tag characters are most often restricted in text
>> input forms, and will be silently discarded or the whole text will be
>> rejected.
>
> In this e-mail, I used the tags <b> & </b> around the word “bold” in the
> first sentence of my reply in order to test your bet.
>
>> We were told that these tag characters were deprecated, and in fact even
>> their use for language tags has not found any significant use except some
>> trials (but there are now better technologies available in lot of softwares,
>> APIs and services, and application design/development tools, or document
>> editing/publishing tools).
>
> Indeed, these tags were deprecated.  At the time the tags were
> deprecated, there was such sorrow on this list that some list members
> were even inspired to compose haiku lamenting their passing and did post
> those haiku to this list.  Now, thanks to emoji requirements, many of
> those tags are experiencing a resurrection/renaissance.  I wonder if
> anyone is composing limericks in joyful celebration…

Re: Encoding italic

2019-01-27 Thread James Kass via Unicode

On 2019-01-27 11:44 PM, Philippe Verdy wrote:

> You're not very explicit about the Tag encoding you use for these styles.

This bold new concept was not mine.  When I tested it 
here, I was using the tag encoding recommended by the developer.

> Of course it must not be a language tag so the introducer is not U+E0001,
> or a cancel-all tag so it is not prefixed by U+E007F.  It cannot also use
> letter-like, digit-like and hyphen-like tag characters for its introduction.
> So probably you use some prefix in U+E0002..U+E001F and some additional tag
> (tag "I" for italic, tag "B" for bold, tag "U" for underline, tag "S" for
> strikethrough?) and the cancel tag to return to normal text (terminate the
> tagged sequence).

Yes, U+E0001 remains deprecated and its use is strongly discouraged.

> Or maybe you just use standard HTML encoding by adding U+E0000 to each
> character of the HTML tag syntax (including attributes and close tags,
> allowing embedding?)  So you use the "<" and ">" tag characters (possibly
> also the space tag U+E0020, or TAB tag U+E0009 for separating attributes and
> the quotation tags for attribute values)?  Is your proposal also allowing the
> embedding of other HTML objects (such as SVG)?

AFAICT, this beta release supports the tag sequences , , 
, &  expressed here in ASCII.  I don’t know if the 
software developer has plans to expand the enhancements in the future.

> And what is then the interest compared to standard HTML (it is not 
more compact, ...

This was one of the ideas which surfaced earlier in this thread. Some 
users have expressed an interest in preserving, for example, italics in 
plain-text and are uncomfortable using the math alphanumerics for this, 
although the math alphanumerics seem well qualified for the purpose.  
One of the advantages given for this approach earlier is that it can be 
made to work without any official sanction and with no action necessary 
by the Consortium.

> I bet in fact that all tag characters are most often restricted in 
text input forms, and will be

> silently discarded or the whole text will be rejected.

In this e-mail, I used the tags  &  around the word “bold” in the 
first sentence of my reply in order to test your bet.

> We were told that these tag characters were deprecated, and in fact 
even their use for language
> tags has not found any significant use except some trials (but there 
are now better technologies
> available in lot of softwares, APIs and services, and application 
design/development tools, or

> document editing/publishing tools).

Indeed, these tags were deprecated.  At the time the tags were 
deprecated, there was such sorrow on this list that some list members 
were even inspired to compose haiku lamenting their passing and did post 
those haiku to this list.  Now, thanks to emoji requirements, many of 
those tags are experiencing a resurrection/renaissance.  I wonder if 
anyone is composing limericks in joyful celebration…

Re: Encoding italic

2019-01-27 Thread Kent Karlsson via Unicode

Apart from the fact that control sequences for (some) styling have been
standardised (for decades by now), and the "tag characters" approach has not:

For the control sequences for styling, there is no pretence of nesting,
just setting/unsetting an aspect of styling. For  etc. (in tag
characters) there is at least the pretence/appearance of nesting, even
if the interpreter doesn't actually care about nesting (and just interprets
them as set/unset). (In addition,  etc. in "real" HTML are
1) disrecommended, and
2) the actual styling comes from a style sheet (and the **default**
one makes  stuff bold).)

/Kent K


Den 2019-01-27 21:03, skrev "James Kass via Unicode":

> 
> A new beta of BabelPad has been released which enables input, storing,
> and display of italics, bold, strikethrough, and underline in plain-text
> using the tag characters method described earlier in this thread.  This
> enhancement is described in the release notes linked on this download page:
> 
> http://www.babelstone.co.uk/Software/index.html
>

Re: Encoding italic

2019-01-27 Thread Philippe Verdy via Unicode

You're not very explicit about the Tag encoding you use for these styles.

Of course it must not be a language tag, so the introducer is not U+E0001,
or a cancel-all tag, so it is not prefixed by U+E007F.
It also cannot use letter-like, digit-like and hyphen-like tag characters
for its introduction.
So probably you use some prefix in U+E0002..U+E001F and some additional tag
(tag "I" for italic, tag "B" for bold, tag "U" for underline, tag "S" for
strikethrough?) and the cancel tag to return to normal text (terminate the
tagged sequence).

Or maybe you just use standard HTML encoding by adding U+E to each
character of the HTML tag syntax (including attributes and close tags,
allowing embedding?) So you use the "<" and ">" tag characters (possibly
also the space tag U+E0020, or TAB tag U+E0009 for separating attributes
and the quotation tags for attribute values)?
Is your proposal also allowing the embedding of other HTML objects (such as
SVG)?
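
A minimal sketch of the remapping speculated about above, assuming the
markup is carried by shifting each printable ASCII character into the Tags
block (U+E0020..U+E007E mirrors ASCII 0x20..0x7E at an offset of 0xE0000).
The function names are hypothetical, and nothing here is claimed to be what
BabelPad actually does.

```python
TAG_OFFSET = 0xE0000  # Tags block mirrors ASCII: U+E0020..U+E007E


def to_tag_chars(markup: str) -> str:
    """Shift printable ASCII markup into the corresponding tag characters."""
    out = []
    for ch in markup:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
        out.append(chr(ord(ch) + TAG_OFFSET))
    return "".join(out)


def strip_tag_chars(text: str) -> str:
    """What a plain-text consumer (or an input filter) might do: drop them."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


def reveal_tag_chars(text: str) -> str:
    """A 'show invisibles' view: map tag characters back to visible ASCII."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET) if 0xE0020 <= ord(ch) <= 0xE007E else ch
        for ch in text
    )


styled = to_tag_chars("<b>") + "bold" + to_tag_chars("</b>")
print(strip_tag_chars(styled))
print(reveal_tag_chars(styled))
```

The same range test used in strip_tag_chars is all a form filter would need
in order to discard or reject such text.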

In that case what you do is only to remap the HTML syntax outside the
standard text. If an attribute value contains standard text (such as ...),
do you also remap the attribute value, i.e.
"Some text"? Do you remap the technical name of the HTML tag itself, i.e.
"span" in the last example?

And what is then the interest compared to standard HTML (it is not more
compact, and just adds another layer on top of it), except allowing to
embed it in places where plain HTML would be restricted by form inputs or
would be reconverted using character entities hiding the effect of "<", ">"
and "&" in HTML so they are not reinterpreted as HTML but as plain-text
characters?

Now let's suppose that your convention starts being decoded and used in
some applications, this could be used to transport sensitive active scripts
(e.g. Javascript event handlers or plain

Re: Encoding italic

2019-01-27 Thread James Kass via Unicode




A new beta of BabelPad has been released which enables input, storing, 
and display of italics, bold, strikethrough, and underline in plain-text 
using the tag characters method described earlier in this thread.  This 
enhancement is described in the release notes linked on this download page:


http://www.babelstone.co.uk/Software/index.html

Re: Encoding italic

2019-01-25 Thread James Kass via Unicode




On 2019-01-26 12:18 AM, Asmus Freytag (c) responded:

On 1/25/2019 3:49 PM, Andrew Cunningham wrote:
Assuming some mechanism for italics is added to Unicode,  when 
converting between the new plain text and HTML there is insufficient 
information to correctly convert to HTML. Many elements may have 
italic styling and there would be no meta information in Unicode to 
indicate the appropriate HTML element.




So, we would be creating an interoperability issue.



What happens now when we convert plain-text to HTML?

Re: Encoding italic

2019-01-25 Thread Asmus Freytag (c) via Unicode


On 1/25/2019 3:49 PM, Andrew Cunningham wrote:
Assuming some mechanism for italics is added to Unicode,  when 
converting between the new plain text and HTML there is insufficient 
information to correctly convert to HTML. Many elements may have 
italic styling and there would be no meta information in Unicode to 
indicate the appropriate HTML element.




So, we would be creating an interoperability issue.

A./





On Friday, 25 January 2019, wjgo_10...@btinternet.com
via Unicode wrote:


Asmus Freytag wrote;

Other schemes, like a VS per code point, also suffer from
being different in philosophy from "standard" rich text
approaches. Best would be as standard extension to all the
messaging systems (e.g. a common markdown language, supported
by UI).     A./


Yet that claim of what would be best would be stateful and
statefulness is the very thing that Unicode seeks to avoid.

Plain text is the basic system and a Variation Selector mechanism
after each character that is to become italicized is not stateful
and can be implemented using existing OpenType technology.

If an organization chooses to develop and use a rich text format
then that is a matter for that organization and any changing of
formatting of how italics are done when converting between plain
text and rich text is the responsibility of the organization that
introduces its rich text format.

Twitter was just an example that someone introduced along the way,
it was not the original request.

Also this is not only about messaging. Of primary importance is
the conservation of texts in plain text format, for example, where
a printed book has one word italicized in a sentence and the text
is being transcribed into a computer.

William Overington
Friday 25 January 2019



--
Andrew Cunningham
lang.supp...@gmail.com

Re: Encoding italic

2019-01-25 Thread Andrew Cunningham via Unicode

Assuming some mechanism for italics is added to Unicode,  when converting
between the new plain text and HTML there is insufficient information to
correctly convert to HTML. Many elements may have italic styling and there
would be no meta information in Unicode to indicate the appropriate HTML
element.
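
The loss of information described above can be sketched in a few lines of
Python. The "[i]"/"[/i]" markers stand in for whatever plain-text italic
mechanism might be encoded; they are purely hypothetical, as are the
function names. The element list is a subset of the elements that browsers
italicize by default.

```python
import re

# Several distinct HTML elements are rendered in italics by default.
ITALIC_ELEMENTS = ("i", "em", "cite", "dfn", "var")
ITALIC_TAG = re.compile(r"<(/?)(?:%s)>" % "|".join(ITALIC_ELEMENTS))


def html_to_plain(html: str) -> str:
    """Collapse every italic-rendering element into one generic marker pair."""
    return ITALIC_TAG.sub(lambda m: "[/i]" if m.group(1) else "[i]", html)


def plain_to_html(plain: str) -> str:
    """Going back, the converter can only guess; here it always picks <i>."""
    return plain.replace("[i]", "<i>").replace("[/i]", "</i>")


source = "<cite>Hamlet</cite> is <em>long</em>"
round_trip = plain_to_html(html_to_plain(source))
print(round_trip)
```

The round trip comes back with both spans as generic <i>, so the
distinction between a title and an emphasis is gone.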




On Friday, 25 January 2019, wjgo_10...@btinternet.com via Unicode <
unicode@unicode.org> wrote:

> Asmus Freytag wrote;
>
> Other schemes, like a VS per code point, also suffer from being different
>> in philosophy from "standard" rich text approaches. Best would be as
>> standard extension to all the messaging systems (e.g. a common markdown
>> language, supported by UI). A./
>>
>
> Yet that claim of what would be best would be stateful and statefulness is
> the very thing that Unicode seeks to avoid.
>
> Plain text is the basic system and a Variation Selector mechanism after
> each character that is to become italicized is not stateful and can be
> implemented using existing OpenType technology.
>
> If an organization chooses to develop and use a rich text format then that
> is a matter for that organization and any changing of formatting of how
> italics are done when converting between plain text and rich text is the
> responsibility of the organization that introduces its rich text format.
>
> Twitter was just an example that someone introduced along the way, it was
> not the original request.
>
> Also this is not only about messaging. Of primary importance is the
> conservation of texts in plain text format, for example, where a printed
> book has one word italicized in a sentence and the text is being
> transcribed into a computer.
>
> William Overington
> Friday 25 January 2019
>
>

-- 
Andrew Cunningham
lang.supp...@gmail.com

Re: Encoding italic

2019-01-25 Thread Asmus Freytag (c) via Unicode


On 1/25/2019 1:06 AM, wjgo_10...@btinternet.com wrote:

Asmus Freytag wrote;

Other schemes, like a VS per code point, also suffer from being 
different in philosophy from "standard" rich text approaches. Best 
would be as standard extension to all the messaging systems (e.g. a 
common markdown language, supported by UI). A./


Yet that claim of what would be best would be stateful and 
statefulness is the very thing that Unicode seeks to avoid. 


All rich text is stateful, and rich text is very widely used and 
cut and paste tends to work rather well among applications that support it, 
as do conversions of entire documents. Trying to duplicate it with "yet 
another mechanism" is a doubtful achievement, even if it could be made 
"stateless".


A./

Re: Encoding italic

2019-01-25 Thread wjgo_10...@btinternet.com via Unicode


Asmus Freytag wrote;

Other schemes, like a VS per code point, also suffer from being 
different in philosophy from "standard" rich text approaches. Best 
would be as standard extension to all the messaging systems (e.g. a 
common markdown language, supported by UI). A./


Yet that claim of what would be best would be stateful and statefulness 
is the very thing that Unicode seeks to avoid.


Plain text is the basic system and a Variation Selector mechanism after 
each character that is to become italicized is not stateful and can be 
implemented using existing OpenType technology.
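
To make the "not stateful" claim concrete, here is a small Python sketch
of a per-character selector scheme. The use of U+FE0D (VARIATION
SELECTOR-14) is an assumption chosen only for illustration; it is not a
defined italic variation sequence, and the function names are hypothetical.

```python
ITALIC_VS = "\uFE0D"  # VARIATION SELECTOR-14, hypothetical "italic" marker


def italicize(text: str) -> str:
    """Follow every character with the selector."""
    return "".join(ch + ITALIC_VS for ch in text)


def decode(text: str) -> list:
    """Return (character, is_italic) pairs.

    Each selector affects only the character immediately before it, so no
    state is carried from one character to the next.
    """
    out = []
    for ch in text:
        if ch == ITALIC_VS and out:
            out[-1] = (out[-1][0], True)
        elif ch != ITALIC_VS:
            out.append((ch, False))
    return out


line = "one " + italicize("word") + " here"
# A fragment cut from the middle of the italic word keeps its styling,
# because nothing before the cut is needed to interpret it.
print(decode(line[8:12]))
```

The cost is visible too: the italic word takes twice as many code points
as the plain one.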


If an organization chooses to develop and use a rich text format then 
that is a matter for that organization and any changing of formatting of 
how italics are done when converting between plain text and rich text is 
the responsibility of the organization that introduces its rich text 
format.


Twitter was just an example that someone introduced along the way, it 
was not the original request.


Also this is not only about messaging. Of primary importance is the 
conservation of texts in plain text format, for example, where a printed 
book has one word italicized in a sentence and the text is being 
transcribed into a computer.


William Overington
Friday 25 January 2019

Re: Encoding italic

2019-01-25 Thread David Starner via Unicode

On Thu, Jan 24, 2019 at 11:16 PM Tex via Unicode  wrote:
> Twitter was offered as an example, not the only example just one of the most 
> ubiquitous. Many messaging apps and other apps would benefit from italics. 
> The argument is not based on adding italics to twitter.

And again, color me skeptical. If italics are just added to Unicode
and not to the relevant app or interface, they will not see much use,
in the same way that most non-ASCII characters for proper English--the
quotes, the dashes, the accents--are often ignored because they're too
hard to enter. But if you're going to add italics, having it in
Unicode doesn't make it significantly easier, particularly when they
need to support systems that predate Unicode adding italics.

> The biggest burden would be to the apps that would benefit, to add 
> italicizing and editing capabilities.

If they would benefit or if they'd accept the burden, they'd have
already added italics, via HTML or Markdown or escape sequences or
whatever.

-- 
Kie ekzistas vivo, ekzistas espero.

Re: Encoding italic

2019-01-24 Thread Asmus Freytag (c) via Unicode


On 1/24/2019 11:14 PM, Tex wrote:


I am surprised at the length of this debate, especially since the 
arguments are repetitive…


That said:

Twitter was offered as an example, not the only example just one of 
the most ubiquitous. Many messaging apps and other apps would benefit 
from italics. The argument is not based on adding italics to twitter.


Most apps today have security protections that filter or translate 
problematic characters. If the proposal would cause “normalization” 
problems, adding the proposed characters to the filter lists or 
substitution lists would not be a big burden.


The biggest burden would be to the apps that would benefit, to add 
italicizing and editing capabilities.


The "normalization" is when you import to rich text, you don't want 
competing formatting instructions. Getting styled character codes 
normalized to styling of character runs is the most difficult, that's 
why the abuse of math italics really is abuse in terms of interoperability.


Other schemes, like a VS per code point, also suffer from being 
different in philosophy from "standard" rich text approaches. Best would 
be as standard extension to all the messaging systems (e.g. a common 
markdown language, supported by UI).


A./


tex

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of 
Asmus Freytag via Unicode
Sent: Thursday, January 24, 2019 10:34 PM
To: unicode@unicode.org
Subject: Re: Encoding italic

On 1/24/2019 9:44 PM, Garth Wallace via Unicode wrote:

But the root problem isn't the kludge, it's the lack of
functionality in these systems: if Twitter etc. simply implemented
some styling on their own, the whole thing would be a moot point.
Essentially, this is trying to add features to Twitter without
waiting for their development team.

Interoperability is not an issue, since in modern computers
copying and pasting styled text between apps works just fine.

Yep, that's what this is: trying to add features to some platforms 
that could very simply be added by the  respective developers while in 
the process causing a normalization issue (of sorts) everywhere else.


A./

RE: Encoding italic

2019-01-24 Thread Tex via Unicode

I am surprised at the length of this debate, especially since the arguments are 
repetitive…

 

That said:

 

Twitter was offered as an example, not the only example just one of the most 
ubiquitous. Many messaging apps and other apps would benefit from italics. The 
argument is not based on adding italics to twitter.

 

Most apps today have security protections that filter or translate problematic 
characters. If the proposal would cause “normalization” problems, adding the 
proposed characters to the filter lists or substitution lists would not be a 
big burden.

The biggest burden would be to the apps that would benefit, to add italicizing 
and editing capabilities.

 

tex

 

 

 

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag 
via Unicode
Sent: Thursday, January 24, 2019 10:34 PM
To: unicode@unicode.org
Subject: Re: Encoding italic

 

On 1/24/2019 9:44 PM, Garth Wallace via Unicode wrote:

But the root problem isn't the kludge, it's the lack of functionality in these 
systems: if Twitter etc. simply implemented some styling on their own, the 
whole thing would be a moot point. Essentially, this is trying to add features 
to Twitter without waiting for their development team.

Interoperability is not an issue, since in modern computers copying and pasting 
styled text between apps works just fine.  

Yep, that's what this is: trying to add features to some platforms that could 
very simply be added by the  respective developers while in the process causing 
a normalization issue (of sorts) everywhere else. 

A./

Re: Encoding italic

2019-01-24 Thread Asmus Freytag via Unicode


  
  
On 1/24/2019 9:44 PM, Garth Wallace via
  Unicode wrote:


  But the root problem isn't the kludge, it's the lack of
functionality in these systems: if Twitter etc. simply
implemented some styling on their own, the whole thing would be
a moot point. Essentially, this is trying to add features to
Twitter without waiting for their development team.

  
  Interoperability is not an issue, since in modern computers
copying and pasting styled text between apps works just fine.  

Yep, that's what this is: trying to add
features to some platforms that could very simply be added by
the  respective developers while in the process causing a
normalization issue (of sorts) everywhere else. 
  
A./

Re: Encoding italic

2019-01-24 Thread Garth Wallace via Unicode

On Wed, Jan 23, 2019 at 1:27 AM James Kass via Unicode 
wrote:

>
> Nobody has really addressed Andrew West's suggestion about using the tag
> characters.
>
> It seems conformant, unobtrusive, requiring no official sanction, and
> could be supported by third-partiers in the absence of corporate
> interest if deemed desirable.
>
> One argument against it might be:  Whoa, that's just HTML.  Why not just
> use HTML?  SMH
>
> One argument for it might be:  Whoa, that's just HTML!  Most everybody
> already knows about HTML, so a simple subset of HTML would be recognizable.
>
> After revisiting the concept, it does seem elegant and workable. It
> would provide support for elements of writing in plain-text for anyone
> desiring it, enabling essential (or frivolous) preservation of
> editorial/authorial intentions in plain-text.
>
> Am I missing something?  (Please be kind if replying.)
>

There is also RFC 1896 "enriched text", which is an attempt at a
lightweight HTML substitute for styling in email. But these, and the ANSI
escape code suggestion, seem like they're trying to solve the wrong problem
here.

Here's how I understand the situation:
* Some people using forms of text or mostly-text communication that do not
provide styling features want to use styling, for emphasis or personal flair
* Some of these people caught on to the existence of the "styled"
mathematical alphanumerics and, not caring that this is "wrong", started
using them as a workaround
* The use of these symbols, which are not technically equivalent to basic
Latin, makes posts inaccessible to screen readers, among other problems
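
The workaround in the list above can be sketched in Python as follows. The
helper names are hypothetical; the code points are those of the
Mathematical Italic letters (capitals from U+1D434, small letters from
U+1D44E, with U+210E PLANCK CONSTANT standing in for the unassigned italic
small h). NFKC normalization folds them back to basic Latin, which is one
way a consumer could recover readable text.

```python
import unicodedata


def fake_italic(text: str) -> str:
    """Replace basic Latin letters with Mathematical Italic look-alikes."""
    out = []
    for ch in text:
        if ch == "h":
            out.append("\u210E")  # PLANCK CONSTANT fills the gap at U+1D455
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def fold_to_plain(text: str) -> str:
    """Compatibility normalization maps the styled letters back to ASCII."""
    return unicodedata.normalize("NFKC", text)


styled = fake_italic("The hat")
print("hat" in styled)          # a plain search no longer finds the word
print(fold_to_plain(styled))
```

The fold discards the styling entirely, so it recovers the letters but not
the emphasis.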

These are suggestions for Unicode to provide a different, more "acceptable"
workaround for a lack of functionality in these social media systems (this
mostly seems to be an issue with Twitter; IME this shows up much less on
Facebook). But the root problem isn't the kludge, it's the lack of
functionality in these systems: if Twitter etc. simply implemented some
styling on their own, the whole thing would be a moot point. Essentially,
this is trying to add features to Twitter without waiting for their
development team.

Interoperability is not an issue, since in modern computers copying and
pasting styled text between apps works just fine.

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Kent Karlsson via Unicode

Den 2019-01-24 03:21, skrev "Mark E. Shoulson via Unicode":

> On 1/22/19 6:26 PM, Kent Karlsson via Unicode wrote:
>> Ok. One thing to note is that escape sequences (including control sequences,
>> for those who care to distinguish those) probably should be "default
>> ignorable" for display. Requiring, or even recommending, them to be default
>> ignorable for other processing (like sorting, searching, and other things)
>> may be a tall order. So, for display, (maximal) substrings that match:
>> 
>> \u001B[\u0020-\u002F]*[\u0030-\u007E]|
>> (\u001B'['|\u009B)[\u0030-\u003F]*[\u0020-\u002F]*[\u0040-\u007E]
>> 
>> should be default ignorable (i.e. invisible, but a "show invisibles" mode
>> would show them; not interpreted ones should be kept, even if interpreted
>> ones need not, just (re)generated on save). That is as far as Unicode
>> should go.
> 
> So it isn't just "these characters should be default ignorable", but
> "this regular expression is default ignorable."  This gets back to
> "things that span more than a character" again, only this time the
> "span" isn't the text being styled, it's the annotation to style it. 

True. That is how ECMA/ISO/ANSI escape/control-sequences are designed.
Had they not already been designed, and implemented, but we were to do
a design today, it would surely be done differently; e.g. having
"controls" that consisted only of (individually) "default-ignorable"
characters.

But, and this is the important thing here:

a) The current esc/control-sequences have long been an accepted
standard.

b) This standard is still in very much active use, albeit mostly
by terminal emulators. But the styling stuff need not at all
be limited to terminal emulators.

Since it is an actively and widely used standard, I don't see the
point of trying to design another way of specifying "default
ignorable"-controls for text styling. (HTML, for instance, does not
have "default ignorable" controls, since ALL characters in the
"controls" are printable characters, so one needs a "second level"
for parsing the controls.) True, ignoring or interpreting an
esc/control-sequence requires some processing of substrings, since
some (all but the first) are printable characters. But not that hard.
It has been implemented over and over...
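
As a rough illustration of that substring processing, here is a Python
sketch built on the pattern quoted earlier in this message. The function
names are hypothetical, and the "show invisibles" rendering is just one
possible convention. One detail is an adaptation: the control-sequence
branch is tried first, because Python's alternation takes the first branch
that matches rather than the longest, and "ESC [" on its own also
satisfies the escape-sequence branch.

```python
import re

# Control sequence: CSI (ESC [ or the single character 0x9B), parameter
# bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E.
# Escape sequence: ESC, intermediate bytes 0x20-0x2F, one final byte
# 0x30-0x7E.
ECMA48_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
    r"|\x1b[\x20-\x2f]*[\x30-\x7e]"
)


def hide_sequences(text: str) -> str:
    """Treat matching substrings as ignorable for display."""
    return ECMA48_SEQUENCE.sub("", text)


def show_invisibles(text: str) -> str:
    """Render each sequence visibly instead of dropping it."""
    def visible(match):
        seq = match.group(0).replace("\x1b", "ESC").replace("\x9b", "CSI")
        return "<" + seq + ">"
    return ECMA48_SEQUENCE.sub(visible, text)


sample = "\x1b[1mbold\x1b[22m text"
print(hide_sequences(sample))
print(show_invisibles(sample))
```

Neither function interprets the sequences; a renderer that supports
styling would act on the ones it understands and hide the rest.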

Had this standard been defunct, then there would be an opportunity
to design something different.

> The "bash" shell has special escape-sequences (\[ and \]) to use in
> defining its prompt that tell the system that the text enclosed by them
> is not rendered and should not be counted when it comes to doing

Never heard of. Cannot find any reference mentioning them. Reference?

> cursor-control and line-editing stuff (so you put them around, yep, the
> escape sequences for coloring or boldfacing or whatever that you want in
> your prompt). 

Line editing stuff in bash is done on an internal buffer (there is a library
for doing this, and that library can be used by various other command line
programs; bash does not use the system input line editing). Then that
library tries to show what is in the buffer on the terminal. So, I'm
not sure what you are talking about; bash does NOT (somehow) scrape
the screen (terminal emulator window).

Furthermore, colouring and bold/underline is quite common not only in
prompts, but also in output directed at a terminal from various programs.
(And it works just fine.) Unfortunately cut-and-paste tends to lose
much (or all) of that. (Would be nicer if it got converted to HTML,
RTF, .doc, or whatever is the target format; or just nicely kept if
"plain text" is the target.)

> That would seem to be at least simpler than a big ol'
> regexp, but really not that much of an improvement.  It also goes to
> show how things like this require all kinds of special handling,
> even/especially in a "simple" shell prompt (which could make a strong
> case for being "plain text", though, yes, terminal escape codes are a
> thing.)

They are NOT "terminal escape codes". It is just that, for now, it is
just about only terminal emulators that implement esc/control-sequences.
From https://www.ecma-international.org/publications/standards/Ecma-048.htm:
"The control functions are intended to be used embedded in character-coded
data for interchange, in particular with character-imaging devices."
A (plain) text editor is an example of a 'character-imaging device'.
(Yes, the terminology is a bit dated.)

/Kent K

> 
> ~mark

Re: Encoding italic

2019-01-24 Thread Khaled Hosny via Unicode

On Thu, Jan 24, 2019 at 10:42:59PM +, Richard Wordingham via Unicode wrote:
> On Thu, 24 Jan 2019 18:24:07 +0200
> Khaled Hosny via Unicode  wrote:
> 
> > On Thu, Jan 24, 2019 at 03:54:29PM +, Andrew West via Unicode
> > wrote:
> >> On Thu, 24 Jan 2019 at 15:42, James Kass 
> >> wrote:  
> 
> >>> Going off topic a little, I saw this tweet from Marijn van Putten
> >>> today which shows examples of Arabic script from early Quranic
> >>> manuscripts with phonetic information indicated by the use of red
> >>> and green dots:
> >>> 
> >>> https://twitter.com/PhDniX/status/1088171783461703682
>  
> >> I would be interested to know how those should be represented in
> >> Unicode.
>  
> > It is possible to represent this by use of color fonts.
> 
> The limitations of rendering technology should not be an argument
> against an encoding.  We have characters that differ only in their
> properties, such as word-breaking and line-breaking.

They are already encoded, in their modern uncolored form. Some of the
modern forms like U+06E5 ARABIC SMALL WAW, U+06E6 ARABIC SMALL YEH, etc.
were even specifically “invented” in the previous century to overcome
the impracticality of printing in multiple colors, so the colored and
uncolored forms are different representations of the same underlying
characters.
 
> In this case, it may be argued that their colours apply only to their
> 'plain' colouring.  Who determines what their colour should be in blue
> text?  (Font technology seems to dictate that their colour is
> unaffected by the choice of foreground colour.)

The colors don’t change, the vowel marks are always red, the hamza is
always green/yellow.

Re: Encoding italic

2019-01-24 Thread Richard Wordingham via Unicode

On Thu, 24 Jan 2019 18:24:07 +0200
Khaled Hosny via Unicode  wrote:

> On Thu, Jan 24, 2019 at 03:54:29PM +, Andrew West via Unicode
> wrote:
>> On Thu, 24 Jan 2019 at 15:42, James Kass 
>> wrote:  

>>> Going off topic a little, I saw this tweet from Marijn van Putten
>>> today which shows examples of Arabic script from early Quranic
>>> manuscripts with phonetic information indicated by the use of red
>>> and green dots:
>>> 
>>> https://twitter.com/PhDniX/status/1088171783461703682

>> I would be interested to know how those should be represented in
>> Unicode.

> It is possible to represent this by use of color fonts.

The limitations of rendering technology should not be an argument
against an encoding.  We have characters that differ only in their
properties, such as word-breaking and line-breaking.

In this case, it may be argued that their colours apply only to their
'plain' colouring.  Who determines what their colour should be in blue
text?  (Font technology seems to dictate that their colour is
unaffected by the choice of foreground colour.)

Richard.

Re: Encoding italic

2019-01-24 Thread James Kass via Unicode




> Maybe I should have said emoji are fan-driven.

That works.  Here's the previous assertion rephrased:

  We should no more expect the conventional Unicode character encoding
  model to apply to emoji than we should expect the old-fashioned text
  ranges to become fan-driven.

And if we don't want the text ranges to become fan driven, as pointed 
out by Martin Dürst and others, we take a cautious and conservative 
approach to moving forward with the standard.


Veering back on-topic, the anti fan driven aversion doesn't apply to 
encoding italics, although /fans/ would benefit.  There's pre-existing 
conventions for italics, and a scholar with the credentials of Victor 
Gaultney should be able to make a credible proposal for encoding them.  
I hope we haven't overwhelmed him with a surplus of rhetoric.

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Khaled Hosny via Unicode

On Thu, Jan 24, 2019 at 03:54:29PM +, Andrew West via Unicode wrote:
> On Thu, 24 Jan 2019 at 15:42, James Kass  wrote:
> >
> > Here's a very polite reply from John Hudson from 2000,
> > http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html
> > ...and, over time, many of the replies to William Overington's colorful
> > suggestions were less than polite.  But it was clear that colors were
> > out-of-scope for a computer plain-text encoding standard.
> 
> Going off topic a little, I saw this tweet from Marijn van Putten
> today which shows examples of Arabic script from early Quranic
> manuscripts with phonetic information indicated by the use of red and
> green dots:
> 
> https://twitter.com/PhDniX/status/1088171783461703682
> 
> I would be interested to know how those should be represented in Unicode.

It is possible to represent this by use of color fonts. The green
(sometimes golden) dots are the hamza, the red ones are various vowel
marks. A color font would use colored glyphs for these instead of the
modern shapes. I did a color font that does a similar thing (but still
uses the modern forms) and it is on my to do list to do a font using
archaic Kufi forms.

Regards,
Khaled

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread Andrew West via Unicode

On Thu, 24 Jan 2019 at 15:42, James Kass  wrote:
>
> Here's a very polite reply from John Hudson from 2000,
> http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html
> ...and, over time, many of the replies to William Overington's colorful
> suggestions were less than polite.  But it was clear that colors were
> out-of-scope for a computer plain-text encoding standard.

Going off topic a little, I saw this tweet from Marijn van Putten
today which shows examples of Arabic script from early Quranic
manuscripts with phonetic information indicated by the use of red and
green dots:

https://twitter.com/PhDniX/status/1088171783461703682

I would be interested to know how those should be represented in Unicode.

Andrew

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread wjgo_10...@btinternet.com via Unicode


Andrew West wrote as follows:

… (note that the colored characters do not change the color of the 
emoji they are attached to [before or after, depending upon whether 
you are speaking French or English dialect of emoji], they are just 
intended as a visual indication of what colour you wish the emoji 
was).


I thought that the idea was that they could possibly be used for glyph 
substitution with an appropriate font, so that there could be, for 
example, a glyph of a polar bear.


I produced a proposal for some characters specifically intended each as 
a colour modifier character.


http://www.unicode.org/L2/L2018/18198-colour-mod-chars.pdf

I know that the document was once on the agenda for a UTC meeting but 
was not mentioned in the minutes, so I do not know whether consideration 
of the best plain text way to express a request for a particular colour 
for an emoji is still taking place and my document is just one of 
several possibilities being considered.


William Overington
Thursday 24 January 2019


-- Original Message --
From: "Andrew West via Unicode" 
To: "Mark E. Shoulson" 
Cc: "Unicode Discussion" 
Sent: Thursday, 2019 Jan 24 At 11:50
Subject: Re: Encoding italic (was: A last missing link)

On Thu, 24 Jan 2019 at 02:10, Mark E. Shoulson via Unicode
 wrote:


Unicode isn't here to encode cool new ideas that would be cool and
new.  It's here for writing what people already do.


http://www.unicode.org/L2/L2018/18141r2-emoji-colors.pdf

"Add 14 colored emoji characters for decorative and/or descriptive
uses. These may be used to indicate that an emoji has a different
color."

No evidence has been provided that anybody is currently using colored
blobs for this purpose (in fact emoji users have explicitly rejected
this method for indicating emoji color:
http://www.unicode.org/L2/L2018/18208-white-wine-rgi.pdf), just an
assertion that it would be a good idea if emoji users could add a
colored swatch to an existing emoji to indicate what color they want
it to represent (note that the colored characters do not change the
color of the emoji they are attached to [before or after, depending
upon whether you are speaking French or English dialect of emoji],
they are just intended as a visual indication of what colour you wish
the emoji was).

This proposal to add 14 additional colored circles, squares and hearts
is a perfect example of a cool new idea for something that the authors
think would be really useful, but for which there is no evidence of
existing use. The UTC should have rejected it as out of scope, but we
all know that rules and procedures do not apply to the Emoji
Subcommittee, so in fact this cool new idea will be included in
Unicode 12 in March.

Andrew

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread wjgo_10...@btinternet.com via Unicode


Mark E. Shoulson wrote:


 It doesn't just take someone saying "out of scope."


It depends who it is. The theory is that people post in the mailing list 
as individuals, yet some people have very great influence.



 It also has to *be* out of scope!


Maybe, it depends who says what.

If someone chants the incantation, but I can persuasively argue that 
no, it IS in scope, then the spell fails.


Well, that may work for you, but it does not work for me. The decision is 
made by an unnamed gatekeeper, the Unicode Technical Committee does not get 
to discuss it, and discussing whether it is in scope or not is not allowed 
on the mailing list, because discussion of the topic is permanently 
banned.


Requesting the scope of Unicode be widened is not like other 
discussions being had here, so it makes sense that it should be 
treated differently, if treated at all.


Well, it does not make sense to me. If benefit could be produced by 
widening the scope of Unicode in some way, then it seems that it should 
be allowed to be discussed in the mailing list. And even if rejected at 
some time then still be allowed to be discussed at some future time as 
things may have changed.


There were discussions and agreements made as to the scope of Unicode, 
long ago.


Yes. Yet surely decisions made long ago should not lock out all progress 
as new ideas come along.


And just like you can't petition to change a character name, no matter 
how wrong it is, asking the Unicode consortium to redefine itself on 
your say-so is not going to be taken seriously either.


Well, to me it is not like that. Yes, "a character name, no matter how 
wrong it is," is part of the stability guarantee and cannot be changed. 
Adding U+FFF7 as a base character for a tag digit sequence to uniquely 
and interoperably and stably define a code for a specific meaning for a 
localizable sentence would not, as far as I am aware, break any 
stability guarantees for Unicode. That might widen the scope of Unicode 
or it might be within the present scope, yet either way if it would be 
of benefit to end users then it would be reasonable to consider the idea 
and not block its discussion: and it is not a matter of my say-so at 
all, putting forward an idea for fair consideration is not at all the 
same as dictating that something should be done on someone's say-so. Was 
the scope of Unicode widened for emoji? First of all emoji were encoded 
for compatibility, but the Unicorn Face changed all that and now it is an 
annual "could be useful" exercise of generating new characters based on 
people's ideas. For the avoidance of doubt I am not against that at all, 
it is fun and hopefully will continue.


I appreciate that the particular tag sequences to follow U+FFF7 might 
not be encoded by Unicode Inc., they might be encoded by an ISO 
committee, such as ISO/TC 37. Yet encoding U+FFF7 as the base character 
would allow a link as interoperable plain text rather than needing to 
use what amounts to a markup system.


Yet please remember that Unicode Inc. has defined and published base 
character plus tag sequences for some flags, including the Welsh 
flag and the Scottish flag. Recently I was informed that they are not 
part of The Unicode Standard nor part of ISO/IEC 10646.
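
For readers unfamiliar with the mechanism, here is a minimal Python sketch of how a base character plus tag sequence is put together. The flag sequences are the published ones: U+1F3F4 followed by tag characters spelling "gbwls" or "gbsct", closed by U+E007F CANCEL TAG. The U+FFF7 line is purely hypothetical: U+FFF7 is unassigned, and both its use as a base character and the closing CANCEL TAG are assumptions made only to illustrate the idea discussed above. The function name tag_sequence is likewise invented for this sketch.

```python
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG


def tag_sequence(base: str, payload: str) -> str:
    """Return base + payload mapped to tag characters + CANCEL TAG.

    Tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E, so each
    ASCII character is shifted up by 0xE0000.
    """
    tags = []
    for ch in payload:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
        tags.append(chr(0xE0000 + ord(ch)))
    return base + "".join(tags) + CANCEL_TAG


# Published emoji flag sequences (base is U+1F3F4 WAVING BLACK FLAG).
wales = tag_sequence("\U0001F3F4", "gbwls")
scotland = tag_sequence("\U0001F3F4", "gbsct")

# HYPOTHETICAL: U+FFF7 is not an assigned character. This only shows
# what "U+FFF7 plus a tag digit sequence" might look like for code 812.
sentence_812 = tag_sequence("\uFFF7", "812")

print(" ".join(f"U+{ord(c):04X}" for c in wales))
```

A renderer that does not know the sequence would normally show only the base character, since tag characters are default ignorable.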


It appears that a Unicode Technical Note is being prepared with 
recommendations of how to express teletext control characters using 
Unicode characters, possibly using Escape sequences.


So a Unicode Inc. publication listing numbers and meanings together with 
a context guide for each to help translation of meanings for a 
localization file of code numbers and sentences into a target language 
seems not unreasonable.


As an example, the vertical line is used as the field separator, because a 
comma might appear within the sentence itself and so cannot safely be used 
to separate fields.


812|Would you like to go to the day room?
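
As a rough sketch of how software might read such a line, assuming one entry per line and splitting only at the first vertical line (the function name parse_entry is invented here and is not part of any proposal):

```python
from typing import Tuple


def parse_entry(line: str) -> Tuple[str, str]:
    """Split 'code|sentence' at the first vertical line only."""
    code, sep, sentence = line.rstrip("\n").partition("|")
    if not sep:
        raise ValueError("missing '|' separator")
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"code is not a digit string: {code!r}")
    return code, sentence


code, sentence = parse_entry("812|Would you like to go to the day room?")
print(code)
print(sentence)
```

Because the split happens only at the first vertical line, commas in the sentence text pass through untouched.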

Not all codes would be three digits; some would be longer. Codes whose 
first three digits all differ from one another are 
three digits long. Codes where the first and third digits are the same 
have a length of 3 plus the value of the third digit. So, for example, 
codes starting 313 are six digits long and are a set of localizable 
sentences intended primarily for seeking information through the 
language barrier about relatives and friends after a disaster. The third 
digit being zero allows for even longer code numbers.
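
The length rule can be sketched in Python as follows. This is only one reading of the rule as stated in this message. The message does not say what happens when only the first two or only the last two digits match, nor how long a code is when the third digit is zero, so the sketch returns None in those cases rather than guessing. The name code_length is invented for illustration.

```python
from typing import Optional


def code_length(first_three: str) -> Optional[int]:
    """Return the total code length implied by its first three digits.

    Returns None where the rule as described leaves the length open.
    """
    if len(first_three) != 3 or not (first_three.isascii() and first_three.isdigit()):
        raise ValueError("expected exactly three ASCII digits")
    a, b, c = first_three
    if a == c:
        if c == "0":
            return None  # "even longer" codes; length not specified
        return 3 + int(c)  # e.g. 313 -> 3 + 3 = 6
    if a != b and b != c:
        return 3  # all three digits differ, e.g. 812
    return None  # e.g. 112 or 122: not covered by the description


print(code_length("812"))
print(code_length("313"))
```

A reader of a digit stream could use the first three digits in this way to know how many more digits to consume before looking up the sentence.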


Discussing how to change the scope so that whatever-it-is IS in scope 
is a very large undertaking, …


Not necessarily. If the Unicode Technical Committee were to consider a 
proposal and, after consideration and discussion were to agree to 
proceed, it could all be done within a short discussion at a Unicode 
Technical Committee meeting and then the recommendation sent to the ISO 
committee.


I am not saying that it should be or that it will be, I am just trying 
to say that it is not necessarily a very large undertaking. The

Re: Encoding italic (was: A last missing link)

2019-01-24 Thread James Kass via Unicode




Andrew West wrote,

> Why should we not expect the conventional Unicode character encoding
> model to apply to emoji?

Remember when William Overington used to post about encoding colours, 
sometimes accompanied by novel suggestions about how they could be 
encoded or referenced in plain-text?


Here's a very polite reply from John Hudson from 2000,
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html
...and, over time, many of the replies to William Overington's colorful 
suggestions were less than polite.  But it was clear that colors were 
out-of-scope for a computer plain-text encoding standard.


So I don't expect the conventional model to apply to emoji because it 
didn't; if it had, they'd not have been encoded.  Since they're in 
there, the conventional model does not apply.  Of course, the 
conventions have changed along with the concept of what's acceptable in 
plain-text.


Since emoji are an open-ended evolving phenomenon, there probably has to 
be a provision for expansion.  Any idea about them having been a finite 
set overlooked the probability of open-endedness and the impracticality 
of having only the original subset covered in plain-text while additions 
would be banished to higher level protocols.


Thank you for the information about current emoji additions being 
unrelated to vendors.  I have to confess that I haven't kept up-to-date 
on the emoji.


Maybe I should have said that emoji are fan-driven.
