Re: A last missing link for interoperable representation
On 15/01/2019 01:17, Asmus Freytag via Unicode wrote:
> On 1/14/2019 2:08 PM, Tex via Unicode wrote:
>> Asmus, I agree 100%. Asking where is the harm was an actual question intended to surface problems. It wasn’t rhetoric for saying there is no harm.
> The harm comes when this is imported into rich text environments (like this e-mail inbox). Here, the math abuse and the styled text run may look the same, but I cannot search for things based on what I see. I see an English or French word, type it in the search box, and it won't be found. I call that 'stealth' text. The answer is not necessarily in folding the two, because one of the reasons for having math alphabetics is so you can search for a variable "a" of a certain kind without getting hits on every "a" in the text. Destroying that functionality in an attempt to "solve" the problems created by the alternate facsimile of styled text is also "harm" in some way.

That may end up in a feature request for webmails and e-mail clients, where the user should be given the ability to toggle between what I’d call a “Bing search mode” and a “Google search mode.” Google Search has extended equivalence classes that enable it to handle math alphabets like plain ASCII runs, i.e. we may type a search in ASCII and Google finds instances where the text is typeset “abusing” math alphabets. Bing Search, on the other hand, has no such extended equivalence classes, and brings up styled variables only when the search string is styled correspondingly. I won’t blame Google for doing “harm”; I’d rather position myself on Google’s side, as it seems to meet the expectations of a larger part of end-user communities. I won’t blame Microsoft either; I’m just noting a dividing line between the two vendors in the handling of math alphabets. Best regards, Marcel
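The "extended equivalence classes" described above can be approximated with Unicode compatibility normalization: the mathematical alphanumeric symbols carry compatibility decompositions to ordinary letters, so NFKC folds them to ASCII. A minimal sketch in Python of such a search-folding step (this illustrates the general technique only, not either vendor’s actual implementation):

```python
import unicodedata

def fold_for_search(text: str) -> str:
    """Fold compatibility variants (including the mathematical
    alphanumeric symbols) to their ordinary equivalents with NFKC,
    then case-fold for a case-insensitive match."""
    return unicodedata.normalize("NFKC", text).casefold()

# U+1D450 U+1D44E U+1D453 U+1D452: MATHEMATICAL ITALIC SMALL C, A, F, E
styled = "\U0001D450\U0001D44E\U0001D453\U0001D452"
print(fold_for_search(styled))             # -> cafe
print("cafe" in fold_for_search(styled))   # -> True
```

The cost Asmus mentions is equally visible here: after folding, a math variable 𝑎 and an ordinary "a" become indistinguishable, which is exactly the "Bing mode" versus "Google mode" trade-off.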
Re: A last missing link for interoperable representation
On Mon, Jan 14, 2019 at 5:58 PM Mark E. Shoulson via Unicode wrote: > *If* the VS is ignored by searches, as apparently it should be and some > have reported that it is, then VS-type solutions would NOT be a problem > when it comes to searches Who is using VS-type solutions? I could not enter them except manually, using some sort of \u notation. Languages that need special input support can easily adapt to unusual rules, but English Unicode is weirdly hard to enter, because the QWERTY keyboard is ubiquitous and standard. Smart quotes, non-HYPHEN-MINUS hyphens and dashes, and accents generally require memorizing obscure entry methods or resorting to a character list. Without great support from vendors, a new Unicode italic system is only going to be used by the same people who currently use mathematical italics. > (and don't go whining about legacy software. > If Unicode had to be backward-compatible with everything we wouldn't > have gone beyond ASCII). Then where's this plain text that absolutely needs italics? Those legacy software systems are the place where unadorned plain text still lives. Anything on the Web is inherently dealing with rich text. -- Kie ekzistas vivo, ekzistas espero.
Re: A last missing link for interoperable representation
On 2019/01/15 07:58, David Starner via Unicode wrote:
> On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote:
>> ·Plain text still has tremendous utility and rich text is not always an option.
> Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface.

Yes indeed. Some similar services allow styling. One example is Slack, see e.g. https://get.slack.help/hc/en-us/articles/202288908-Format-your-messages. Markdown has been mentioned as an example of how some basic styling options (bold, italic,...) can be implemented. Another choice is using a user interface component (menu,...). The user then doesn't have to care about any 'weird' conventions, even the simplest ones, nor about what happens in the background (most probably HTML), and is already familiar with it from other applications. As for implementation complexity, it's not trivial, but there are quite a lot of components available, in particular for Web technology. It's not rocket science.

Actually, in some cases it is even difficult to get rid of styling on the Web. I recently wanted to print out a map of how to get to a restaurant for a party. The restaurant's Web site was all black background. I copied the address to Google Maps and then tried to print it. Google Maps insists that the first page is just information about the location, so I copied the name of the restaurant from the Web page. What happened was that it still had the black background. So copy-paste on your average Web browser these days doesn't lose styles, even in cases where that would be desirable (because more legible). 
So rich text technology is already way ahead when it comes to styled text. Do we want to encode background-color variant selectors in Unicode? If yes, how many? [Hint: The last two questions are rhetorical.] Regards, Martin.
Re: A last missing link for interoperable representation
On 2019/01/15 10:48, Mark E. Shoulson via Unicode wrote:
> On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote:
>> Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used.
> It is certainly true that Unicode should not be (and wasn't, before emoji)

Just to be precise, as has already been mentioned in this thread, the first batch of 'emoji' was in Unicode from the start (e.g. U+2603 SNOWMAN, there since Unicode 1.1), I think from Zapf Dingbats. The second batch came from Japanese phones. So for the first two batches of emoji, Unicode did not do any "leading" standardization. It was only after that, for later batches, that that happened.

> in the business of encoding things that "could be used", but rather, was for encoding things that *were* used. This, naturally, poses a chicken-and-egg problem which has been complained about by several people in the past (including me). Still, there are ways to show that things that haven't been encoded are still being "used", as people make shift to do what they can to use the script/notation, like using PUA or characters that aren't QUITE right, but close... And in fairness, I'd have to say that the use of mathematical italics would count in that regard. It's hard to dispute that there is a demand for it, just by looking at how people have been trying to do it!

"A demand" doesn't quantify the demand at all. My guess is that given the overall volume of Twitter or Facebook communication, the percentage of math italics (ab)use is really, really low. It's impossible to say that there's no demand, but use cases like "look, I found these characters, aren't they cute" in some corners of some social services are not the same as "we urgently need this, otherwise we can't communicate in our language". Regards, Martin.
Re: A last missing link for interoperable representation
(sorry for multiple responses...)

On 1/13/19 10:00 PM, Martin J. Dürst via Unicode wrote:
> On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
>> On 2019-01-12, Richard Wordingham via Unicode wrote:
>>> On Sat, 12 Jan 2019 10:57:26 + (GMT)
>>> And what happens when you capitalise a word for emphasis or to begin a sentence? Is it no longer the same word?
>> Indeed. As has been observed up-thread, the casing idea is a dumb one! We are, however, stuck with it because of legacy encoding transported into Unicode. We aren't stuck with encoding fonts into Unicode.
> No, the casing idea isn't actually a dumb one. As Asmus has shown, one of the best ways to understand what Unicode does with respect to text variants is that style works on spans of characters (words,...), and is rich text, but things that work on single characters are handled in plain text. Upper-case is definitely for the most part a single-character phenomenon (the recent Georgian MTAVRULI additions being the exception).

Not just an exception, but an exception that proves the rule. It's precisely because plain-text distinctions, generally speaking, should be at the letter level, as Asmus says, that there was so much shouting about MTAVRULI. That these are exceptional demonstrates the existence of the rule.

But even most adults won't know the rules for what to italicize that have been brought up in this thread. Even if they have read books that use italic and bold in ways that have been brought up in this thread, most readers won't be able to tell you what the rules are. That's left to copy editors and similar specialist jobs. I don't think there's really a case to be made that italics are or should work the same as capitals, or that they are justified for the same reasons that capitals are justified. And the use-cases show how people are using them: not necessarily for Chicago Manual of Style mandated purposes, but for emphasis of varying kinds.

> There was a time when computers (and printers in particular) were single-case. There was some discussion about having to abolish case distinctions to adapt to computers, but fortunately, that wasn't necessary.

Abolishing case I could see as a hassle, and we have become somewhat dependent on it for other things. But it was a bad idea to start with. ~mark
Re: A last missing link for interoperable representation
On 1/13/19 10:00 PM, Martin J. Dürst via Unicode wrote:
> On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
>> On 2019-01-12, Richard Wordingham via Unicode wrote:
>>> On Sat, 12 Jan 2019 10:57:26 + (GMT)
>>> And what happens when you capitalise a word for emphasis or to begin a sentence? Is it no longer the same word?
>> Indeed. As has been observed up-thread, the casing idea is a dumb one! We are, however, stuck with it because of legacy encoding transported into Unicode. We aren't stuck with encoding fonts into Unicode.
> No, the casing idea isn't actually a dumb one.

Well, for me, when I say or said that the "casing idea" is a dumb one, I don't mean how Unicode handled it. Unicode is quite correct in encoding capitals distinctly from lowercase, both for computer-historical reasons and others you mention. I think the idea of having case in alphabets _in the first place_ was a bad move. It's a "mistake" that happened centuries ago. ~mark
Re: A last missing link for interoperable representation
On 1/14/2019 5:41 PM, Mark E. Shoulson via Unicode wrote: On 1/14/19 5:08 AM, Tex via Unicode wrote: This thread has gone on for a bit and I question if there is any more light that can be shed. BTW, I admit to liking Asmus's definition for functions that span text being a definition or criterion for rich text. Me too. There are probably some exceptions or weird corner-cases, but it seems to be a really good encapsulation of the distinction which I had never seen before. ** blush ** A./
Re: A last missing link for interoperable representation
In some of this discussion, I'm not sure what is being proposed or forbidden here... I don't know that anyone is advocating removing the "don't use these for words!" warning sticker on the mathematical italics. The closest-to-sensible suggestions I've heard are things like a VS to italicize a letter, a combining italicizer so to speak (this is actually very similar to the emoji-style vs text-style VS sequences). *If* the VS is ignored by searches, as apparently it should be and some have reported that it is, then VS-type solutions would NOT be a problem when it comes to searches (and don't go whining about legacy software. If Unicode had to be backward-compatible with everything we wouldn't have gone beyond ASCII). So I'm not sure what you mean when you speak of "Unicode italics". Do you mean using the mathematical italics as we've been seeing? Or having a whole new plane of italic characters for everything that could conceivably be italicized? Those would probably both be mistakes, I agree. ~mark

On 1/14/19 5:58 PM, David Starner via Unicode wrote: On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote: The arguments against italics seem to be:
· Unicode is plain text. Italics is rich text.
· We haven't had it until now, so we don't need it.
· There are many rich text solutions, such as html.
· There are ways to indicate or simulate italics in plain text including using underscore or other characters, using characters that look italic (eg math), etc.
· Adding italicization might break existing software.
· The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al) have justifications.
There generally shouldn't be multiple ways of doing things. For example, if you think that searching for certain text in italics is important, then having both HTML italics and Unicode italics is going to cause searches to fail or succeed unexpectedly, unless the underlying software unifies the two systems (an extra complexity). 
Searching for certain italicized text could be done today in rich text applications, were there actual demand for it. ·Plain text still has tremendous utility and rich text is not always an option. Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface. Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. Lastly, the places where I still find massive use of plain text are the places this would hurt the most. GNU Grep's manpage shows no sign that it supports searching under any form of Unicode normalization. Same with GNU Less. Adding italics would just make searching plain text documents more complex for their users. The domain name system would just add them to the ban list, and they'd be used for spoofing in filenames and other less controlled but still sensitive environments.
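On the point that variation selectors are (or should be) ignored by searches: variation selectors are Default_Ignorable code points, so a search routine can simply strip them before comparing. A minimal sketch in Python; the VS-based italic sequence shown is purely hypothetical, since Unicode defines no such convention:

```python
# Variation selectors occupy U+FE00..U+FE0F and U+E0100..U+E01EF.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))

def strip_variation_selectors(text: str) -> str:
    """Drop all variation selectors, so that a hypothetical VS-based
    styling convention stays invisible to plain-text search.
    Other combining marks (accents) are deliberately kept."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in VS_RANGES)
    )

# "it" with a (hypothetical) italicizing VS after each letter
marked = "i\uFE01t\uFE01"
print(strip_variation_selectors(marked))   # -> it
```

This is the property that would make a VS scheme search-transparent; whether existing software actually does this stripping is, as the thread notes, another matter.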
Re: A last missing link for interoperable representation
On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote: On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? The "it does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a markup language, and could articulate that need in terms of a proposal, then we would have something to discuss. (We might engage them in a discussion of the advisability of supporting "markdown", for example). Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used. It is certainly true that Unicode should not be (and wasn't, before emoji) in the business of encoding things that "could be used", but rather, was for encoding things that *were* used. This, naturally, poses a chicken-and-egg problem which has been complained about by several people in the past (including me). Still, there are ways to show that things that haven't been encoded are still being "used", as people make shift to do what they can to use the script/notation, like using PUA or characters that aren't QUITE right, but close... And in fairness, I'd have to say that the use of mathematical italics would count in that regard. It's hard to dispute that there is a demand for it, just by looking at how people have been trying to do it! So I'm starting to think this is not really "leading" standardization, but rather following up and, well, standardizing it, replacing ad-hoc attempts with a standard way to do things, just as Unicode is supposed to do. ~mark As for the abuse of math alphabetics. 
That's happening whether we like it or not, but at this point represents playful experimentation by the exuberant fringe of Unicode users and certainly doesn't need any additional extensions.
Re: A last missing link for interoperable representation
On 1/14/19 5:08 AM, Tex via Unicode wrote: This thread has gone on for a bit and I question if there is any more light that can be shed. BTW, I admit to liking Asmus's definition for functions that span text being a definition or criterion for rich text. Me too. There are probably some exceptions or weird corner-cases, but it seems to be a really good encapsulation of the distinction which I had never seen before. ~mark
Re: A last missing link for interoperable representation
On 1/14/19 4:45 AM, Martin J. Dürst via Unicode wrote: Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. I think what people here are doing is not blaming the problem on Unicode, but rather blaming the _solution_ on Unicode, for better or worse. ~mark
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 16:02:05 -0800 Asmus Freytag via Unicode wrote: > On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote: > On Tue, 15 Jan 2019 00:02:49 +0100 > Hans Åberg via Unicode wrote: > > On 14 Jan 2019, at 23:43, James Kass via Unicode > wrote: > > Hans Åberg wrote, > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > Thought about using a combining accent. Figured it would just > display with a dotted circle but neglected to try it out first. It > actually renders perfectly here. /That's/ good to know. (smile) > > It is a bit off here. One can try math, too: the derivative of 훾(푡) > is 훾̇(푡). > > No it isn't. You should be using a spacing character for > differentiation. > > Sorry, but there may be different conventions. The dot / double-dot > above is definitely common usage in physics. > > A./ Apologies. It was positioned in the parenthesis, and it looked like a misplaced U+0301. Richard.
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 06:24:46 + James Kass via Unicode wrote: > Unicode doesn't enforce any spelling or punctuation rules. Unicode > doesn't tell human beings how to pronounce strings of text or how to > interpret them. These are not statements that are both honest and true. Unicode lays down rules and recommendations which others may then enforce. In Indic scripts where LETTER A is not also a consonant, Unicode forbids writing where LETTER AA would do the same job, and most renderers enforce that rule. Similarly, in phonetically ordered LTR scripts, one can't write a dependent vowel as the first character even if it is the leftmost character. There is a subtler rule about not spelling negative numbers with a hyphen-minus - if one does, one may suddenly find a line break just after what is being used as a negative sign. In scripts where Sanskrit grv and gvr may be rendered identically, Unicode tells us what the two code sequences are, and therefore indirectly what the range of pronunciations is for a given spelling. Now, sometimes the enforcers overstep the mark. For example, the USE tells us that when we write Northern Thai /pʰiaʔ/ 'sound of a smack' which visually is , with denoting /ia/, we should write it ᨻ᩠ᨿᩕᩮᩡ . So much for phonetic order! Enforcement can be more subtle. TUS says that Farsi should use U+06CC ARABIC LETTER FARSI YEH instead of U+064A ARABIC LETTER YEH although they are identical in initial and medial positions. In this case, the enforcer will be the spell-checker. Richard.
Re: A last missing link for interoperable representation
On 1/14/2019 2:08 PM, Tex via Unicode wrote: Asmus, I agree 100%. Asking where is the harm was an actual question intended to surface problems. It wasn’t rhetoric for saying there is no harm.

The harm comes when this is imported into rich text environments (like this e-mail inbox). Here, the math abuse and the styled text run may look the same, but I cannot search for things based on what I see. I see an English or French word, type it in the search box and it won't be found. I call that 'stealth' text. The answer is not necessarily in folding the two, because one of the reasons for having math alphabetics is so you can search for a variable "a" of a certain kind without getting hits on every "a" in the text. Destroying that functionality in an attempt to "solve" the problems created by the alternate facsimile of styled text is also "harm" in some way.

Also, it may not be obvious to social media, messaging platforms, that there is a possibility of a solution. Often when a problem exists for a long time, it fades into unconsciousness. The pain is accepted as that is the way it is and has to be.

A push for (more) universal support of lowest common denominator "markdown" would go a long way to support such features in environments where SGML-style markup is infeasible and out-of-band communication not possible.

It becomes part of the culture. Asking if there is a pain and whether a solution would be welcomed is consciousness raising. I agree about leading standardization. I thought some legitimate needs were raised. The questions were designed to quantify the use case as well as the potential damage.

Also, treating everything as a character encoding problem is so broken.

I didn’t think anyone was recommending more math abuse. I thought it was raised as an example of people resorting to them as a solution for a need. Of course they are also an example of playful experimentation. 
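The "lowest common denominator markdown" mentioned above can be remarkably small. As a sketch of how little machinery basic emphasis needs, here is a two-rule converter in Python (the rule subset and the HTML output are my own assumptions for illustration, not a proposal from the thread):

```python
import re

def mini_markdown(text: str) -> str:
    """Convert a two-rule markdown subset to HTML:
    *italic* and **bold**. Bold is handled first so that
    ** is not consumed as two italic markers."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"\*(.+?)\*", r"<i>\1</i>", text)
    return text

print(mini_markdown("a *subtle* point, **strongly** made"))
# -> a <i>subtle</i> point, <b>strongly</b> made
```

A real implementation would also need escaping and edge-case rules (as the CommonMark specification shows at length), but the out-of-band principle is the same: the styling lives in the protocol, not in the character encoding.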
Separately, regarding messaging platforms: although Twitter is one example in the social media space, today there are many business, commercial, and other applications that embed messaging capabilities for their communities and for servicing customers. I wouldn’t dismiss the need just based on Twitter’s assessment or on the idea that social media is just for casual or “fun” use. Clarity of communications can be significant for many organizations. Having the proposed capabilities in plain text rather than requiring all of the overhead of a full rich-text solution could be a big win for these apps.

I see the math abuse as something that is being done as an exercise of playfulness. There are other uses of characters based on what they look like, rather than what they mean (or are intended for), and much applies to those cases as well. However, that's independent of making a value judgement on social media as such just because some people use the features more creatively. That's a judgement that I have neither made nor would I be comfortable with it. A./

tex From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag via Unicode Sent: Monday, January 14, 2019 1:21 PM To: unicode@unicode.org Subject: Re: A last missing link for interoperable representation

On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? The "it does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a
Re: A last missing link for interoperable representation
On 1/14/2019 2:43 PM, James Kass via Unicode wrote: Hans Åberg wrote, > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile) While all of this displays fine, it currently can't be found in the same search that would locate true italics. As I am seeing this in an environment that otherwise supports rich text, the result is "stealth" text. Stuff that I can read, but not process, without being able to see a difference. A./
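The "stealth" failure described above is measurable: the word in question is U+1D45D U+1D44E U+1D460 U+1D460 U+1D452 followed by U+0301, which a literal search for "passé" will never match. NFKC normalization recovers it, because the math italics compatibility-decompose to ASCII letters and the trailing combining acute then composes with the "e". A sketch in Python:

```python
import unicodedata

# MATHEMATICAL ITALIC SMALL P, A, S, S, E + COMBINING ACUTE ACCENT
stealth = "\U0001D45D\U0001D44E\U0001D460\U0001D460\U0001D452\u0301"

print("pass\u00E9" in stealth)   # -> False: the literal search fails

folded = unicodedata.normalize("NFKC", stealth)
# NFKC maps the math italics to p, a, s, s, e; canonical composition
# then combines e + U+0301 into U+00E9 (precomposed é).
print(folded == "pass\u00E9")    # -> True: the folded text is findable
```

This is the same normalization step a search engine with "extended equivalence classes" effectively performs; a search box that matches code points literally, as most mail clients do, sees only unrelated symbols.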
Re: A last missing link for interoperable representation
On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote: On Tue, 15 Jan 2019 00:02:49 +0100 Hans Åberg via Unicode wrote: On 14 Jan 2019, at 23:43, James Kass via Unicode wrote: Hans Åberg wrote, How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile) It is a bit off here. One can try math, too: the derivative of 훾(푡) is 훾̇(푡). No it isn't. You should be using a spacing character for differentiation. Sorry, but there may be different conventions. The dot / double-dot above is definitely common usage in physics. A./ On the other hand, one uses a combining circumflex for Fourier transforms. Richard.
Re: A last missing link for interoperable representation
On Tue, 15 Jan 2019 00:02:49 +0100 Hans Åberg via Unicode wrote: > > On 14 Jan 2019, at 23:43, James Kass via Unicode > > wrote: > > > > Hans Åberg wrote, > > > > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > > > Thought about using a combining accent. Figured it would just > > display with a dotted circle but neglected to try it out first. It > > actually renders perfectly here. /That's/ good to know. (smile) > > It is a bit off here. One can try math, too: the derivative of 훾(푡) > is 훾̇(푡). No it isn't. You should be using a spacing character for differentiation. On the other hand, one uses a combining circumflex for Fourier transforms. Richard.
Re: A last missing link for interoperable representation
> On 14 Jan 2019, at 23:43, James Kass via Unicode wrote: > > Hans Åberg wrote, > > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > Thought about using a combining accent. Figured it would just display with a > dotted circle but neglected to try it out first. It actually renders > perfectly here. /That's/ good to know. (smile) It is a bit off here. One can try math, too: the derivative of 훾(푡) is 훾̇(푡).
Re: A last missing link for interoperable representation
On 1/14/2019 2:58 PM, David Starner via Unicode wrote: Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. Source Insight is a nice and powerful programming editor that supports rich-text display of source code, i.e. beyond simple syntax coloring / linkification. For example, large type for function names. They even support some styling in comments, but more along the lines of allowing their own markdown convention that lets you write headings of different levels. Both to write comments that introduce sections of your code, as well as headings and subheadings inside longer comment blocks. So stuff like that exists, but it's using semantic markup (style settings per language element) or markdown (styles in comments). A./
Re: A last missing link for interoperable representation
On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote:
> The arguments against italics seem to be:
> · Unicode is plain text. Italics is rich text.
> · We haven't had it until now, so we don't need it.
> · There are many rich text solutions, such as html.
> · There are ways to indicate or simulate italics in plain text including using underscore or other characters, using characters that look italic (eg math), etc.
> · Adding italicization might break existing software.
> · The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al) have justifications.

There generally shouldn't be multiple ways of doing things. For example, if you think that searching for certain text in italics is important, then having both HTML italics and Unicode italics is going to cause searches to fail or succeed unexpectedly, unless the underlying software unifies the two systems (an extra complexity). Searching for certain italicized text could be done today in rich text applications, were there actual demand for it.

> · Plain text still has tremendous utility and rich text is not always an option.

Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface. Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. 
Lastly, the places where I still find massive use of plain text are the places this would hurt the most. GNU Grep's manpage shows no sign that it supports searching under any form of Unicode normalization. Same with GNU Less. Adding italics would just make searching plain text documents more complex for their users. The domain name system would just add them to the ban list, and they'd be used for spoofing in filenames and other less controlled but still sensitive environments. -- Kie ekzistas vivo, ekzistas espero.
Re: A last missing link for interoperable representation
Hans Åberg wrote, > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile)
Re: A last missing link for interoperable representation
On 14/01/2019 08:26, Julian Bradfield via Unicode wrote: On 2019-01-13, Marcel Schneider via Unicode wrote: […] These statements make me fear that the font you are using might unsupport the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between It displays as a space. As one would expect - I use fixed width fonts for plain text. It’s mainly that I suspected you could be using Courier New in the terminal. It’s default for plain text in main browsers, and there are devices whose copy of Courier New shows a .notdef box for U+202F. That’s at least what I ɥnderstood from the feedback, and a test in my browser looked likewise. these pointy brackets, please let us know. Because then, You’re unable to read interoperably usable French text, too, as you’ll see double punctuation (eg "?!") where a single mark is intended, like here ! I see "like here !". That’s fine, your font has support for . Thanks for reporting. The reason why I’m anxious to see that checked is that the impact on implementations of as the group separator is being assessed. French text does not need narrow spacing any more than science does. When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$; in plain text, 50cm does just fine. By “plain text” you probably mean *draft style*. I’m thinking that because "$50\thinspace\mathrm{cm}$" is not less plain text than "50cm". Indeed, in not understanding that sooner I was an idiot, naively believing that all Unicode List Members are using Unicode terminology. Turns out that that cannot be taken for granted any more than knowing the preferences of French people as of French text display, while not being a Frenchman: 1. Most French people prefer that big punctunation be spaced off from the word it pertains to. 2. 
Most French people strongly dislike punctuation cut off by a line break, but cannot fix it because: a) the ordinary keyboard layout has no non-breaking spaces; b) the non-breaking space readily available on special keyboard layouts is buggy in most e-mail composers, ending up breakable. 3. A significant number of French people strongly dislike angle quotes that are spaced off too far, as happens when using the ordinary no-break space. Likewise, normal French people writing email write "Quel idiot!", or sometimes "Quel idiot !". Normal people using normal keyboard layouts are writing with the readily available characters most of the time. This is why (to pick another example) French people abbreviate “numéro” as "n°", while on a British English or an American English keyboard layout we can’t normally expect anything else than "no", or "#" for “Number.” We’re not trying to keep people from writing fast and in draft style. What every locale is expected to achieve in the Unicode era is to enable normal users to get an accurate, interoperable representation of their language while typing fast, as opposed to coding in TeX, which is like using InDesign with system spaces instead of Unicode. System spaces are not interoperable, nor is LaTeX \thinspace if it is non-breakable in LaTeX, which it obviously is, since it is used to represent the thin space between a number and a measurement unit. In Unicode, as we know it, U+2009 THIN SPACE is breakable, and the worst thing here is that its duplicate encoding U+2008 PUNCTUATION SPACE is breakable too, instead of being non-breakable like U+2007 FIGURE SPACE. That is why there was a need to add U+202F NARROW NO-BREAK SPACE later. (More details in the cited CLDR ticket.) If you google that phrase on a few French websites, you'll see that some (such as Larousse, whom one might expect to care about such things) use no space before punctuation, Thanks for catching that; the flaw shall be reported with a link to your email. 
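The group-separator point can be made concrete with a small Python sketch. The helper `format_fr` is illustrative only (real code would go through CLDR/ICU data rather than hand-rolling the grouping):

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the CLDR group separator for French

def format_fr(n: int) -> str:
    # Group digits in threes and join the groups with NNBSP.
    # Because NNBSP is non-breaking, a line break can never fall
    # inside the number, unlike with U+2009 THIN SPACE.
    return f"{n:,}".replace(",", NNBSP)

print(format_fr(1234567))  # 1 234 567 (narrow no-break spaces between groups)
```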
You may also wish to look up this page: https://communaute.lerobert.com/forum/LE-ROBERT-CORRECTEUR/LE-ROBERT-CORRECTEUR-CORRECTION-D-ORTHOGRAPHE-DICTIONNAIRES-ET-GUIDES/Espace-entre-le-meotet-le-point-d-interrogation/2918628/398261 reading: “Le logiciel Le Robert correcteur justement signale les espaces fines insécables si elles ne sont pas présentes sur le texte et propose la correction.” (“Le Robert spellchecker does report the lack of narrow no-break spaces and proposes to fix it.”) while others (such as some random T-shirt company) use an ASCII space. The Académie Française, which by definition knows more about French orthography than you do, uses full ASCII spaces before ? and ! on its front page. Also after opening guillemets, which looks even more stupid from an Anglophone perspective. (See point 3 above.) That is a very good point. Indeed this website is reasonably expected to be an example and a template of correctly typesetting a French website. There are several reasons why actually it is not. The main reason is that it is not the work of the A.F. itself, but of webdesigners, webmasters and content managers, who are normal people like for any other website. They just haven’t got an
Re: A last missing link for interoperable representation
On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? "It does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient, condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a markup language, and could articulate that need in terms of a proposal, then we would have something to discuss. (We might engage them in a discussion of the advisability of supporting "markdown", for example). Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used. As for the abuse of math alphabetics: that's happening whether we like it or not, but at this point it represents playful experimentation by the exuberant fringe of Unicode users and certainly doesn't need any additional extensions.
Re: A last missing link for interoperable representation
On 14/01/2019 04:00, Martin J. Dürst via Unicode wrote: […] […] As Asmus has shown, one of the best ways to understand what Unicode does with respect to text variants is that style works on spans of characters (words,...), and is rich text, but things that work on single characters are handled in plain text. Upper-case is definitely for the most part a single-character phenomenon (the recent Georgian MTAVRULI additions being the exception). Obviously the single-character rule also applies to superscript when used as an ordinal indicator or, more generally, as an abbreviation indicator. Thanks for the hint; it’s all about interoperability, and in this case too the point of using preformatted characters is a good one, IIUC. Sorry for getting a little off-topic. There’s also one reply on my to-do list where I’ll do even more so; can’t help it given it’s our digital representation that’s at stake, and due to past neglect on either side there’s still a need to painfully lobby for each character while so many other important issues are out there… Best Regards, Marcel
Re: A last missing link for interoperable representation
> On 13 Jan 2019, at 22:43, Khaled Hosny via Unicode > wrote: > > LaTeX with the > “unicode-math” package will translate ASCII + font switches to the > respective Unicode math alphanumeric characters. Word will do the same. > Even browsers rendering MathML will do the same (though most likely the > MathML source will have the math alphanumeric characters already). For full translation, one probably has to use ConTeXt and LuaTeX. Then, along with PDF, one can also generate HTML with MathML.
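As a sketch of the translation Khaled describes (this assumes LuaLaTeX or XeLaTeX with the unicode-math package and its default mappings):

```latex
% Compile with LuaLaTeX or XeLaTeX.
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
% The ASCII input x is shaped as U+1D465 MATHEMATICAL ITALIC SMALL X,
% and \symbf{y} as U+1D432 MATHEMATICAL BOLD SMALL Y, so the Unicode
% math alphanumerics end up in the PDF's extractable text stream.
$x + \symbf{y}$
\end{document}
```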
Re: A last missing link for interoperable representation
> On 14 Jan 2019, at 06:08, James Kass via Unicode wrote: > > 퐴푟푡 푛표푢푣푒푎푢 seems a bit 푝푎푠푠é nowadays, as well. > > (Had to use mark-up for that “span” of a single letter in order to indicate > the proper letter form. But the plain-text display looks crazy with that > HTML jive in it.) How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́
Re: A last missing link for interoperable representation
Hello Martin, others... > Blaming the problem on Unicode doesn't seem to be appropriate. I don't consider that there's any problem with plain text users exchanging plain text. I give Unicode /credit/ for being the foundation of that ability. Anyone imagining that I'm casting blame is under a misconception. There's plain text data out there stringing math alphanumerics into recognizable words. It's being stored and shared and indexed. I have no problem with that; I'm in favor of it. (Everyone, please let's focus on Tex Texin's latest post. Wish I'd sent this post before his...) Best regards, James Kass
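The indexing point can be made concrete: NFKC compatibility normalization is one (lossy) way a search layer could fold such strings back to ASCII, which is essentially the dividing line between the two search behaviors discussed earlier in the thread. A small Python sketch:

```python
import unicodedata

# "PART" spelled with MATHEMATICAL ITALIC CAPITAL letters
# (U+1D443, U+1D434, U+1D445, U+1D447).
styled = "\U0001D443\U0001D434\U0001D445\U0001D447"

# NFKC applies the <font> compatibility decompositions, folding the math
# alphanumerics to plain ASCII — so a normalizing search can match them,
# while a strict codepoint-level search cannot.
folded = unicodedata.normalize("NFKC", styled)
print(folded)            # PART
print("PART" in styled)  # False — no codepoint-level match
```

The flip side, as noted elsewhere in the thread, is that folding destroys the ability to search for a math variable of one particular style.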
RE: A last missing link for interoperable representation
This thread has gone on for a bit and I question whether there is any more light that can be shed. BTW, I admit to liking Asmus’ definition of functions that span text as a criterion for rich text. I also liked James’ examples of the Twitter use case.
The arguments against italics seem to be:
· Unicode is plain text. Italics is rich text.
· We haven't had it until now, so we don't need it.
· There are many rich text solutions, such as HTML.
· There are ways to indicate or simulate italics in plain text, including using underscores or other characters, using characters that look italic (e.g. math), etc.
· Adding italicization might break existing software.
· The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al.) have justifications.
The arguments for it are:
· Plain text still has tremendous utility, and rich text is not always an option.
· Simulations of italics are non-standard and therefore hurt interoperability. This includes math characters not being supported universally; underscores and other indicators are not a standard, nor are alternative fonts.
· There are legitimate needs for a standardized approach for interchange, accessibility (e.g. screen readers), search, Twitter, et al.
· Evidence of the demand is perhaps demonstrated by the number of simulations, and by the requests to vendors of plain text apps (such as Twitter) for how to implement it.
· Supporting italics can be implemented without breaking existing documents and should be easily supported in modern Unicode apps.
· The impact on the standard of adding a character for italics (and another for bold, and perhaps a couple of others) is minuscule, as it fits into the VS model.
· The argument that italics is rich text is an ideological one. However, as with other examples, there are cases where practicality should win out.
· This isn’t a slippery slope.
Personally, I think the cost seems very low, both to the standard and to implementers. 
I don’t see a lot of risk that it will break apps. (At least not those that wouldn’t be broken by VS or other features in the standard.) It will help many apps. I think the benefits to interoperability, accessibility, search, standardization of text are significant. Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? tex
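The "fits into the VS model" and search arguments can be sketched in Python. This is purely hypothetical: Unicode assigns no italic semantics to any variation selector today, and VS2 below is an arbitrary stand-in:

```python
VS = "\ufe01"  # VARIATION SELECTOR-2 — an arbitrary stand-in, NOT a real
               # italic selector; no such assignment exists in Unicode.

def tag_italic(text: str) -> str:
    # Hypothetically mark every letter as italic by suffixing the VS.
    return "".join(ch + VS if ch.isalpha() else ch for ch in text)

def fold_for_search(text: str) -> str:
    # A search layer that ignores variation selectors (as default-ignorable
    # characters are meant to be ignored) sees the plain text again.
    return text.replace(VS, "")

styled = tag_italic("very important")
print(fold_for_search(styled) == "very important")  # True
```

Unlike the math-alphanumerics workaround, the underlying letters here remain ordinary ASCII, so naive searching and screen reading keep working.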
Re: A last missing link for interoperable representation
Hello James, others, On 2019/01/14 15:24, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > > I'd say it should be conservative. As the meaning of that word > > (similar to others such as progressive and regressive) may be > > interpreted in various way, here's what I mean by that. > > > > It should not take up and extend every little fad at the blink of an > > eye. It should wait to see what the real needs are, and what may be > > just a temporary fad. As the Mathematical style variants show, once > > characters are encoded, it's difficult to get people off using them, > > even in ways not intended. > > A conservative approach to progress is a sensible position for computer > character encoders. Taking a conservative approach doesn't necessarily > mean being anti-progress. > > Trying to "get people off" using already encoded characters, whether or > not the encoded characters are used as intended, might give an > impression of being anti-progress. Using the expression "get people off" was indeed somewhat ambiguous. Of course we cannot forbid people to use Mathematical alphanumerics. There's no standards police, neither for Unicode nor most other standards. > Unicode doesn't enforce any spelling or punctuation rules. Unicode > doesn't tell human beings how to pronounce strings of text or how to > interpret them. Unicode doesn't push any rules about splitting > infinitives or conjugating verbs. > > Unicode should not tell people how any written symbol must be > interpreted. Unicode should not tell people how or where to deploy > their own written symbols. Yes. But Unicode can very well say: These characters are for Math, and if you use them for anything else, that's your problem, and because they are used for Math, they support what's used in Math, and we won't add copies of accented characters or variant characters for style or [your proposal goes here] because that's not what Unicode is about. 
If you want real styling, then use applications that can do that, or try to convince your application provider to provide that. (Well, Unicode is more or less saying just exactly that currently.) And that's what I meant by "getting people off". If that then leads to fewer people (mis)using these characters, all the better. > Perhaps fraktur is frivolous in English text. Perhaps its use would > result in a new convention for written English which would enhance the > literary experience. Italics conventions which have only been around a > hundred years or so may well turn out to be just a passing fad, so we > should probably give it a bit more time. There's no need to give italic conventions more time. Of course they may die out, but they are very active now. And they are very actively supported in rich text, where they belong. > Telling people they mustn't use Latin italics letter forms in computer > text while we wait to see if the practice catches on seems flawed in > concept. The practice is already there. Lots of people use italics in rich text. That's just fine because that's the right thing to do. We don't need to muddy the waters. Regards, Martin.
Re: A last missing link for interoperable representation
Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. Regards, Martin. On 2019/01/14 18:06, James Kass via Unicode wrote: > > Not a twitter user, don't know how popular the practice is, but here's a > couple of links concerned with how to use bold or italics in Twitter > plain text messages. > > https://www.simplehelp.net/2018/03/13/how-to-use-bold-and-italicized-text-on-twitter/ > > > https://mothereff.in/twitalics > > Both pages include a form of caveat. But the caveat isn't about the > intended use of the math alphanumerics. > > The first page includes the following text as part of a "tweet": > Just because you 헰헮헻 doesn’t mean you 혴혩혰혶혭혥 :) > > And, as before, I have no idea how /popular/ the practice is. But > here's some more links: > > (web page from 2013) > How To Write In Italics, Tweet Backwards And Use Lots Of Different ... > https://www.adweek.com/digital/twitter-font-italics-backwards/ > > (This is copy/pasted *as-is* from the web page to plain-text) > Bold and Italic Unicode Text Tool - 퐁퐨퐥퐝 풂풏풅 푖푡푎푙푖푐푠 - > YayText > https://yaytext.com/bold-italic/ > Super cool unicode text magic. Write 퐛퐨퐥퐝 and/or 푖푡푎푙푖푐 > updates on Facebook, Twitter, and elsewhere. Bold (serif) preview copy > tweet. > > Michael Maurino [emoji redacted-JK] on Twitter: "Can I make italics on > twitter? 'cause ... > https://twitter.com/iron_stylus/status/281991180064022528?lang=en > > Charlie Brooker on Twitter: "How do you do italics on this thing again?" > https://twitter.com/charltonbrooker/status/484623185862983680?lang=en > > How to make your Facebook and Twitter text bold or italic, and other ... > https://boingboing.net/2016/04/10/yaytext-unicode-text-styling.html > Apr 10, 2016 - For years I've been using the Panix Unicode Text > Converter to create ironic, weird or simply annoying text effects for > use on Twitter, Facebook ... 
> > How to change your Twitter font | Digital Trends > https://www.digitaltrends.com/.../now-you-can-use-bold-italics-and-other-fancy-fonts-... > > > Aug 14, 2013 - now you can use bold italics and other fancy fonts on > twitter isaac ... or phrase into your Twitter text box, and there you > have it: fancy tweets. > > Twitter Fonts Generator (퓬퓸퓹픂 퓪퓷퓭 퓹퓪퓼퓽퓮) ― LingoJam > https://lingojam.com/TwitterFonts > You might have noticed that some users on Twitter are able to change the > font ... them to seemingly make their tweet font bold, italic, or just > completely different. >
Re: A last missing link for interoperable representation
Not a twitter user, don't know how popular the practice is, but here's a couple of links concerned with how to use bold or italics in Twitter plain text messages. https://www.simplehelp.net/2018/03/13/how-to-use-bold-and-italicized-text-on-twitter/ https://mothereff.in/twitalics Both pages include a form of caveat. But the caveat isn't about the intended use of the math alphanumerics. The first page includes the following text as part of a "tweet": Just because you 헰헮헻 doesn’t mean you 혴혩혰혶혭혥 :) And, as before, I have no idea how /popular/ the practice is. But here's some more links: (web page from 2013) How To Write In Italics, Tweet Backwards And Use Lots Of Different ... https://www.adweek.com/digital/twitter-font-italics-backwards/ (This is copy/pasted *as-is* from the web page to plain-text) Bold and Italic Unicode Text Tool - 퐁퐨퐥퐝 풂풏풅 푖푡푎푙푖푐푠 - YayText https://yaytext.com/bold-italic/ Super cool unicode text magic. Write 퐛퐨퐥퐝 and/or 푖푡푎푙푖푐 updates on Facebook, Twitter, and elsewhere. Bold (serif) preview copy tweet. Michael Maurino [emoji redacted-JK] on Twitter: "Can I make italics on twitter? 'cause ... https://twitter.com/iron_stylus/status/281991180064022528?lang=en Charlie Brooker on Twitter: "How do you do italics on this thing again?" https://twitter.com/charltonbrooker/status/484623185862983680?lang=en How to make your Facebook and Twitter text bold or italic, and other ... https://boingboing.net/2016/04/10/yaytext-unicode-text-styling.html Apr 10, 2016 - For years I've been using the Panix Unicode Text Converter to create ironic, weird or simply annoying text effects for use on Twitter, Facebook ... How to change your Twitter font | Digital Trends https://www.digitaltrends.com/.../now-you-can-use-bold-italics-and-other-fancy-fonts-... Aug 14, 2013 - now you can use bold italics and other fancy fonts on twitter isaac ... or phrase into your Twitter text box, and there you have it: fancy tweets. 
Twitter Fonts Generator (퓬퓸퓹픂 퓪퓷퓭 퓹퓪퓼퓽퓮) ― LingoJam https://lingojam.com/TwitterFonts You might have noticed that some users on Twitter are able to change the font ... them to seemingly make their tweet font bold, italic, or just completely different.
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 07:47:45 + (GMT) Julian Bradfield via Unicode wrote: > On 2019-01-13, James Kass via Unicode wrote: > > यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं। > > > What happens with Devanagari text? Should the user community > > refrain from interchanging data because 1980s era software isn't > > Unicode aware? > > Devanagari is an established writing system (which also doesn't need > separate letters for different typefaces). Those who wish to exchange > information in devanagari will use either an ISCII or Unicode system > with suitable font support. Has ISCII kept abreast of additions to the encoded Devanagari script? Hindi may be an established writing system, but Vedic Sanskrit in full detail is another matter. Even with full Unicode support, having a 'suitable font' is an issue with 'plain text', even deprecated plain text. The problems are that writers of Hindi don't want to have to manually suppress ligature formation, and it doesn't help that tables of Hindi conjuncts don't express the difference between real and fake viramas. (The difference surfaces with preposed vowels.) > Just as those who wish to exchange English text with typographic > detail will use a suitable typographic mark-up system with font > support, which will typically not interfere with plain text searching. > Even in a PDF document, "art nouveau" will appear as "art nouveau" > whatever font it's in. But "art nouveau" is ASCII. Copying truly complex Indic from a PDF is still something of an adventure. > Incidentally, a large chunk of my facebook feed is Indian politics, > and of that portion of it that is in Hindi or other Indian > languages, most is still written in ASCII transcription, even though > every web browser and social media application in common use surely > has full Unicode support these days. I don't believe the USE has been added to IE 11, and certainly not on Windows 7. 
And I fear that of OpenType fonts, only mine widely support Tai Tham as documented on the Unicode site. (And 'widely' excludes IE 11, but not MS Edge.) A fair few Tai Tham fonts rely on being permitted to bypass the script-specific support, which the Windows stack only permits to privileged scripts. Richard.
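Richard's earlier point about manually suppressing ligature formation can be illustrated at the codepoint level (actual shaping, of course, depends on the font and shaping engine):

```python
# क + virama + ष normally shapes as the conjunct ligature क्ष; inserting
# U+200C ZERO WIDTH NON-JOINER after the virama asks the shaper to keep
# an overt/half form instead — the "fake virama" distinction the tables
# of conjuncts fail to express.
KA, VIRAMA, SSA, ZWNJ = "\u0915", "\u094d", "\u0937", "\u200c"

conjunct = KA + VIRAMA + SSA            # क्ष — usually the kṣa ligature
no_ligature = KA + VIRAMA + ZWNJ + SSA  # क्‌ष — ligature suppressed

print(len(conjunct), len(no_ligature))  # 3 4
```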
Re: A last missing link for interoperable representation
On 2019-01-14, James Kass via Unicode wrote: > Julian Bradfield wrote, > > I have never seen a Unicode math alphabet character in email > > outside this list. > > It's being done though. Check this message from 2013 which includes the > following, copy/pasted from the web page into Notepad: > > 혗혈혙혛 혖혍 헔햳햮헭.향햱햠햬햤햶햮햱햪 © ퟮퟬퟭퟯ 햠햫햤햷 햦햱햠햸 > 헀헂헍헁헎햻.햼허헆/헺헿헮헹헲혅헴헿헮혆 > > https://apple.stackexchange.com/questions/104159/what-are-these-characters-and-how-can-i-use-them Which makes the point very nicely. They're not being *used* to do maths, they're being played with for purely decorative purposes, and moreover in a way which breaks the actual intended use as a URL. If you introduce random stuff into Unicode, people will play with it (or use it for phishing). The whole thread is, as it says, "what is this weird stuff"? -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.