Re: A last missing link for interoperable representation

2019-01-13 Thread James Kass via Unicode



Julian Bradfield wrote,

> I have never seen a Unicode math alphabet character in email
> outside this list.

It's being done though.  Check this message from 2013 which includes the 
following, copy/pasted from the web page into Notepad:


𝘗𝘈𝘙𝘛 𝘖𝘍 𝗔𝖳𝖮𝗭.𝖥𝖱𝖠𝖬𝖤𝖶𝖮𝖱𝖪  © 𝟮𝟬𝟭𝟯 𝖠𝖫𝖤𝖷 𝖦𝖱𝖠𝖸  
𝗀𝗂𝗍𝗁𝗎𝖻.𝖼𝗈𝗆/𝗺𝗿𝗮𝗹𝗲𝘅𝗴𝗿𝗮𝘆


https://apple.stackexchange.com/questions/104159/what-are-these-characters-and-how-can-i-use-them



Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-13, James Kass via Unicode wrote:
> यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं।

> What happens with Devanagari text?  Should the user community refrain 
> from interchanging data because 1980s era software isn't Unicode aware?

Devanagari is an established writing system (which also doesn't need
separate letters for different typefaces). Those who wish to exchange
information in Devanagari will use either an ISCII or Unicode system
with suitable font support.
Just as those who wish to exchange English text with typographic
detail will use a suitable typographic mark-up system with font
support, which will typically not interfere with plain text searching.
Even in a PDF document, "art nouveau" will appear as "art nouveau"
whatever font it's in.

Incidentally, a large chunk of my Facebook feed is Indian politics,
and of that portion of it that is in Hindi or other Indian
languages, most is still written in ASCII transcription, even though
every web browser and social media application in common use surely
has full Unicode support these days. Sometimes using your own writing
system is just too much effort!

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-14, James Kass via Unicode wrote:
> 𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.
>
> (Had to use mark-up for that “span” of a single letter in order to 
> indicate the proper letter form.  But the plain-text display looks crazy 
> with that HTML jive in it.)

Indeed. But
 _Art nouveau_ seems a bit _passé_ nowadays
looks fine and is understood even by those who have never annotated a
manuscript with proof corrections.





Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-13, Marcel Schneider via Unicode wrote:
> As far as the information goes that was running until now on this List,
> Mathematicians are both using TeX and liking the Unicode math alphabets.

As Khaled has said, if they use them, it's because some software
designer has decided to use them to implement markup.
I have never seen a Unicode math alphabet character in email outside
this list.

> These statements make me fear that the font you are using might not support
> the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between

It displays as a space. As one would expect - I use fixed width fonts
for plain text.

> these pointy brackets, please let us know. Because then, you’re unable to
> read interoperably usable French text, too, as you’ll see double punctuation
> (e.g. "?!") where a single mark is intended, like here !

I see "like here !".
French text does not need narrow spacing any more than science does.
When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$;
in plain text, 50cm does just fine.
Likewise, normal French people writing email write "Quel idiot!", or
sometimes "Quel idiot !".
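
As an aside on the fallback described above: where a legacy display path
cannot show U+202F, one option is to degrade it to an ordinary space before
display. The following is only a minimal Python sketch; the function name
`degrade_spaces` and the choice to also map U+00A0 are assumptions made for
illustration, not anything taken from this thread.

```python
# Hypothetical helper: map no-break spaces to U+0020 for legacy displays.
_FALLBACK = str.maketrans({
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u00a0": " ",  # NO-BREAK SPACE (an added assumption, see lead-in)
})


def degrade_spaces(text: str) -> str:
    """Return text with the no-break spaces above replaced by ASCII spaces."""
    return text.translate(_FALLBACK)


print(degrade_spaces("Quel idiot\u202f!"))  # prints: Quel idiot !
```

The mapping is lossy by design: the no-break property is gone afterwards, so
it only suits display or search, not storage.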

If you google that phrase on a few French websites, you'll see that
some (such as Larousse, whom one might expect to care about such
things) use no space before punctuation, while others (such as some
random T-shirt company) use an ASCII space.

The Académie Française, which by definition knows more about French
orthography than you do, uses full ASCII spaces before ? and ! on its
front page. Also after opening guillemets, which looks even more
stupid from an Anglophone perspective.

> Aiming at extending the subset of environments supporting correct typesetting

There are many fine programs, including TeX, for doing good
typesetting. Unicode is not about typesetting, it's about information
exchange and preservation.





Re: A last missing link for interoperable representation

2019-01-13 Thread James Kass via Unicode



Martin J. Dürst wrote,

> I'd say it should be conservative. As the meaning of that word
> (similar to others such as progressive and regressive) may be
> interpreted in various ways, here's what I mean by that.
>
> It should not take up and extend every little fad at the blink of an
> eye. It should wait to see what the real needs are, and what may be
> just a temporary fad. As the Mathematical style variants show, once
> characters are encoded, it's difficult to get people off using them,
> even in ways not intended.

A conservative approach to progress is a sensible position for computer 
character encoders.  Taking a conservative approach doesn't necessarily 
mean being anti-progress.


Trying to "get people off" using already encoded characters, whether or 
not the encoded characters are used as intended, might give an 
impression of being anti-progress.


Unicode doesn't enforce any spelling or punctuation rules.  Unicode 
doesn't tell human beings how to pronounce strings of text or how to 
interpret them.  Unicode doesn't push any rules about splitting 
infinitives or conjugating verbs.


Unicode should not tell people how any written symbol must be 
interpreted.  Unicode should not tell people how or where to deploy 
their own written symbols.


Perhaps fraktur is frivolous in English text.  Perhaps its use would 
result in a new convention for written English which would enhance the 
literary experience.  Italics conventions which have only been around a 
hundred years or so may well turn out to be just a passing fad, so we 
should probably give it a bit more time.


Telling people they mustn't use Latin italics letter forms in computer 
text while we wait to see if the practice catches on seems flawed in 
concept.




RE: A last missing link for interoperable representation

2019-01-13 Thread Tex via Unicode


"Looking back at the history of computing, a large chunk of the
underlying technology has hit stability. ARM chips, x86 chips, Unix,
and Windows have all been around since 1985 or before; that is roughly
35 years ago, and roughly 35 years after the first programmed computer.
They aren't wildly changing."

I would encourage you to return to a system of 35 years ago, if you believe 
they are the same.

Performance, pipeline, memory access, device support, graphical capabilities, 
underlying instructions, security features...

One could argue the wheel is medieval and still works today, but the wheels I 
drive on are designed for a variety of weather conditions, traction, minimal 
noise generation, light weight with durability and high performance, and are 
particular to the front or back axle. And I know from experience the wrong 
wheels can spin me around and ram me into a median...

tex









RE: A last missing link for interoperable representation

2019-01-13 Thread Tex via Unicode
> But even most adults won't know the rules for what to italicize that 
> have been brought up in this thread. Even if they have read books that 
> use italic and bold in ways that have been brought up in this thread, 
> most readers won't be able to tell you what the rules are. That's left 
> to copy editors and similar specialist jobs.

Most adults don't know the right places to soft-hyphenate a word, and yet we 
support that in plain-text.
They also don't know the differences between the various dashes and spaces and 
when to use each.
Literacy isn't an appropriate criterion.  Even the apostrophe fails that test 
since so many people fail to distinguish its from it's and there from they're. 
:-)


> There was a time when computers (and printers in particular) were 
> single-case. There was some discussion about having to abolish case 
> distinctions to adapt to computers, but fortunately, that wasn't necessary.

It is ironic to cite a case where technology's failure to support 
linguistic requirements drove a proposal to limit the attributes of language.
As you say it was fortunate it wasn't necessary then...
It makes the case for the importance of improving technology to support 
fundamental language attributes.

tex











Re: A last missing link for interoperable representation

2019-01-13 Thread James Kass via Unicode



Marcel Schneider wrote,

> There is a crazy typeface out there, misleadingly called 'Courier New',
> as if the foundry didn’t anticipate that at some point it would be better
> called "Courier Obsolete". ...

𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.

(Had to use mark-up for that “span” of a single letter in order to 
indicate the proper letter form.  But the plain-text display looks crazy 
with that HTML jive in it.)




Re: A last missing link for interoperable representation

2019-01-13 Thread David Starner via Unicode
On Sun, Jan 13, 2019 at 7:03 PM Martin J. Dürst via Unicode wrote:
> No, the casing idea isn't actually a dumb one. As Asmus has shown, one
> of the best ways to understand what Unicode does with respect to text
> variants is that style works on spans of characters (words,...), and is
> rich text, but things that work on single characters are handled in
> plain text. Upper-case is definitely for the most part a single-character
> phenomenon (the recent Georgian MTAVRULI additions being the exception).

I would disagree; upper case is normally used in all caps or
title-case, and the latter is used on a word, not a character.

I don't argue that Unicode is wrong for handling casing the way it
does, but it does massively complicate the processing of any Latin
text; virtually all searches should be case-insensitive, for example.
At least in English, computerized casing will always be problematic.
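
To illustrate the point about case-insensitive searching: a minimal Python
sketch using the built-in `str.casefold()`. The helper name
`contains_caseless` is made up for this example, and the sketch deliberately
skips normalization, which fuller caseless matching would also involve.

```python
def contains_caseless(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test based on Unicode case folding."""
    # casefold() is more aggressive than lower(): for example it maps
    # the German sharp s to "ss", so "STRASSE" and "straße" compare equal.
    return needle.casefold() in haystack.casefold()


print(contains_caseless("Art Nouveau seems a bit passé", "art nouveau"))
print(contains_caseless("HAUPTSTRASSE 5", "straße"))
```

Both calls print True. Every search path has to remember to do this folding,
which is the processing cost referred to above.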

> UPPER CASE can be used on whole spans of text, but that's not the main
> use case. And if UPPER CASE is used for emphasis, one way to do it (and
> the best way if this is actually a styling issue) is to use rich text
> and mark it up according to semantics, and then use some styling
> directive (e.g. CSS text-transform: uppercase) to get the desired look.

That's an example of how having multiple systems makes things more
complex and less consistent. If something can be written as all upper
case with the caps lock key, it will be. If a generated HTML file can
have uppercase added with a Python or SQL function, it probably will
be. Using CSS text-transform may be best practice, but simpler plain
text solutions will be used in a lot of cases and nothing can be
extrapolated clearly from its use or lack of use.

-- 
Kie ekzistas vivo, ekzistas espero.



Re: A last missing link for interoperable representation

2019-01-13 Thread David Starner via Unicode
On Sat, Jan 12, 2019 at 8:26 PM James Kass via Unicode wrote:
> It's subjective, really.  It depends on how one views plain-text and
> one's expectations for its future.  Should plain-text be progressive,
> regressive, or stagnant?  Because those are really the only choices.
> And opinions differ.
>
> Most of us involved with Unicode probably expect plain-text to be around
> for quite a while.  The figure bandied about in the past on this list is
> "a thousand years".  Only a society of mindless drones would cling to
> the past for a millennium.  So, many of us probably figure that
> strictures laid down now will be overridden as a matter of course, over
> time.

And yet you write this in the Latin script that's been around for a
couple of millennia. Arabic, Han ideographs, Cyrillic and Devanagari have
all been around for at least a millennium.

Looking back at the history of computing, a large chunk of the
underlying technology has hit stability. ARM chips, x86 chips, Unix,
and Windows have all been around since 1985 or before; that is roughly
35 years ago, and roughly 35 years after the first programmed computer.
They aren't wildly changing. Unicode is moving towards that position; it
does a job and doesn't need disruptive changes to continue to be
relevant.

> Unicode will probably be around for awhile, but the barrier between
> plain- and rich-text has already morphed significantly in the relatively
> short period of time it's been around.

Fixed pictures have been parts of character sets for decades and were
part of Unicode 1.1. U+2704, WHITE SCISSORS, for example. And emoji
aren't disruptive in the way that moving something that's been a part
of the rich-text layer forever into the plain-text layer would be.

> I became attracted to Unicode about twenty years ago.  Because Unicode
> opened up entire /realms/ of new vistas relating to what could be done
> with computer plain text.  I hope this trend continues.

The right tool for the job. If you need rich text, you should use rich
text. Emoji had to make the case that they were being used as
characters and there were no competing tools to handle them.

-- 
Kie ekzistas vivo, ekzistas espero.


Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J . Dürst via Unicode
On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
> On 2019-01-12, Richard Wordingham via Unicode wrote:
>> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)

>> And what happens when you capitalise a word for emphasis or to begin a
>> sentence?  Is it no longer the same word?
> 
> Indeed. As has been observed up-thread, the casing idea is a dumb one!
> We are, however, stuck with it because of legacy encoding transported
> into Unicode. We aren't stuck with encoding fonts into Unicode.

No, the casing idea isn't actually a dumb one. As Asmus has shown, one 
of the best ways to understand what Unicode does with respect to text 
variants is that style works on spans of characters (words,...), and is 
rich text, but things that work on single characters are handled in 
plain text. Upper-case is definitely for the most part a single-character 
phenomenon (the recent Georgian MTAVRULI additions being the exception).

UPPER CASE can be used on whole spans of text, but that's not the main 
use case. And if UPPER CASE is used for emphasis, one way to do it (and 
the best way if this is actually a styling issue) is to use rich text 
and mark it up according to semantics, and then use some styling 
directive (e.g. CSS text-transform: uppercase) to get the desired look.


Another criterion is orthography. Schoolchildren learn when to 
capitalize a word and when not. Teachers check and correct it all the 
time. Grammar books and books for second language learners discuss 
capitalization, because it's part of orthography, the rules differ by 
language, and not getting it right will make the writer look bad.

But even most adults won't know the rules for what to italicize that 
have been brought up in this thread. Even if they have read books that 
use italic and bold in ways that have been brought up in this thread, 
most readers won't be able to tell you what the rules are. That's left 
to copy editors and similar specialist jobs.

There was a time when computers (and printers in particular) were 
single-case. There was some discussion about having to abolish case 
distinctions to adapt to computers, but fortunately, that wasn't necessary.

Regards,   Martin.



Re: A last missing link for interoperable representation

2019-01-13 Thread James Kass via Unicode



Julian Bradfield replied,

>> Sounds like you didn't try it.  VS characters are default ignorable.
>
> By software that has a full understanding of Unicode. There is a very
> large world out there of software that was written before Unicode was
> dreamed of, let alone popular.

यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं।

What happens with Devanagari text?  Should the user community refrain 
from interchanging data because 1980s era software isn't Unicode aware?




Re: A last missing link for interoperable representation

2019-01-13 Thread Khaled Hosny via Unicode
On Sun, Jan 13, 2019 at 04:52:25PM +0000, Julian Bradfield via Unicode wrote:
> On 2019-01-12, James Kass via Unicode wrote:
> > This is an italicized word:
> > 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
> > ... where the "geek" hacker used Latin italics letters from the math 
> > alphanumeric range as though they were Latin italics letters.
> 
> It's a sequence of question marks unless you have an up to date
> Unicode font set up (which, as it happens, I don't for the terminal in
> which I read this mailing list). Since actual mathematicians don't use
> the Unicode math alphabets, there's no strong incentive to get updated
> fonts.

They do, but not necessarily by directly inputting them. LaTeX with the
“unicode-math” package will translate ASCII + font switches to the
respective Unicode math alphanumeric characters. Word will do the same.
Even browsers rendering MathML will do the same (though most likely the
MathML source will have the math alphanumeric characters already).
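
The kind of translation described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not how
unicode-math, Word or any browser actually implements it; the function name
`to_math_italic` is invented here. It maps ASCII letters to the Mathematical
Italic range and then uses NFKC normalization (via the standard `unicodedata`
module) to recover the plain letters, as a search tool might.

```python
import unicodedata

MATH_ITALIC_CAPITAL_A = 0x1D434
MATH_ITALIC_SMALL_A = 0x1D44E
PLANCK_CONSTANT = "\u210E"  # italic small h is encoded here, not at U+1D455


def to_math_italic(text: str) -> str:
    """Map ASCII letters to Mathematical Italic; leave other characters alone."""
    out = []
    for ch in text:
        if ch == "h":
            out.append(PLANCK_CONSTANT)
        elif "A" <= ch <= "Z":
            out.append(chr(MATH_ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


styled = to_math_italic("kakistocracy")
# Compatibility normalization folds the styled letters back to ASCII.
plain = unicodedata.normalize("NFKC", styled)
print(styled, plain)
```

Whether `styled` displays as letters or as question marks still depends on
the fonts available to the reader, which is the objection raised earlier.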

Regards,
Khaled


Re: A last missing link for interoperable representation

2019-01-13 Thread Marcel Schneider via Unicode

On 13/01/2019 17:52, Julian Bradfield via Unicode wrote:
> On 2019-01-12, James Kass via Unicode wrote:
>> This is a math formula:
>> a + b = b + a
>> ... where the estimable "mathematician" used Latin letters from ASCII as
>> though they were math alphanumerics variables.
>
> Yup, and it's immediately understandable by anyone reading on any
> computer that understands ASCII.  That's why mathematicians write like
> that in plain text.


As far as the information goes that was running until now on this List,
Mathematicians are both using TeX and liking the Unicode math alphabets.




>> This is an italicized word:
>> 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
>> ... where the "geek" hacker used Latin italics letters from the math
>> alphanumeric range as though they were Latin italics letters.
>
> It's a sequence of question marks unless you have an up to date
> Unicode font set up (which, as it happens, I don't for the terminal in
> which I read this mailing list). Since actual mathematicians don't use
> the Unicode math alphabets, there's no strong incentive to get updated
> fonts.

These statements make me fear that the font you are using might not support
the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between
these pointy brackets, please let us know. Because then, you’re unable to
read interoperably usable French text, too, as you’ll see double punctuation
(e.g. "?!") where a single mark is intended, like here !

There is a crazy typeface out there, misleadingly called 'Courier New', as if
the foundry didn’t anticipate that at some point it would be better called
"Courier Obsolete". Or they did, but… (Referring to CLDR ticket #11423.)

BTW if anybody knows a version of Courier New updated to a decent level of
Unicode support, please be so kind and share the link so I can spread the word.




>> Where's the harm?
>
> You lose your audience for no reasons other than technogeekery.


Aiming at extending the subset of environments supporting correct typesetting
is not geekery but awareness of our cultural heritage, which we’re committed to
maintaining and developing, carrying it over into the digital world while adapting
technology to culture, not the other way round.


Best regards,

Marcel


Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, James Kass via Unicode wrote:
> This is a math formula:
> a + b = b + a
> ... where the estimable "mathematician" used Latin letters from ASCII as 
> though they were math alphanumerics variables.

Yup, and it's immediately understandable by anyone reading on any
computer that understands ASCII.  That's why mathematicians write like
that in plain text.

> This is an italicized word:
> 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
> ... where the "geek" hacker used Latin italics letters from the math 
> alphanumeric range as though they were Latin italics letters.

It's a sequence of question marks unless you have an up to date
Unicode font set up (which, as it happens, I don't for the terminal in
which I read this mailing list). Since actual mathematicians don't use
the Unicode math alphabets, there's no strong incentive to get updated
fonts.

> Where's the harm?

You lose your audience for no reasons other than technogeekery. 





Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, Richard Wordingham via Unicode wrote:
> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)
> Julian Bradfield via Unicode wrote:
>
>> It's also fundamentally misguided. When I _italicize_ a word, I am
>> writing a word composed of (plain old) letters, and then styling the
>> word; I am not composing a new and different word ("_italicize_") that
>> is distinct from the old word ("italicize") by virtue of being made up
>> of different letters.
>
> And what happens when you capitalise a word for emphasis or to begin a
> sentence?  Is it no longer the same word?

Indeed. As has been observed up-thread, the casing idea is a dumb one!
We are, however, stuck with it because of legacy encoding transported
into Unicode. We aren't stuck with encoding fonts into Unicode.




Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, James Kass via Unicode wrote:

> Sounds like you didn't try it.  VS characters are default ignorable.

By software that has a full understanding of Unicode. There is a very
large world out there of software that was written before Unicode was
dreamed of, let alone popular.

> apricot
> a︁p︁r︁i︁c︁o︁t︁
> Notepad finds them both if you type the word "apricot" into the search box.

What has Notepad to do with me?
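
For readers without Notepad, the "default ignorable" behaviour under
discussion can be approximated as below. This is a minimal Python sketch, not
a description of what Notepad or any other program does internally; the
helper names are invented, and only the two variation selector blocks
(U+FE00..U+FE0F and U+E0100..U+E01EF) are handled, which is an assumption
made to keep the example short.

```python
def is_variation_selector(ch: str) -> bool:
    """True for code points in the two variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors so that a plain search can match."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Build "apricot" with VS2 (U+FE01) after every letter, as in the example.
tagged = "".join(ch + "\uFE01" for ch in "apricot")

print("apricot" in tagged)                             # False
print("apricot" in strip_variation_selectors(tagged))  # True
```

Software that does not perform a step of this kind sees fourteen code points
rather than seven letters, which is the concern about pre-Unicode tools.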

> "But for plain text, it's crazy."
>
> Are you a member of the plain-text user community?

Certainly:)




Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J . Dürst via Unicode
On 2019/01/13 13:24, James Kass via Unicode wrote:
> 
> Mark E. Shoulson wrote,
> 
>  > This discussion has been very interesting, really.  I've heard what I
>  > thought were very good points and relevant arguments from both/all
>  > sides, and I confess to not being sure which I actually prefer.
> 
> It's subjective, really.  It depends on how one views plain-text and 
> one's expectations for its future.  Should plain-text be progressive, 
> regressive, or stagnant?  Because those are really the only choices. And 
> opinions differ.

I'd say it should be conservative. As the meaning of that word (similar 
to others such as progressive and regressive) may be interpreted in 
various ways, here's what I mean by that.

It should not take up and extend every little fad at the blink of an 
eye. It should wait to see what the real needs are, and what may be just 
a temporary fad. As the Mathematical style variants show, once 
characters are encoded, it's difficult to get people off using them, 
even in ways not intended.

Emoji have often been cited in this thread. But there are some 
important observations:

1) Emoji were added to Unicode only after it turned out that they were
widely used in Japanese character encodings, and dripping into
Unicode-based systems in large numbers but without any clearly
assigned code points. The Unicode Consortium didn't start encoding
them because they thought emoji were cute or progressive or anything
like that.

2) The Unicode Consortium is continuing to hold down the number of newly
encoded emoji by using an approximate limit for each year and a
strict process.

3) The Unicode Consortium is somewhat motivated to encode new emoji
because of the publicity surrounding them. That publicity might
subside sooner or later. It's difficult to imagine the same kind
of publicity for italics and friends.

> Most of us involved with Unicode probably expect plain-text to be around 
> for quite a while.  The figure bandied about in the past on this list is 
> "a thousand years".  Only a society of mindless drones would cling to 
> the past for a millennium.  So, many of us probably figure that 
> strictures laid down now will be overridden as a matter of course, over 
> time.
> 
> Unicode will probably be around for awhile, but the barrier between 
> plain- and rich-text has already morphed significantly in the relatively 
> short period of time it's been around.

Because whatever is encoded can't be "unencoded", it's clear that we can 
only move in one direction, and not back. But because we want Unicode to 
work for a long, long time, it's very important to be conservative.

> I became attracted to Unicode about twenty years ago.  Because Unicode 
> opened up entire /realms/ of new vistas relating to what could be done 
> with computer plain text.  I hope this trend continues.

I hope this trend only continues very slowly, if at all.

Regards,   Martin.