Re: A last missing link for interoperable representation
Julian Bradfield wrote,

> I have never seen a Unicode math alphabet character in email
> outside this list.

It's being done though. Check this message from 2013 which includes the following, copy/pasted from the web page into Notepad:

𝘗𝘈𝘙𝘛 𝘖𝘍 𝗔𝖳𝖮𝗭.𝖥𝖱𝖠𝖬𝖤𝖶𝖮𝖱𝖪 © 𝟮𝟬𝟭𝟯 𝖠𝖫𝖤𝖷 𝖦𝖱𝖠𝖸 𝗀𝗂𝗍𝗁𝗎𝖻.𝖼𝗈𝗆/𝗺𝗿𝗮𝗹𝗲𝘅𝗴𝗿𝗮𝘆

https://apple.stackexchange.com/questions/104159/what-are-these-characters-and-how-can-i-use-them
Re: A last missing link for interoperable representation
On 2019-01-13, James Kass via Unicode wrote:

> यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं।
> What happens with Devanagari text? Should the user community refrain
> from interchanging data because 1980s era software isn't Unicode aware?

Devanagari is an established writing system (which also doesn't need separate letters for different typefaces). Those who wish to exchange information in Devanagari will use either an ISCII or Unicode system with suitable font support. Likewise, those who wish to exchange English text with typographic detail will use a suitable typographic mark-up system with font support, which will typically not interfere with plain text searching. Even in a PDF document, "art nouveau" will appear as "art nouveau" whatever font it's in.

Incidentally, a large chunk of my Facebook feed is Indian politics, and of that portion of it that is in Hindi or other Indian languages, most is still written in ASCII transcription, even though every web browser and social media application in common use surely has full Unicode support these days. Sometimes using your own writing system is just too much effort!

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: A last missing link for interoperable representation
On 2019-01-14, James Kass via Unicode wrote:

> 𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.
>
> (Had to use mark-up for that “span” of a single letter in order to
> indicate the proper letter form. But the plain-text display looks crazy
> with that HTML jive in it.)

Indeed. But

  _Art nouveau_ seems a bit _passé_ nowadays

looks fine and is understood even by those who have never annotated a manuscript with proof corrections.
Re: A last missing link for interoperable representation
On 2019-01-13, Marcel Schneider via Unicode wrote:

> As far as the information goes that was running until now on this List,
> Mathematicians are both using TeX and liking the Unicode math alphabets.

As Khaled has said, if they use them, it's because some software designer has decided to use them to implement markup. I have never seen a Unicode math alphabet character in email outside this list.

> These statements make me fear that the font you are using might not support
> the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between

It displays as a space. As one would expect - I use fixed width fonts for plain text.

> these pointy brackets, please let us know. Because then, You’re unable to
> read interoperably usable French text, too, as you’ll see double punctuation
> (eg "?!") where a single mark is intended, like here !

I see "like here !".

French text does not need narrow spacing any more than science does. When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$; in plain text, 50cm does just fine. Likewise, normal French people writing email write "Quel idiot!", or sometimes "Quel idiot !". If you google that phrase on a few French websites, you'll see that some (such as Larousse, whom one might expect to care about such things) use no space before punctuation, while others (such as some random T-shirt company) use an ASCII space. The Académie Française, which by definition knows more about French orthography than you do, uses full ASCII spaces before ? and ! on its front page. Also after opening guillemets, which looks even more stupid from an Anglophone perspective.

> Aiming at extending the subset of environments supporting correct typesetting

There are many fine programs, including TeX, for doing good typesetting. Unicode is not about typesetting, it's about information exchange and preservation.
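The exchange about U+202F can be checked mechanically. The following is only a sketch using Python's standard `unicodedata` module; the helper names `describe` and `fold_spaces` are made up for illustration and are not part of any tool mentioned in the thread. It shows that NARROW NO-BREAK SPACE is an ordinary space separator (general category Zs), so a plain-text consumer that cannot render it narrowly can still treat it as a space.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the character under discussion


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def fold_spaces(text: str) -> str:
    """Replace every space separator (general category Zs) with an ASCII space."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )


sample = "like here" + NNBSP + "!"
print(describe(NNBSP))      # U+202F NARROW NO-BREAK SPACE
print(fold_spaces(sample))  # like here !
```

Folding is lossy: the no-break and narrow-width properties are discarded, which is exactly the typographic detail the two correspondents disagree about.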
Re: A last missing link for interoperable representation
Martin J. Dürst wrote,

> I'd say it should be conservative. As the meaning of that word
> (similar to others such as progressive and regressive) may be
> interpreted in various ways, here's what I mean by that.
>
> It should not take up and extend every little fad at the blink of an
> eye. It should wait to see what the real needs are, and what may be
> just a temporary fad. As the Mathematical style variants show, once
> characters are encoded, it's difficult to get people off using them,
> even in ways not intended.

A conservative approach to progress is a sensible position for computer character encoders. Taking a conservative approach doesn't necessarily mean being anti-progress.

Trying to "get people off" using already encoded characters, whether or not the encoded characters are used as intended, might give an impression of being anti-progress.

Unicode doesn't enforce any spelling or punctuation rules. Unicode doesn't tell human beings how to pronounce strings of text or how to interpret them. Unicode doesn't push any rules about splitting infinitives or conjugating verbs.

Unicode should not tell people how any written symbol must be interpreted. Unicode should not tell people how or where to deploy their own written symbols.

Perhaps fraktur is frivolous in English text. Perhaps its use would result in a new convention for written English which would enhance the literary experience. Italics conventions which have only been around a hundred years or so may well turn out to be just a passing fad, so we should probably give it a bit more time.

Telling people they mustn't use Latin italics letter forms in computer text while we wait to see if the practice catches on seems flawed in concept.
RE: A last missing link for interoperable representation
"Looking back at the history of computing, a large chunk of the underlying technology has hit stability. ARM chips, x86 chips, Unix, and Windows have all been around since 1985 or before, roughly 35 years ago and 35 years since the first programmed computer. They aren't wildly changing." I would encourage you to return to a system of 35 years ago, if you believe they are the same. Performance, pipeline, memory access, device support, graphical capabilities, underlying instructions, security features... One could argue the wheel is medieval and still works today, but the wheels I drive on are designed for a variety of weather conditions, traction, minimal noise generation, light weight with durability and high performance, and are particular to the front or back axle. And I know from experience the wrong wheels can spin me around and ram me into a median... tex
RE: A last missing link for interoperable representation
> But even most adults won't know the rules for what to italicize that
> have been brought up in this thread. Even if they have read books that
> use italic and bold in ways that have been brought up in this thread,
> most readers won't be able to tell you what the rules are. That's left
> to copy editors and similar specialist jobs.

Most adults don't know the right places to soft-hyphenate a word, and yet we support that in plain-text. They also don't know the differences between the various dashes and spaces and when to use each. Literacy isn't an appropriate criterion. Even the apostrophe fails that test since so many people fail to distinguish its from it's and there from they're. :-)

> There was a time when computers (and printers in particular) were
> single-case. There was some discussion about having to abolish case
> distinctions to adapt to computers, but fortunately, that wasn't necessary.

Ironic to mention the example of the failure of technology to support linguistic requirements driving a proposal to limit the attributes of language. As you say it was fortunate it wasn't necessary then... It makes the case for the importance of improving technology to support fundamental language attributes.

tex
Re: A last missing link for interoperable representation
Marcel Schneider wrote,

> There is a crazy typeface out there, misleadingly called 'Courier New',
> as if the foundry didn’t anticipate that at some point it would be better
> called "Courier Obsolete". ...

𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.

(Had to use mark-up for that “span” of a single letter in order to indicate the proper letter form. But the plain-text display looks crazy with that HTML jive in it.)
Re: A last missing link for interoperable representation
On Sun, Jan 13, 2019 at 7:03 PM Martin J. Dürst via Unicode wrote:

> No, the casing idea isn't actually a dumb one. As Asmus has shown, one
> of the best ways to understand what Unicode does with respect to text
> variants is that style works on spans of characters (words,...), and is
> rich text, but things that work on single characters are handled in
> plain text. Upper-case is definitely for the most part a single-character
> phenomenon (the recent Georgian MTAVRULI additions being the exception).

I would disagree; upper case is normally used in all caps or title-case, and the latter is used on a word, not a character. I don't argue that Unicode is wrong for handling casing the way it does, but it does massively complicate the processing of any Latin text; virtually all searches should be case-insensitive, for example. At least in English, computerized casing will always be problematic.

> UPPER CASE can be used on whole spans of text, but that's not the main
> use case. And if UPPER CASE is used for emphasis, one way to do it (and
> the best way if this is actually a styling issue) is to use rich text
> and mark it up according to semantics, and then use some styling
> directive (e.g. CSS text-transform: uppercase) to get the desired look.

That's an example of how having multiple systems makes things more complex and less consistent. If something can be written as all upper case with the caps lock key, it will be. If a generated HTML file can have uppercase added with a Python or SQL function, it probably will be. Using CSS text-transform may be best practice, but simpler plain text solutions will be used in a lot of cases and nothing can be extrapolated clearly from its use or lack of use.

--
Kie ekzistas vivo, ekzistas espero.
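The point that searches over cased text should be case-insensitive can be illustrated with a short sketch. It assumes Python's built-in `str.casefold()` as the folding step; the function name `contains_caseless` is hypothetical and chosen only for this example.

```python
def contains_caseless(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test using Unicode case folding."""
    # casefold() is more aggressive than lower(): for example it maps the
    # German sharp s to "ss", so differently cased spellings compare equal.
    return needle.casefold() in haystack.casefold()


print(contains_caseless("ART NOUVEAU seems a bit passé", "art nouveau"))
print(contains_caseless("Straße", "STRASSE"))
```

This sketch does not normalize the text first, so precomposed and decomposed forms of the same accented letter would still fail to match; a fuller search would combine case folding with a normalization step.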
Re: A last missing link for interoperable representation
On Sat, Jan 12, 2019 at 8:26 PM James Kass via Unicode wrote:

> It's subjective, really. It depends on how one views plain-text and
> one's expectations for its future. Should plain-text be progressive,
> regressive, or stagnant? Because those are really the only choices.
> And opinions differ.
>
> Most of us involved with Unicode probably expect plain-text to be around
> for quite a while. The figure bandied about in the past on this list is
> "a thousand years". Only a society of mindless drones would cling to
> the past for a millennium. So, many of us probably figure that
> strictures laid down now will be overridden as a matter of course, over
> time.

And yet you write this in the Latin script that's been around for a couple millennia. Arabic, Han ideographs, Cyrillic and Devanagari have all been around for a millennium.

Looking back at the history of computing, a large chunk of the underlying technology has hit stability. ARM chips, x86 chips, Unix, and Windows have all been around since 1985 or before, roughly 35 years ago and 35 years since the first programmed computer. They aren't wildly changing. Unicode is moving towards that position; it does a job and doesn't need disruptive changes to continue to be relevant.

> Unicode will probably be around for awhile, but the barrier between
> plain- and rich-text has already morphed significantly in the relatively
> short period of time it's been around.

Fixed pictures have been parts of character sets for decades and were part of Unicode 1.1. U+2704, WHITE SCISSORS, for example. And emoji aren't disruptive in the way that moving something that's been a part of the rich-text layer forever into the plain-text layer would be.

> I became attracted to Unicode about twenty years ago. Because Unicode
> opened up entire /realms/ of new vistas relating to what could be done
> with computer plain text. I hope this trend continues.

The right tool for the job. If you need rich text, you should use rich text. Emoji had to make the case that they were being used as characters and there were no competing tools to handle them.

--
Kie ekzistas vivo, ekzistas espero.
Re: A last missing link for interoperable representation
On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:

> On 2019-01-12, Richard Wordingham via Unicode wrote:
>> On Sat, 12 Jan 2019 10:57:26 + (GMT)
>> And what happens when you capitalise a word for emphasis or to begin a
>> sentence? Is it no longer the same word?
>
> Indeed. As has been observed up-thread, the casing idea is a dumb one!
> We are, however, stuck with it because of legacy encoding transported
> into Unicode. We aren't stuck with encoding fonts into Unicode.

No, the casing idea isn't actually a dumb one. As Asmus has shown, one of the best ways to understand what Unicode does with respect to text variants is that style works on spans of characters (words,...), and is rich text, but things that work on single characters are handled in plain text. Upper-case is definitely for the most part a single-character phenomenon (the recent Georgian MTAVRULI additions being the exception).

UPPER CASE can be used on whole spans of text, but that's not the main use case. And if UPPER CASE is used for emphasis, one way to do it (and the best way if this is actually a styling issue) is to use rich text and mark it up according to semantics, and then use some styling directive (e.g. CSS text-transform: uppercase) to get the desired look.

Another criterion is orthography. Schoolchildren learn when to capitalize a word and when not. Teachers check and correct it all the time. Grammar books and books for second language learners discuss capitalization, because it's part of orthography, the rules differ by language, and not getting it right will make the writer look bad.

But even most adults won't know the rules for what to italicize that have been brought up in this thread. Even if they have read books that use italic and bold in ways that have been brought up in this thread, most readers won't be able to tell you what the rules are. That's left to copy editors and similar specialist jobs.

There was a time when computers (and printers in particular) were single-case. There was some discussion about having to abolish case distinctions to adapt to computers, but fortunately, that wasn't necessary.

Regards, Martin.
Re: A last missing link for interoperable representation
Julian Bradfield replied,

>> Sounds like you didn't try it. VS characters are default ignorable.
>
> By software that has a full understanding of Unicode. There is a very
> large world out there of software that was written before Unicode was
> dreamed of, let alone popular.

यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं।

What happens with Devanagari text? Should the user community refrain from interchanging data because 1980s era software isn't Unicode aware?
Re: A last missing link for interoperable representation
On Sun, Jan 13, 2019 at 04:52:25PM +, Julian Bradfield via Unicode wrote:

> On 2019-01-12, James Kass via Unicode wrote:
> > This is an italicized word:
> > 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
> > ... where the "geek" hacker used Latin italics letters from the math
> > alphanumeric range as though they were Latin italics letters.
>
> It's a sequence of question marks unless you have an up to date
> Unicode font set up (which, as it happens, I don't for the terminal in
> which I read this mailing list). Since actual mathematicians don't use
> the Unicode math alphabets, there's no strong incentive to get updated
> fonts.

They do, but not necessarily by directly inputting them. LaTeX with the “unicode-math” package will translate ASCII + font switches to the respective Unicode math alphanumeric characters. Word will do the same. Even browsers rendering MathML will do the same (though most likely the MathML source will have the math alphanumeric characters already).

Regards,
Khaled
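To make the idea of translating ASCII plus a font switch into math alphanumeric characters concrete, here is a minimal sketch in Python. It is not how unicode-math, Word, or any MathML renderer is implemented; the function name `to_math_italic` is hypothetical, and only the italic style is covered. The one special case is the letter h: its slot in the Mathematical Italic range (U+1D455) is reserved, and U+210E PLANCK CONSTANT is used instead.

```python
MATH_ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A
MATH_ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = "\u210e"       # fills the reserved slot for italic small h


def to_math_italic(text: str) -> str:
    """Map ASCII letters to Mathematical Italic; leave everything else alone."""
    out = []
    for ch in text:
        if ch == "h":
            out.append(PLANCK_CONSTANT)
        elif "A" <= ch <= "Z":
            out.append(chr(MATH_ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(to_math_italic("a + b = b + a"))
```

Operators, digits and spaces pass through unchanged, which matches the way the formula quoted in the thread mixes ASCII operators with letter variables.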
Re: A last missing link for interoperable representation
On 13/01/2019 17:52, Julian Bradfield via Unicode wrote:

> On 2019-01-12, James Kass via Unicode wrote:
>
>> This is a math formula:
>> a + b = b + a
>> ... where the estimable "mathematician" used Latin letters from ASCII as
>> though they were math alphanumerics variables.
>
> Yup, and it's immediately understandable by anyone reading on any
> computer that understands ASCII. That's why mathematicians write like
> that in plain text.

As far as the information goes that was running until now on this List, Mathematicians are both using TeX and liking the Unicode math alphabets.

>> This is an italicized word:
>> 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
>> ... where the "geek" hacker used Latin italics letters from the math
>> alphanumeric range as though they were Latin italics letters.
>
> It's a sequence of question marks unless you have an up to date
> Unicode font set up (which, as it happens, I don't for the terminal in
> which I read this mailing list). Since actual mathematicians don't use
> the Unicode math alphabets, there's no strong incentive to get updated
> fonts.

These statements make me fear that the font you are using might not support the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between these pointy brackets, please let us know. Because then, You’re unable to read interoperably usable French text, too, as you’ll see double punctuation (eg "?!") where a single mark is intended, like here !

There is a crazy typeface out there, misleadingly called 'Courier New', as if the foundry didn’t anticipate that at some point it would be better called "Courier Obsolete". Or they did, but… (Referring to CLDR ticket #11423.)

BTW if anybody knows a version of Courier New updated to a decent level of Unicode support, please be so kind and share the link so I can spread the word.

>> Where's the harm?
>
> You lose your audience for no reasons other than technogeekery.

Aiming at extending the subset of environments supporting correct typesetting is no geekery but awareness of our cultural heritage that we’re committed to maintain and to develop, taking it over into the digital world while adapting technology to culture, not conversely.

Best regards,

Marcel
Re: A last missing link for interoperable representation
On 2019-01-12, James Kass via Unicode wrote:

> This is a math formula:
> a + b = b + a
> ... where the estimable "mathematician" used Latin letters from ASCII as
> though they were math alphanumerics variables.

Yup, and it's immediately understandable by anyone reading on any computer that understands ASCII. That's why mathematicians write like that in plain text.

> This is an italicized word:
> 𝑘𝑎𝑘𝑖𝑠𝑡𝑜𝑐𝑟𝑎𝑐𝑦
> ... where the "geek" hacker used Latin italics letters from the math
> alphanumeric range as though they were Latin italics letters.

It's a sequence of question marks unless you have an up to date Unicode font set up (which, as it happens, I don't for the terminal in which I read this mailing list). Since actual mathematicians don't use the Unicode math alphabets, there's no strong incentive to get updated fonts.

> Where's the harm?

You lose your audience for no reasons other than technogeekery.
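One way software can limit the harm debated here is to fold styled letters back to plain ones before searching. The sketch below assumes Python's standard `unicodedata.normalize` with the NFKC form, which applies the compatibility mappings that Mathematical Alphanumeric Symbols carry; the helper name `plain_key` is invented for this example, and the styled word is built from code points so the snippet does not depend on how this message happens to be displayed.

```python
import unicodedata


def plain_key(text: str) -> str:
    """Return a search key with compatibility characters folded via NFKC."""
    return unicodedata.normalize("NFKC", text)


# Build "kakistocracy" from MATHEMATICAL ITALIC SMALL letters (U+1D44E is a).
styled = "".join(chr(0x1D44E + ord(c) - ord("a")) for c in "kakistocracy")

print("kakistocracy" in styled)             # False: the code points differ
print("kakistocracy" in plain_key(styled))  # True: NFKC folds them to ASCII
```

This only helps readers whose software performs the folding; a terminal without suitable fonts, as described in the message above, still shows replacement glyphs for the original text.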
Re: A last missing link for interoperable representation
On 2019-01-12, Richard Wordingham via Unicode wrote:

> On Sat, 12 Jan 2019 10:57:26 + (GMT)
> Julian Bradfield via Unicode wrote:
>
>> It's also fundamentally misguided. When I _italicize_ a word, I am
>> writing a word composed of (plain old) letters, and then styling the
>> word; I am not composing a new and different word ("_italicize_") that
>> is distinct from the old word ("italicize") by virtue of being made up
>> of different letters.
>
> And what happens when you capitalise a word for emphasis or to begin a
> sentence? Is it no longer the same word?

Indeed. As has been observed up-thread, the casing idea is a dumb one! We are, however, stuck with it because of legacy encoding transported into Unicode. We aren't stuck with encoding fonts into Unicode.
Re: A last missing link for interoperable representation
On 2019-01-12, James Kass via Unicode wrote:

> Sounds like you didn't try it. VS characters are default ignorable.

By software that has a full understanding of Unicode. There is a very large world out there of software that was written before Unicode was dreamed of, let alone popular.

> apricot
> a︁p︁r︁i︁c︁o︁t︁
> Notepad finds them both if you type the word "apricot" into the search box.

What has Notepad to do with me?

> "But for plain text, it's crazy."
>
> Are you a member of the plain-text user community?

Certainly:)
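The disagreement about variation selectors comes down to whether the searching software ignores them. A minimal Python sketch of that behaviour follows; it is an illustration, not a description of what Notepad does internally. The helper names are hypothetical, and the choice of VARIATION SELECTOR-2 (U+FE01) for the decorated word is an assumption made for the example.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors so the base letters can be compared directly."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# "apricot" with a variation selector after every letter (assumed: U+FE01).
decorated = "".join(c + "\ufe01" for c in "apricot")

print("apricot" in decorated)                             # False
print("apricot" in strip_variation_selectors(decorated))  # True
```

Software written without any knowledge of Unicode performs only the first comparison, which is the case the reply above is concerned with.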
Re: A last missing link for interoperable representation
On 2019/01/13 13:24, James Kass via Unicode wrote:

> Mark E. Shoulson wrote,
>
> > This discussion has been very interesting, really. I've heard what I
> > thought were very good points and relevant arguments from both/all
> > sides, and I confess to not being sure which I actually prefer.
>
> It's subjective, really. It depends on how one views plain-text and
> one's expectations for its future. Should plain-text be progressive,
> regressive, or stagnant? Because those are really the only choices. And
> opinions differ.

I'd say it should be conservative. As the meaning of that word (similar to others such as progressive and regressive) may be interpreted in various ways, here's what I mean by that.

It should not take up and extend every little fad at the blink of an eye. It should wait to see what the real needs are, and what may be just a temporary fad. As the Mathematical style variants show, once characters are encoded, it's difficult to get people off using them, even in ways not intended.

Emoji have often been cited in this thread. But there are some important observations:

1) Emoji were added to Unicode only after it turned out that they were widely used in Japanese character encodings, and dripping into Unicode-based systems in large numbers but without any clearly assigned code points. The Unicode Consortium didn't start encoding them because they thought emoji were cute or progressive or anything like that.

2) The Unicode Consortium is continuing to hold down the number of newly encoded emoji by using an approximate limit for each year and a strict process.

3) The Unicode Consortium is somewhat motivated to encode new emoji because of the publicity surrounding them. That publicity might subside sooner or later. It's difficult to imagine the same kind of publicity for italics and friends.

> Most of us involved with Unicode probably expect plain-text to be around
> for quite a while. The figure bandied about in the past on this list is
> "a thousand years". Only a society of mindless drones would cling to
> the past for a millennium. So, many of us probably figure that
> strictures laid down now will be overridden as a matter of course, over
> time.
>
> Unicode will probably be around for awhile, but the barrier between
> plain- and rich-text has already morphed significantly in the relatively
> short period of time it's been around.

Because whatever is encoded can't be "unencoded", it's clear that we can only move in one direction, and not back. But because we want Unicode to work for a long, long time, it's very important to be conservative.

> I became attracted to Unicode about twenty years ago. Because Unicode
> opened up entire /realms/ of new vistas relating to what could be done
> with computer plain text. I hope this trend continues.

I hope this trend only continues very slowly, if at all.

Regards, Martin.