Re: A last missing link for interoperable representation
On 15/01/2019 01:17, Asmus Freytag via Unicode wrote:
> On 1/14/2019 2:08 PM, Tex via Unicode wrote:
>> Asmus, I agree 100%. Asking where is the harm was an actual question intended to surface problems. It wasn’t rhetoric for saying there is no harm.
> The harm comes when this is imported into rich text environments (like this e-mail inbox). Here, the math abuse and the styled text run may look the same, but I cannot search for things based on what I see. I see an English or French word, type it in the search box, and it won't be found. I call that 'stealth' text. The answer is not necessarily in folding the two, because one of the reasons for having math alphabetics is so you can search for a variable "a" of a certain kind without getting hits on every "a" in the text. Destroying that functionality in an attempt to "solve" the problems created by the alternate facsimile of styled text is also "harm" in some way.

That may end up in a feature request for webmails and e-mail clients, where the user should be given the ability to toggle between what I’d call a “Bing search mode” and a “Google search mode.” Google Search has extended equivalence classes that enable it to handle math alphabets like plain ASCII runs, i.e. we may type a search in ASCII and Google finds instances where the text is typeset “abusing” math alphabets. Bing Search, on the other hand, has no such extended equivalence classes, and brings up styled variables only when the search string is styled correspondingly. I won’t blame Google for doing “harm”; I’d rather position myself on Google’s side, as it seems to meet the expectations of a larger part of end-user communities. I won’t blame Microsoft either; I’m just noting a dividing line between the two vendors in the handling of math alphabets. Best regards, Marcel
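The "extended equivalence classes" described above can be approximated with Unicode compatibility normalization: the mathematical alphanumeric symbols carry compatibility decompositions to ordinary letters, so NFKC folds them to ASCII. A minimal sketch in Python of such a search-folding step (this illustrates the general technique only, not either vendor’s actual implementation):

```python
import unicodedata

def fold_for_search(text: str) -> str:
    """Fold compatibility variants (including the mathematical
    alphanumeric symbols) to their ordinary equivalents with NFKC,
    then case-fold for a case-insensitive match."""
    return unicodedata.normalize("NFKC", text).casefold()

# U+1D450 U+1D44E U+1D453 U+1D452: MATHEMATICAL ITALIC SMALL C, A, F, E
styled = "\U0001D450\U0001D44E\U0001D453\U0001D452"
print(fold_for_search(styled))             # -> cafe
print("cafe" in fold_for_search(styled))   # -> True
```

The cost Asmus mentions is equally visible here: after folding, a math variable 𝑎 and an ordinary "a" become indistinguishable, which is exactly the "Bing mode" versus "Google mode" trade-off.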
Re: A last missing link for interoperable representation
On Mon, Jan 14, 2019 at 5:58 PM Mark E. Shoulson via Unicode wrote: > *If* the VS is ignored by searches, as apparently it should be and some > have reported that it is, then VS-type solutions would NOT be a problem > when it comes to searches Who is using VS-type solutions? I could not enter them except manually, using some sort of \u notation. Languages that need special input support can easily adapt to unusual rules, but English Unicode is weirdly hard to enter, because the QWERTY keyboard is ubiquitous and standard. Smart quotes, non-HYPHEN-MINUS hyphens and dashes, and accents generally require memorizing obscure entry methods or resorting to a character list. Without great support from vendors, a new Unicode italic system is only going to be used by the same people who currently use mathematical italics. > (and don't go whining about legacy software. > If Unicode had to be backward-compatible with everything we wouldn't > have gone beyond ASCII). Then where's this plain text that absolutely needs italics? Those legacy software systems are the place where unadorned plain text still lives. Anything on the Web is inherently dealing with rich text. -- Kie ekzistas vivo, ekzistas espero.
Re: A last missing link for interoperable representation
On 2019/01/15 07:58, David Starner via Unicode wrote:
> On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote:
>> ·Plain text still has tremendous utility and rich text is not always an option.
> Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface.

Yes indeed. Some similar services allow styling. One example is Slack, see e.g. https://get.slack.help/hc/en-us/articles/202288908-Format-your-messages. Markdown has been mentioned as an example of how some basic styling options (bold, italic,...) can be implemented. Another choice is using a user interface component (menu,...). The user then doesn't have to care about any 'weird' conventions, even the simplest ones, nor about what happens in the background (most probably HTML), and is already familiar with it from other applications. As for implementation complexity, it's not trivial, but there are quite a lot of components available, in particular for Web technology. It's not rocket science.

Actually, in some cases it is even difficult to get rid of styling on the Web. I recently wanted to print out a map of how to get to a restaurant for a party. The restaurant's Web site was all black background. I copied the address to Google Maps and then tried to print it. Google Maps insists that the first page is just information about the location, so I copied the name of the restaurant from the Web page. What happened was that it still had the black background. So copy-paste on your average Web browser these days doesn't lose styles, even in cases where that would be desirable (because more legible). 
So rich text technology is already way ahead when it comes to styled text. Do we want to encode background-color variant selectors in Unicode? If yes, how many? [Hint: The last two questions are rhetorical.] Regards, Martin.
Re: A last missing link for interoperable representation
On 2019/01/15 10:48, Mark E. Shoulson via Unicode wrote:
> On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote:
>> Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used.
> It is certainly true that Unicode should not be (and wasn't, before emoji)

Just to be precise, as has already been mentioned in this thread, the first batch of 'emoji' was in Unicode from the start (e.g. U+2603 SNOWMAN, there since Unicode 1.1), I think from Zapf Dingbats. The second batch came from Japanese phones. So for the first two batches of emoji, Unicode did not do any "leading" standardization. It was only after that, for later batches, that that happened.

> in the business of encoding things that "could be used", but rather, was for encoding things that *were* used. This, naturally, poses a chicken-and-egg problem which has been complained about by several people in the past (including me). Still, there are ways to show that things that haven't been encoded are still being "used", as people make shift to do what they can to use the script/notation, like using PUA or characters that aren't QUITE right, but close... And in fairness, I'd have to say that the use of mathematical italics would count in that regard. It's hard to dispute that there is a demand for it, just by looking at how people have been trying to do it!

"A demand" doesn't quantify the demand at all. My guess is that given the overall volume of Twitter or Facebook communication, the percentage of math italics (ab)use is really, really low. It's impossible to say that there's no demand, but use cases like "look, I found these characters, aren't they cute" in some corners of some social services are not the same as "we urgently need this, otherwise we can't communicate in our language". Regards, Martin.
Re: A last missing link for interoperable representation
(sorry for multiple responses...)

On 1/13/19 10:00 PM, Martin J. Dürst via Unicode wrote:
> On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
>> On 2019-01-12, Richard Wordingham via Unicode wrote:
>>> On Sat, 12 Jan 2019 10:57:26 + (GMT)
>>> And what happens when you capitalise a word for emphasis or to begin a sentence? Is it no longer the same word?
>> Indeed. As has been observed up-thread, the casing idea is a dumb one! We are, however, stuck with it because of legacy encoding transported into Unicode. We aren't stuck with encoding fonts into Unicode.
> No, the casing idea isn't actually a dumb one. As Asmus has shown, one of the best ways to understand what Unicode does with respect to text variants is that style works on spans of characters (words,...), and is rich text, but things that work on single characters are handled in plain text. Upper-case is definitely for the most part a single-character phenomenon (the recent Georgian MTAVRULI additions being the exception).

Not just an exception, but an exception that proves the rule. It's precisely because plain-text distinctions, generally speaking, should be at the letter level, as Asmus says, that there was so much shouting about MTAVRULI. That these are exceptional demonstrates the existence of the rule.

But even most adults won't know the rules for what to italicize that have been brought up in this thread. Even if they have read books that use italic and bold in ways that have been brought up in this thread, most readers won't be able to tell you what the rules are. That's left to copy editors and similar specialist jobs. I don't think there's really a case to be made that italics are or should work the same as capitals, or that they are justified for the same reasons that capitals are justified. And the use-cases show how people are using them: not necessarily for Chicago Manual of Style mandated purposes, but for emphasis of varying kinds.

> There was a time when computers (and printers in particular) were single-case. There was some discussion about having to abolish case distinctions to adapt to computers, but fortunately, that wasn't necessary.

Abolishing case I could see as a hassle, and we have become somewhat dependent on it for other things. But it was a bad idea to start with. ~mark
Re: A last missing link for interoperable representation
On 1/13/19 10:00 PM, Martin J. Dürst via Unicode wrote:
> On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
>> On 2019-01-12, Richard Wordingham via Unicode wrote:
>>> On Sat, 12 Jan 2019 10:57:26 + (GMT)
>>> And what happens when you capitalise a word for emphasis or to begin a sentence? Is it no longer the same word?
>> Indeed. As has been observed up-thread, the casing idea is a dumb one! We are, however, stuck with it because of legacy encoding transported into Unicode. We aren't stuck with encoding fonts into Unicode.
> No, the casing idea isn't actually a dumb one.

Well, for me, when I say or said that the "casing idea" is a dumb one, I don't mean how Unicode handled it. Unicode is quite correct in encoding capitals distinctly from lowercase, both for computer-historical reasons and others you mention. I think the idea of having case in alphabets _in the first place_ was a bad move. It's a "mistake" that happened centuries ago. ~mark
Re: A last missing link for interoperable representation
On 1/14/2019 5:41 PM, Mark E. Shoulson via Unicode wrote: On 1/14/19 5:08 AM, Tex via Unicode wrote: This thread has gone on for a bit and I question if there is any more light that can be shed. BTW, I admit to liking Asmus's definition for functions that span text being a definition or criterion for rich text. Me too. There are probably some exceptions or weird corner-cases, but it seems to be a really good encapsulation of the distinction which I had never seen before. ** blush ** A./
Re: A last missing link for interoperable representation
In some of this discussion, I'm not sure what is being proposed or forbidden here... I don't know that anyone is advocating removing the "don't use these for words!" warning sticker on the mathematical italics. The closest-to-sensible suggestions I've heard are things like a VS to italicize a letter, a combining italicizer so to speak (this is actually very similar to the emoji-style vs text-style VS sequences). *If* the VS is ignored by searches, as apparently it should be and some have reported that it is, then VS-type solutions would NOT be a problem when it comes to searches (and don't go whining about legacy software. If Unicode had to be backward-compatible with everything we wouldn't have gone beyond ASCII). So I'm not sure what you mean when you speak of "Unicode italics". Do you mean using the mathematical italics as we've been seeing? Or having a whole new plane of italic characters for everything that could conceivably be italicized? Those would probably both be mistakes, I agree. ~mark

On 1/14/19 5:58 PM, David Starner via Unicode wrote: On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote: The arguments against italics seem to be:
· Unicode is plain text. Italics is rich text.
· We haven't had it until now, so we don't need it.
· There are many rich text solutions, such as html.
· There are ways to indicate or simulate italics in plain text including using underscore or other characters, using characters that look italic (eg math), etc.
· Adding italicization might break existing software.
· The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al) have justifications.
There generally shouldn't be multiple ways of doing things. For example, if you think that searching for certain text in italics is important, then having both HTML italics and Unicode italics is going to cause searches to fail or succeed unexpectedly, unless the underlying software unifies the two systems (an extra complexity). 
Searching for certain italicized text could be done today in rich text applications, were there actual demand for it. ·Plain text still has tremendous utility and rich text is not always an option. Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface. Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. Lastly, the places where I still find massive use of plain text are the places this would hurt the most. GNU Grep's manpage shows no sign that it supports searching under any form of Unicode normalization. Same with GNU Less. Adding italics would just make searching plain text documents more complex for their users. The domain name system would just add them to the ban list, and they'd be used for spoofing in filenames and other less controlled but still sensitive environments.
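On the point that variation selectors are (or should be) ignored by searches: variation selectors are Default_Ignorable code points, so a search routine can simply strip them before comparing. A minimal sketch in Python; the VS-based italic sequence shown is purely hypothetical, since Unicode defines no such convention:

```python
# Variation selectors occupy U+FE00..U+FE0F and U+E0100..U+E01EF.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))

def strip_variation_selectors(text: str) -> str:
    """Drop all variation selectors, so that a hypothetical VS-based
    styling convention stays invisible to plain-text search.
    Other combining marks (accents) are deliberately kept."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in VS_RANGES)
    )

# "it" with a (hypothetical) italicizing VS after each letter
marked = "i\uFE01t\uFE01"
print(strip_variation_selectors(marked))   # -> it
```

This is the property that would make a VS scheme search-transparent; whether existing software actually does this stripping is, as the thread notes, another matter.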
Re: A last missing link for interoperable representation
On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote: On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? The "it does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a markup language, and could articulate that need in terms of a proposal, then we would have something to discuss. (We might engage them in a discussion of the advisability of supporting "markdown", for example). Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used. It is certainly true that Unicode should not be (and wasn't, before emoji) in the business of encoding things that "could be used", but rather, was for encoding things that *were* used. This, naturally, poses a chicken-and-egg problem which has been complained about by several people in the past (including me). Still, there are ways to show that things that haven't been encoded are still being "used", as people make shift to do what they can to use the script/notation, like using PUA or characters that aren't QUITE right, but close... And in fairness, I'd have to say that the use of mathematical italics would count in that regard. It's hard to dispute that there is a demand for it, just by looking at how people have been trying to do it! So I'm starting to think this is not really "leading" standardization, but rather following up and, well, standardizing it, replacing ad-hoc attempts with a standard way to do things, just as Unicode is supposed to do. ~mark As for the abuse of math alphabetics. 
That's happening whether we like it or not, but at this point represents playful experimentation by the exuberant fringe of Unicode users and certainly doesn't need any additional extensions.
Re: A last missing link for interoperable representation
On 1/14/19 5:08 AM, Tex via Unicode wrote: This thread has gone on for a bit and I question if there is any more light that can be shed. BTW, I admit to liking Asmus's definition for functions that span text being a definition or criterion for rich text. Me too. There are probably some exceptions or weird corner-cases, but it seems to be a really good encapsulation of the distinction which I had never seen before. ~mark
Re: A last missing link for interoperable representation
On 1/14/19 4:45 AM, Martin J. Dürst via Unicode wrote: Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. I think what people here are doing is not blaming the problem on Unicode, but rather blaming the _solution_ on Unicode, for better or worse. ~mark
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 16:02:05 -0800 Asmus Freytag via Unicode wrote: > On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote: > On Tue, 15 Jan 2019 00:02:49 +0100 > Hans Åberg via Unicode wrote: > > On 14 Jan 2019, at 23:43, James Kass via Unicode > wrote: > > Hans Åberg wrote, > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > Thought about using a combining accent. Figured it would just > display with a dotted circle but neglected to try it out first. It > actually renders perfectly here. /That's/ good to know. (smile) > > It is a bit off here. One can try math, too: the derivative of 훾(푡) > is 훾̇(푡). > > No it isn't. You should be using a spacing character for > differentiation. > > Sorry, but there may be different conventions. The dot / double-dot > above is definitely common usage in physics. > > A./ Apologies. It was positioned in the parenthesis, and it looked like a misplaced U+0301. Richard.
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 06:24:46 + James Kass via Unicode wrote: > Unicode doesn't enforce any spelling or punctuation rules. Unicode > doesn't tell human beings how to pronounce strings of text or how to > interpret them. These are not statements that are both honest and true. Unicode lays down rules and recommendations which others may then enforce. In Indic scripts where LETTER A is not also a consonant, Unicode forbids writing where LETTER AA would do the same job, and most renderers enforce that rule. Similarly, in phonetically ordered LTR scripts, one can't write a dependent vowel as the first character even if it is the leftmost character. There is a subtler rule about not spelling negative numbers with a hyphen-minus - if one does, one may suddenly find a line break just after what is being used as a negative sign. In scripts where Sanskrit grv and gvr may be rendered identically, Unicode tells us what the two code sequences are, and therefore indirectly what the range of pronunciations is for a given spelling. Now, sometimes the enforcers overstep the mark. For example, the USE tells us that when we write Northern Thai /pʰiaʔ/ 'sound of a smack' which visually is , with denoting /ia/, we should write it ᨻ᩠ᨿᩕᩮᩡ . So much for phonetic order! Enforcement can be more subtle. TUS says that Farsi should use U+06CC ARABIC LETTER FARSI YEH instead of U+064A ARABIC LETTER YEH although they are identical in initial and medial positions. In this case, the enforcer will be the spell-checker. Richard.
Re: A last missing link for interoperable representation
On 1/14/2019 2:08 PM, Tex via Unicode wrote: Asmus, I agree 100%. Asking where is the harm was an actual question intended to surface problems. It wasn’t rhetoric for saying there is no harm.

The harm comes when this is imported into rich text environments (like this e-mail inbox). Here, the math abuse and the styled text run may look the same, but I cannot search for things based on what I see. I see an English or French word, type it in the search box and it won't be found. I call that 'stealth' text. The answer is not necessarily in folding the two, because one of the reasons for having math alphabetics is so you can search for a variable "a" of a certain kind without getting hits on every "a" in the text. Destroying that functionality in an attempt to "solve" the problems created by the alternate facsimile of styled text is also "harm" in some way.

Also, it may not be obvious to social media, messaging platforms, that there is a possibility of a solution. Often when a problem exists for a long time, it fades into unconsciousness. The pain is accepted as that is the way it is and has to be.

A push for (more) universal support of lowest common denominator "markdown" would go a long way to support such features in environments where SGML-style markup is infeasible and out-of-band communication not possible.

It becomes part of the culture. Asking if there is a pain and whether a solution would be welcomed is consciousness raising. I agree about leading standardization. I thought some legitimate needs were raised. The questions were designed to quantify the use case as well as the potential damage.

Also, treating everything as a character encoding problem is so broken.

I didn’t think anyone was recommending more math abuse. I thought it was raised as an example of people resorting to them as a solution for a need. Of course they are also an example of playful experimentation. 
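The "lowest common denominator markdown" mentioned above can be remarkably small. As a sketch of how little machinery basic emphasis needs, here is a two-rule converter in Python (the rule subset and the HTML output are my own assumptions for illustration, not a proposal from the thread):

```python
import re

def mini_markdown(text: str) -> str:
    """Convert a two-rule markdown subset to HTML:
    *italic* and **bold**. Bold is handled first so that
    ** is not consumed as two italic markers."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"\*(.+?)\*", r"<i>\1</i>", text)
    return text

print(mini_markdown("a *subtle* point, **strongly** made"))
# -> a <i>subtle</i> point, <b>strongly</b> made
```

A real implementation would also need escaping and edge-case rules (as the CommonMark specification shows at length), but the out-of-band principle is the same: the styling lives in the protocol, not in the character encoding.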
Separately, regarding messaging platforms: although Twitter is one example in the social media space, today there are many business, commercial, and other applications that embed messaging capabilities for their communities and for servicing customers. I wouldn’t dismiss the need just based on Twitter’s assessment or on the idea that social media is just for casual or “fun” use. Clarity of communications can be significant for many organizations. Having the proposed capabilities in plain text rather than requiring all of the overhead of a full rich-text solution could be a big win for these apps.

I see the math abuse as something that is being done as an exercise of playfulness. There are other uses of characters based on what they look like, rather than what they mean (or are intended for), and much applies to those cases as well. However, that's independent of making a value judgement on social media as such just because some people use the features more creatively. That's a judgement that I have neither made nor would I be comfortable with it. A./

tex From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag via Unicode Sent: Monday, January 14, 2019 1:21 PM To: unicode@unicode.org Subject: Re: A last missing link for interoperable representation

On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? The "it does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a
Re: A last missing link for interoperable representation
On 1/14/2019 2:43 PM, James Kass via Unicode wrote: Hans Åberg wrote, > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile) While all of this displays fine, it currently can't be found in the same search that would locate true italics. As I am seeing this in an environment that otherwise supports rich text, the result is "stealth" text. Stuff that I can read, but not process, without being able to see a difference. A./
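The "stealth" failure described above is measurable: the word in question is U+1D45D U+1D44E U+1D460 U+1D460 U+1D452 followed by U+0301, which a literal search for "passé" will never match. NFKC normalization recovers it, because the math italics compatibility-decompose to ASCII letters and the trailing combining acute then composes with the "e". A sketch in Python:

```python
import unicodedata

# MATHEMATICAL ITALIC SMALL P, A, S, S, E + COMBINING ACUTE ACCENT
stealth = "\U0001D45D\U0001D44E\U0001D460\U0001D460\U0001D452\u0301"

print("pass\u00E9" in stealth)   # -> False: the literal search fails

folded = unicodedata.normalize("NFKC", stealth)
# NFKC maps the math italics to p, a, s, s, e; canonical composition
# then combines e + U+0301 into U+00E9 (precomposed é).
print(folded == "pass\u00E9")    # -> True: the folded text is findable
```

This is the same normalization step a search engine with "extended equivalence classes" effectively performs; a search box that matches code points literally, as most mail clients do, sees only unrelated symbols.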
Re: A last missing link for interoperable representation
On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote: On Tue, 15 Jan 2019 00:02:49 +0100 Hans Åberg via Unicode wrote: On 14 Jan 2019, at 23:43, James Kass via Unicode wrote: Hans Åberg wrote, How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile) It is a bit off here. One can try math, too: the derivative of 훾(푡) is 훾̇(푡). No it isn't. You should be using a spacing character for differentiation. Sorry, but there may be different conventions. The dot / double-dot above is definitely common usage in physics. A./ On the other hand, one uses a combining circumflex for Fourier transforms. Richard.
Re: A last missing link for interoperable representation
On Tue, 15 Jan 2019 00:02:49 +0100 Hans Åberg via Unicode wrote: > > On 14 Jan 2019, at 23:43, James Kass via Unicode > > wrote: > > > > Hans Åberg wrote, > > > > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > > > Thought about using a combining accent. Figured it would just > > display with a dotted circle but neglected to try it out first. It > > actually renders perfectly here. /That's/ good to know. (smile) > > It is a bit off here. One can try math, too: the derivative of 훾(푡) > is 훾̇(푡). No it isn't. You should be using a spacing character for differentiation. On the other hand, one uses a combining circumflex for Fourier transforms. Richard.
Re: A last missing link for interoperable representation
> On 14 Jan 2019, at 23:43, James Kass via Unicode wrote: > > Hans Åberg wrote, > > > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ > > Thought about using a combining accent. Figured it would just display with a > dotted circle but neglected to try it out first. It actually renders > perfectly here. /That's/ good to know. (smile) It is a bit off here. One can try math, too: the derivative of 훾(푡) is 훾̇(푡).
Re: A last missing link for interoperable representation
On 1/14/2019 2:58 PM, David Starner via Unicode wrote: Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. Source Insight is a nice and powerful programming editor that supports rich-text display of source code, i.e. beyond simple syntax coloring / linkification. For example, large type for function names. They even support some styling in comments, but more along the lines of allowing their own markdown convention that lets you write headings of different levels. Both to write comments that introduce sections of your code, as well as headings and subheadings inside longer comment blocks. So stuff like that exists, but it's using semantic markup (style settings per language element) or markdown (styles in comments). A./
Re: A last missing link for interoperable representation
On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote:
> The arguments against italics seem to be:
> · Unicode is plain text. Italics is rich text.
> · We haven't had it until now, so we don't need it.
> · There are many rich text solutions, such as html.
> · There are ways to indicate or simulate italics in plain text including using underscore or other characters, using characters that look italic (eg math), etc.
> · Adding italicization might break existing software.
> · The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al) have justifications.

There generally shouldn't be multiple ways of doing things. For example, if you think that searching for certain text in italics is important, then having both HTML italics and Unicode italics is going to cause searches to fail or succeed unexpectedly, unless the underlying software unifies the two systems (an extra complexity). Searching for certain italicized text could be done today in rich text applications, were there actual demand for it.

> · Plain text still has tremendous utility and rich text is not always an option.

Where? Twitter has the option of doing rich text, as does any closed system. In fact, Twitter is rich text, in that it hyperlinks web addresses. That Twitter has chosen not to support italics is a choice. If users don't like this, they could go to another system, or use third-party tools to transmit rich text over Twitter. The use of underscores or markings for italics would be mostly compatible with human twitterers using the normal interface. Source code is an example of plain text, and yet adding italics into comments would require but a trivial change to editors. If the user audience cared, it would have been done. In fact, I suspect there exist editors and environments where an HTML subset is put into comments and rendered by the editors; certainly active links would be more useful in source code comments than italics. 
Lastly, the places where I still find massive use of plain text are the places this would hurt the most. GNU Grep's manpage shows no sign that it supports searching under any form of Unicode normalization. Same with GNU Less. Adding italics would just make searching plain text documents more complex for their users. The domain name system would just add them to the ban list, and they'd be used for spoofing in filenames and other less controlled but still sensitive environments. -- Kie ekzistas vivo, ekzistas espero.
Re: A last missing link for interoperable representation
Hans Åberg wrote, > How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́ Thought about using a combining accent. Figured it would just display with a dotted circle but neglected to try it out first. It actually renders perfectly here. /That's/ good to know. (smile)
Re: A last missing link for interoperable representation
On 14/01/2019 08:26, Julian Bradfield via Unicode wrote: On 2019-01-13, Marcel Schneider via Unicode wrote: […] These statements make me fear that the font you are using might unsupport the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between It displays as a space. As one would expect - I use fixed width fonts for plain text. It’s mainly that I suspected you could be using Courier New in the terminal. It’s default for plain text in main browsers, and there are devices whose copy of Courier New shows a .notdef box for U+202F. That’s at least what I ɥnderstood from the feedback, and a test in my browser looked likewise. these pointy brackets, please let us know. Because then, You’re unable to read interoperably usable French text, too, as you’ll see double punctuation (eg "?!") where a single mark is intended, like here ! I see "like here !". That’s fine, your font has support for . Thanks for reporting. The reason why I’m anxious to see that checked is that the impact on implementations of as the group separator is being assessed. French text does not need narrow spacing any more than science does. When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$; in plain text, 50cm does just fine. By “plain text” you probably mean *draft style*. I’m thinking that because "$50\thinspace\mathrm{cm}$" is not less plain text than "50cm". Indeed, in not understanding that sooner I was an idiot, naively believing that all Unicode List Members are using Unicode terminology. Turns out that that cannot be taken for granted any more than knowing the preferences of French people as of French text display, while not being a Frenchman: 1. Most French people prefer that big punctunation be spaced off from the word it pertains to. 2. 
Most French people strongly dislike punctuation cut off by a line break, but cannot fix it because: a) the ordinary keyboard layout has no non-breaking spaces; b) the non-breaking space readily available on special keyboard layouts is buggy in most e-mail composers, ending up breakable. 3. A significant number of French people strongly dislike angle quotes that are spaced off too far, as happens when using the ordinary no-break space. Likewise, normal French people writing email write "Quel idiot!", or sometimes "Quel idiot !". Normal people using normal keyboard layouts are writing with the readily available characters most of the time. This is why (to pick another example) French people abbreviate “numéro” as "n°", while on a British English or an American English keyboard layout we can’t normally expect anything else than "no", or "#" for “Number.” We’re not trying to keep people from writing fast and in draft style. What every locale is expected to achieve in the Unicode era is to enable normal users to get an accurate, interoperable representation of their language while typing fast, as opposed to coding in TeX, which is like using InDesign with system spaces instead of Unicode. System spaces are not interoperable, nor is LaTeX \thinspace if it is non-breakable in LaTeX, which it obviously is, since it is used to represent the thin space between a number and a measurement unit. In Unicode, as we know it, U+2009 THIN SPACE is breakable, and the worst thing here is that its duplicate encoding U+2008 PUNCTUATION SPACE is breakable too, instead of being non-breakable like U+2007 FIGURE SPACE. That is why there was a need to add U+202F NARROW NO-BREAK SPACE later. (More details in the cited CLDR ticket.) If you google that phrase on a few French websites, you'll see that some (such as Larousse, whom one might expect to care about such things) use no space before punctuation, Thanks for catching that; the flaw shall be reported with a link to your email. 
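The group-separator point can be made concrete with a small Python sketch. The helper `format_fr` is illustrative only (real code would go through CLDR/ICU data rather than hand-rolling the grouping):

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the CLDR group separator for French

def format_fr(n: int) -> str:
    # Group digits in threes and join the groups with NNBSP.
    # Because NNBSP is non-breaking, a line break can never fall
    # inside the number, unlike with U+2009 THIN SPACE.
    return f"{n:,}".replace(",", NNBSP)

print(format_fr(1234567))  # 1 234 567 (narrow no-break spaces between groups)
```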
You may also wish to look up this page: https://communaute.lerobert.com/forum/LE-ROBERT-CORRECTEUR/LE-ROBERT-CORRECTEUR-CORRECTION-D-ORTHOGRAPHE-DICTIONNAIRES-ET-GUIDES/Espace-entre-le-meotet-le-point-d-interrogation/2918628/398261 reading: “Le logiciel Le Robert correcteur justement signale les espaces fines insécables si elles ne sont pas présentes sur le texte et propose la correction.” (“Le Robert spellchecker does report the lack of narrow no-break spaces and proposes to fix it.”) while others (such as some random T-shirt company) use an ASCII space. The Académie Française, which by definition knows more about French orthography than you do, uses full ASCII spaces before ? and ! on its front page. Also after opening guillemets, which looks even more stupid from an Anglophone perspective. (See point 3 above.) That is a very good point. Indeed this website is reasonably expected to be an example and a template of correctly typesetting a French website. There are several reasons why actually it is not. The main reason is that it is not the work of the A.F. itself, but of webdesigners, webmasters and content managers, who are normal people like for any other website. They just haven’t got an
Re: A last missing link for interoperable representation
On 1/14/2019 2:08 AM, Tex via Unicode wrote: Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? "It does no harm" is never an argument "for" making a change. It's something of a necessary, but not a sufficient, condition, in other words. More to the point, if there were platforms (like social media) that felt an urgent need to support styling without a markup language, and could articulate that need in terms of a proposal, then we would have something to discuss. (We might engage them in a discussion of the advisability of supporting "markdown", for example). Short of that, I'm extremely leery of "leading" standardization; that is, encoding things that "might" be used. As for the abuse of math alphabetics: that's happening whether we like it or not, but at this point it represents playful experimentation by the exuberant fringe of Unicode users and certainly doesn't need any additional extensions.
Re: A last missing link for interoperable representation
On 14/01/2019 04:00, Martin J. Dürst via Unicode wrote: […] […] As Asmus has shown, one of the best ways to understand what Unicode does with respect to text variants is that style works on spans of characters (words,...), and is rich text, but things that work on single characters are handled in plain text. Upper-case is definitely for the most part a single-character phenomenon (the recent Georgian MTAVRULI additions being the exception). Obviously the single-character rule also applies to superscript when used as an ordinal indicator or, more generally, as an abbreviation indicator. Thanks for the hint; it’s all about interoperability, and in this case too the point of using preformatted characters is a good one, IIUC. Sorry for getting a little off-topic. There’s also one reply on my to-do list where I’ll do even more so; can’t help it given it’s our digital representation that’s at stake, and due to past neglect on either side there’s still a need to painfully lobby for each character while so many other important issues are out there… Best Regards, Marcel
Re: A last missing link for interoperable representation
> On 13 Jan 2019, at 22:43, Khaled Hosny via Unicode > wrote: > > LaTeX with the > “unicode-math” package will translate ASCII + font switches to the > respective Unicode math alphanumeric characters. Word will do the same. > Even browsers rendering MathML will do the same (though most likely the > MathML source will have the math alphanumeric characters already). For full translation, one probably has to use ConTeXt and LuaTeX. Then, along with PDF, one can also generate HTML with MathML.
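As a sketch of the translation Khaled describes (this assumes LuaLaTeX or XeLaTeX with the unicode-math package and its default mappings):

```latex
% Compile with LuaLaTeX or XeLaTeX.
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
% The ASCII input x is shaped as U+1D465 MATHEMATICAL ITALIC SMALL X,
% and \symbf{y} as U+1D432 MATHEMATICAL BOLD SMALL Y, so the Unicode
% math alphanumerics end up in the PDF's extractable text stream.
$x + \symbf{y}$
\end{document}
```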
Re: A last missing link for interoperable representation
> On 14 Jan 2019, at 06:08, James Kass via Unicode wrote: > > 퐴푟푡 푛표푢푣푒푎푢 seems a bit 푝푎푠푠é nowadays, as well. > > (Had to use mark-up for that “span” of a single letter in order to indicate > the proper letter form. But the plain-text display looks crazy with that > HTML jive in it.) How about using U+0301 COMBINING ACUTE ACCENT: 푝푎푠푠푒́
Re: A last missing link for interoperable representation
Hello Martin, others... > Blaming the problem on Unicode doesn't seem to be appropriate. I don't consider that there's any problem with plain text users exchanging plain text. I give Unicode /credit/ for being the foundation of that ability. Anyone imagining that I'm casting blame is under a misconception. There's plain text data out there stringing math alphanumerics into recognizable words. It's being stored and shared and indexed. I have no problem with that; I'm in favor of it. (Everyone, please let's focus on Tex Texin's latest post. Wish I'd sent this post before his...) Best regards, James Kass
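The indexing point can be made concrete: NFKC compatibility normalization is one (lossy) way a search layer could fold such strings back to ASCII, which is essentially the dividing line between the two search behaviors discussed earlier in the thread. A small Python sketch:

```python
import unicodedata

# "PART" spelled with MATHEMATICAL ITALIC CAPITAL letters
# (U+1D443, U+1D434, U+1D445, U+1D447).
styled = "\U0001D443\U0001D434\U0001D445\U0001D447"

# NFKC applies the <font> compatibility decompositions, folding the math
# alphanumerics to plain ASCII — so a normalizing search can match them,
# while a strict codepoint-level search cannot.
folded = unicodedata.normalize("NFKC", styled)
print(folded)            # PART
print("PART" in styled)  # False — no codepoint-level match
```

The flip side, as noted elsewhere in the thread, is that folding destroys the ability to search for a math variable of one particular style.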
RE: A last missing link for interoperable representation
This thread has gone on for a bit and I question whether there is any more light that can be shed. BTW, I admit to liking Asmus’ definition of functions that span text as a criterion for rich text. I also liked James’ examples of the Twitter use case.
The arguments against italics seem to be:
· Unicode is plain text. Italics is rich text.
· We haven't had it until now, so we don't need it.
· There are many rich text solutions, such as HTML.
· There are ways to indicate or simulate italics in plain text, including using underscores or other characters, using characters that look italic (e.g. math), etc.
· Adding italicization might break existing software.
· The examples of existing Unicode characters that seem to represent rich text (emoji, interlinear annotation, et al.) have justifications.
The arguments for it are:
· Plain text still has tremendous utility, and rich text is not always an option.
· Simulations of italics are non-standard and therefore hurt interoperability. This includes math characters not being supported universally; underscores and other indicators are not a standard, nor are alternative fonts.
· There are legitimate needs for a standardized approach for interchange, accessibility (e.g. screen readers), search, Twitter, et al.
· Evidence of the demand is perhaps demonstrated by the number of simulations, and by the requests to vendors of plain text apps (such as Twitter) for how to implement it.
· Supporting italics can be implemented without breaking existing documents and should be easily supported in modern Unicode apps.
· The impact on the standard of adding a character for italics (and another for bold, and perhaps a couple of others) is minuscule, as it fits into the VS model.
· The argument that italics is rich text is an ideological one. However, as with other examples, there are cases where practicality should win out.
· This isn’t a slippery slope.
Personally, I think the cost seems very low, both to the standard and to implementers. 
I don’t see a lot of risk that it will break apps. (At least not those that wouldn’t be broken by VS or other features in the standard.) It will help many apps. I think the benefits to interoperability, accessibility, search, standardization of text are significant. Perhaps the question should be put to twitter, messaging apps, text-to-voice vendors, and others whether it will be useful or not. If the discussion continues I would like to see more of a cost/benefit analysis. Where is the harm? What will the benefit to user communities be? tex
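The "fits into the VS model" and search arguments can be sketched in Python. This is purely hypothetical: Unicode assigns no italic semantics to any variation selector today, and VS2 below is an arbitrary stand-in:

```python
VS = "\ufe01"  # VARIATION SELECTOR-2 — an arbitrary stand-in, NOT a real
               # italic selector; no such assignment exists in Unicode.

def tag_italic(text: str) -> str:
    # Hypothetically mark every letter as italic by suffixing the VS.
    return "".join(ch + VS if ch.isalpha() else ch for ch in text)

def fold_for_search(text: str) -> str:
    # A search layer that ignores variation selectors (as default-ignorable
    # characters are meant to be ignored) sees the plain text again.
    return text.replace(VS, "")

styled = tag_italic("very important")
print(fold_for_search(styled) == "very important")  # True
```

Unlike the math-alphanumerics workaround, the underlying letters here remain ordinary ASCII, so naive searching and screen reading keep working.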
Re: A last missing link for interoperable representation
Hello James, others, On 2019/01/14 15:24, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > > I'd say it should be conservative. As the meaning of that word > > (similar to others such as progressive and regressive) may be > > interpreted in various way, here's what I mean by that. > > > > It should not take up and extend every little fad at the blink of an > > eye. It should wait to see what the real needs are, and what may be > > just a temporary fad. As the Mathematical style variants show, once > > characters are encoded, it's difficult to get people off using them, > > even in ways not intended. > > A conservative approach to progress is a sensible position for computer > character encoders. Taking a conservative approach doesn't necessarily > mean being anti-progress. > > Trying to "get people off" using already encoded characters, whether or > not the encoded characters are used as intended, might give an > impression of being anti-progress. Using the expression "get people off" was indeed somewhat ambiguous. Of course we cannot forbid people to use Mathematical alphanumerics. There's no standards police, neither for Unicode nor most other standards. > Unicode doesn't enforce any spelling or punctuation rules. Unicode > doesn't tell human beings how to pronounce strings of text or how to > interpret them. Unicode doesn't push any rules about splitting > infinitives or conjugating verbs. > > Unicode should not tell people how any written symbol must be > interpreted. Unicode should not tell people how or where to deploy > their own written symbols. Yes. But Unicode can very well say: These characters are for Math, and if you use them for anything else, that's your problem, and because they are used for Math, they support what's used in Math, and we won't add copies of accented characters or variant characters for style or [your proposal goes here] because that's not what Unicode is about. 
If you want real styling, then use applications that can do that, or try to convince your application provider to provide that. (Well, Unicode is more or less saying just exactly that currently.) And that's what I meant by "getting people off". If that then leads to fewer people (mis)using these characters, all the better. > Perhaps fraktur is frivolous in English text. Perhaps its use would > result in a new convention for written English which would enhance the > literary experience. Italics conventions which have only been around a > hundred years or so may well turn out to be just a passing fad, so we > should probably give it a bit more time. There's no need to give italic conventions more time. Of course they may die out, but they are very active now. And they are very actively supported in rich text, where they belong. > Telling people they mustn't use Latin italics letter forms in computer > text while we wait to see if the practice catches on seems flawed in > concept. The practice is already there. Lots of people use italics in rich text. That's just fine because that's the right thing to do. We don't need to muddy the waters. Regards, Martin.
Re: A last missing link for interoperable representation
Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. Regards, Martin. On 2019/01/14 18:06, James Kass via Unicode wrote: > > Not a twitter user, don't know how popular the practice is, but here's a > couple of links concerned with how to use bold or italics in Twitter > plain text messages. > > https://www.simplehelp.net/2018/03/13/how-to-use-bold-and-italicized-text-on-twitter/ > > > https://mothereff.in/twitalics > > Both pages include a form of caveat. But the caveat isn't about the > intended use of the math alphanumerics. > > The first page includes the following text as part of a "tweet": > Just because you 헰헮헻 doesn’t mean you 혴혩혰혶혭혥 :) > > And, as before, I have no idea how /popular/ the practice is. But > here's some more links: > > (web page from 2013) > How To Write In Italics, Tweet Backwards And Use Lots Of Different ... > https://www.adweek.com/digital/twitter-font-italics-backwards/ > > (This is copy/pasted *as-is* from the web page to plain-text) > Bold and Italic Unicode Text Tool - 퐁퐨퐥퐝 풂풏풅 푖푡푎푙푖푐푠 - > YayText > https://yaytext.com/bold-italic/ > Super cool unicode text magic. Write 퐛퐨퐥퐝 and/or 푖푡푎푙푖푐 > updates on Facebook, Twitter, and elsewhere. Bold (serif) preview copy > tweet. > > Michael Maurino [emoji redacted-JK] on Twitter: "Can I make italics on > twitter? 'cause ... > https://twitter.com/iron_stylus/status/281991180064022528?lang=en > > Charlie Brooker on Twitter: "How do you do italics on this thing again?" > https://twitter.com/charltonbrooker/status/484623185862983680?lang=en > > How to make your Facebook and Twitter text bold or italic, and other ... > https://boingboing.net/2016/04/10/yaytext-unicode-text-styling.html > Apr 10, 2016 - For years I've been using the Panix Unicode Text > Converter to create ironic, weird or simply annoying text effects for > use on Twitter, Facebook ... 
> > How to change your Twitter font | Digital Trends > https://www.digitaltrends.com/.../now-you-can-use-bold-italics-and-other-fancy-fonts-... > > > Aug 14, 2013 - now you can use bold italics and other fancy fonts on > twitter isaac ... or phrase into your Twitter text box, and there you > have it: fancy tweets. > > Twitter Fonts Generator (퓬퓸퓹픂 퓪퓷퓭 퓹퓪퓼퓽퓮) ― LingoJam > https://lingojam.com/TwitterFonts > You might have noticed that some users on Twitter are able to change the > font ... them to seemingly make their tweet font bold, italic, or just > completely different. >
Re: A last missing link for interoperable representation
Not a twitter user, don't know how popular the practice is, but here's a couple of links concerned with how to use bold or italics in Twitter plain text messages. https://www.simplehelp.net/2018/03/13/how-to-use-bold-and-italicized-text-on-twitter/ https://mothereff.in/twitalics Both pages include a form of caveat. But the caveat isn't about the intended use of the math alphanumerics. The first page includes the following text as part of a "tweet": Just because you 헰헮헻 doesn’t mean you 혴혩혰혶혭혥 :) And, as before, I have no idea how /popular/ the practice is. But here's some more links: (web page from 2013) How To Write In Italics, Tweet Backwards And Use Lots Of Different ... https://www.adweek.com/digital/twitter-font-italics-backwards/ (This is copy/pasted *as-is* from the web page to plain-text) Bold and Italic Unicode Text Tool - 퐁퐨퐥퐝 풂풏풅 푖푡푎푙푖푐푠 - YayText https://yaytext.com/bold-italic/ Super cool unicode text magic. Write 퐛퐨퐥퐝 and/or 푖푡푎푙푖푐 updates on Facebook, Twitter, and elsewhere. Bold (serif) preview copy tweet. Michael Maurino [emoji redacted-JK] on Twitter: "Can I make italics on twitter? 'cause ... https://twitter.com/iron_stylus/status/281991180064022528?lang=en Charlie Brooker on Twitter: "How do you do italics on this thing again?" https://twitter.com/charltonbrooker/status/484623185862983680?lang=en How to make your Facebook and Twitter text bold or italic, and other ... https://boingboing.net/2016/04/10/yaytext-unicode-text-styling.html Apr 10, 2016 - For years I've been using the Panix Unicode Text Converter to create ironic, weird or simply annoying text effects for use on Twitter, Facebook ... How to change your Twitter font | Digital Trends https://www.digitaltrends.com/.../now-you-can-use-bold-italics-and-other-fancy-fonts-... Aug 14, 2013 - now you can use bold italics and other fancy fonts on twitter isaac ... or phrase into your Twitter text box, and there you have it: fancy tweets. 
Twitter Fonts Generator (퓬퓸퓹픂 퓪퓷퓭 퓹퓪퓼퓽퓮) ― LingoJam https://lingojam.com/TwitterFonts You might have noticed that some users on Twitter are able to change the font ... them to seemingly make their tweet font bold, italic, or just completely different.
Re: A last missing link for interoperable representation
On Mon, 14 Jan 2019 07:47:45 + (GMT) Julian Bradfield via Unicode wrote: > On 2019-01-13, James Kass via Unicode wrote: > > यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं। > > > What happens with Devanagari text? Should the user community > > refrain from interchanging data because 1980s era software isn't > > Unicode aware? > > Devanagari is an established writing system (which also doesn't need > separate letters for different typefaces). Those who wish to exchange > information in devanagari will use either an ISCII or Unicode system > with suitable font support. Has ISCII kept abreast of additions to the encoded Devanagari script? Hindi may be an established writing system, but Vedic Sanskrit in full detail is another matter. Even with full Unicode support, having a 'suitable font' is an issue with 'plain text', even deprecated plain text. The problems are that writers of Hindi don't want to have to manually suppress ligature formation, and it doesn't help that tables of Hindi conjuncts don't express the difference between real and fake viramas. (The difference surfaces with preposed vowels.) > Just as those who wish to exchange English text with typographic > detail will use a suitable typographic mark-up system with font > support, which will typically not interfere with plain text searching. > Even in a PDF document, "art nouveau" will appear as "art nouveau" > whatever font it's in. But "art nouveau" is ASCII. Copying truly complex Indic from a PDF is still something of an adventure. > Incidentally, a large chunk of my facebook feed is Indian politics, > and of that portion of it that is in Hindi or other Indian > languages, most is still written in ASCII transcription, even though > every web browser and social media application in common use surely > has full Unicode support these days. I don't believe the USE has been added to IE 11, and certainly not on Windows 7. 
And I fear that of OpenType fonts, only mine widely support Tai Tham as documented on the Unicode site. (And 'widely' excludes IE 11, but not MS Edge.) A fair few Tai Tham fonts rely on being permitted to bypass the script-specific support, which the Windows stack only permits to privileged scripts. Richard.
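Richard's earlier point about manually suppressing ligature formation can be illustrated at the codepoint level (actual shaping, of course, depends on the font and shaping engine):

```python
# क + virama + ष normally shapes as the conjunct ligature क्ष; inserting
# U+200C ZERO WIDTH NON-JOINER after the virama asks the shaper to keep
# an overt/half form instead — the "fake virama" distinction the tables
# of conjuncts fail to express.
KA, VIRAMA, SSA, ZWNJ = "\u0915", "\u094d", "\u0937", "\u200c"

conjunct = KA + VIRAMA + SSA            # क्ष — usually the kṣa ligature
no_ligature = KA + VIRAMA + ZWNJ + SSA  # क्‌ष — ligature suppressed

print(len(conjunct), len(no_ligature))  # 3 4
```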
Re: A last missing link for interoperable representation
On 2019-01-14, James Kass via Unicode wrote: > Julian Bradfield wrote, > > I have never seen a Unicode math alphabet character in email > > outside this list. > > It's being done though. Check this message from 2013 which includes the > following, copy/pasted from the web page into Notepad: > > 혗혈혙혛 혖혍 헔햳햮헭.향햱햠햬햤햶햮햱햪 © ퟮퟬퟭퟯ 햠햫햤햷 햦햱햠햸 > 헀헂헍헁헎햻.햼허헆/헺헿헮헹헲혅헴헿헮혆 > > https://apple.stackexchange.com/questions/104159/what-are-these-characters-and-how-can-i-use-them Which makes the point very nicely. They're not being *used* to do maths, they're being played with for purely decorative purposes, and moreover in a way which breaks the actual intended use as a URL. If you introduce random stuff into Unicode, people will play with it (or use it for phishing). The whole thread is, as it says, "what is this weird stuff"? -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.