Re: Why do binary files contain text but text files don't contain binary?
From a practical point of view, text files contain text that is broken into lines. By long-standing tradition, different operating systems represent line breaks differently. Whenever a text file is transferred between operating systems, the process behind that transfer takes care to convert the line breaks according to the target OS's conventions.

Binary files are much simpler: they can be transferred without converting anything, even between different operating systems. Of course, this does not mean that an executable for one OS remains a valid executable under another OS, but there are lots of non-executable binaries that are useful independent of the OS (e.g. images, sound files, video files, and many other application files).

So, for a successful file transfer one needs to know whether the file is text or binary, and handle it accordingly.

--Jörg Knappen

Sent: Friday, 21 February 2020, 13:21
From: "Costello, Roger L. via Unicode"
To: "unicode@unicode.org"
Subject: Why do binary files contain text but text files don't contain binary?

Hi Folks,

There are binary files and there are text files. Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ. To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters. (Of course, text files may contain a text-encoding of binary, such as base64-encoded text.)

Why the asymmetry?

/Roger
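The text/binary distinction drawn in this thread can be sketched in Python. This is only an illustration under stated assumptions: "text" is taken to mean "decodes as UTF-8", and the function names (`convert_line_breaks`, `is_utf8_text`, `transfer`) are hypothetical. Real transfer tools are usually told the mode explicitly rather than guessing it.

```python
# Sketch: line-break conversion applies to text only; binary data is copied as-is.
# All function names here are hypothetical illustrations.


def convert_line_breaks(data: bytes, target: bytes = b"\n") -> bytes:
    """Normalize CRLF, lone CR, and LF to `target`."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", target)


def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (an assumed definition of 'text')."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transfer(data: bytes, target_newline: bytes) -> bytes:
    """Convert line breaks for text, pass binary through untouched."""
    if is_utf8_text(data):
        return convert_line_breaks(data, target_newline)
    return data


unix_text = b"first line\nsecond line\n"
# Starts with the text "MZ" but contains bytes that are not valid UTF-8.
binary_blob = b"MZ\x90\x00\xff\xfe\r\n\x00"

print(transfer(unix_text, b"\r\n"))
print(transfer(binary_blob, b"\r\n") == binary_blob)
```

The binary sample contains a CR LF byte pair by coincidence; converting it would corrupt the file, which is why the mode has to be known before the transfer.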
Re: Geological symbols
Hello Thomas,

Unicode delegates this (combined superscripts and subscripts) to higher-level markup languages or rich text editors. I don't know how widespread the use of LaTeX is among geologists, but notation like this is a perfect use case for LaTeX.

--Jörg Knappen

Sent: Monday, 13 January 2020, 12:20
From: "Thomas Spehs (MonMap) via Unicode"
To: unicode@unicode.org
Subject: Geological symbols

Hi,

I would like to ask if there is any way to create geological “symbols” with Unicode such as: Q₁¹ˉ², but with the two “1”s over each other, without a space.

Thanks!
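As an illustration of the LaTeX route suggested above, a minimal sketch follows. It assumes the intended symbol is a Q with subscript 1 and superscript "1-2"; the exact reading of the superscript is a guess from the plain-text example.

```latex
\documentclass{article}
\begin{document}
% In math mode the subscript and superscript are stacked directly after Q,
% so the two "1"s sit above each other without extra space.
$Q_{1}^{1-2}$
\end{document}
```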
Re: Re: Re: NBSP supposed to stretch, right?
The festival season is over ... I checked, and LaTeX does the same when the input is an explicit no-break space character.

--Jörg Knappen

Sent: Sunday, 22 December 2019, 22:54
From: "Shriramana Sharma via Unicode"
To: "Jörg Knappen"
Cc: "Asmus Freytag", "UnicoDe List"
Subject: Re: Re: NBSP supposed to stretch, right?

So I was wondering whether TeX only does this to the ~ input character or the actual NBSP Unicode character too?
Re: Re: NBSP supposed to stretch, right?
Well, in TeX and LaTeX, the no-break space (indicated by the active character ~ in TeX input files) is stretchable: it stretches like a normal inter-word space, so that all inter-word spaces in a line are equal. But multiple no-break spaces still add up to wider spaces in the output, unlike ordinary space tokens, which are collapsed into one space token.

--Jörg Knappen

Sent: Tuesday, 17 December 2019, 17:20
From: "Asmus Freytag via Unicode"
To: unicode@unicode.org
Subject: Re: NBSP supposed to stretch, right?

On 12/17/2019 2:41 AM, Shriramana Sharma via Unicode wrote:

On Tue 17 Dec, 2019, 16:09 QSJN 4 UKR via Unicode, <unicode@unicode.org> wrote:

Agree. By the way, it is common practice to use multiple nbsp in a row to create a larger span. In my opinion, it is wrong to replace fixed width spaces with non-breaking spaces. Quote from Microsoft Typography Character design standards: «The no-break space is not the same character as the figure space. The figure space is not a character defined in most computer system's current code pages. In some fonts this character's width has been defined as equal to the figure width. This is an incorrect usage of the character no-break space.»

Sorry but I don't understand how this addresses the issue I raised.

You don't? In principle it may be true that NBSP is not fixed width, but show me software that doesn't treat it that way. In HTML, NBSP isn't subject to space collapse, therefore it's the go-to space character when you need some extra spacing that doesn't disappear. I bet, in many other environments it was typically the only "other" space character, so it ended up overloaded. My hunch is that it is too late at this point to try to promulgate a "clean" implementation of NBSP, because it would effectively change untold documents retroactively. So it would be a massively breaking change.

If you have a situation where you need really poor layout (wide inter-word spaces) to justify, the fact that an honorific in front of a name works more like it's part of the same word (because the NBSP doesn't stretch) would be the least of my worries. (Although, on lines where inter-word spaces are reduced a bit, I can see that becoming counter-intuitive.) If you only fix this in software for high-end typography, you'd still have the issue that things will behave differently if you export your (plain) text. And you would have the issue of what to do when you want fixed spaces to be non-breaking as well (is that ever needed?).

A./
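The remark that NBSP is not subject to space collapse in HTML can be illustrated with a simplified model. The sketch below is not a browser implementation; it only assumes that collapsing applies to runs of ASCII whitespace (tab, LF, FF, CR, space), a set that does not include U+00A0. The function name `collapse_spaces` is hypothetical.

```python
import re

NBSP = "\u00a0"

# ASCII whitespace: tab, LF, FF, CR, space. NBSP is deliberately not in this set.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_spaces(text: str) -> str:
    """Simplified model of white-space collapsing: runs of ASCII whitespace become one space."""
    return _COLLAPSIBLE.sub(" ", text)


ordinary = "Dr.   Knuth"                 # three ordinary spaces
padded = "Dr." + NBSP * 3 + "Knuth"      # three no-break spaces

print(len(ordinary), len(collapse_spaces(ordinary)))  # the run shrinks to one space
print(len(padded), len(collapse_spaces(padded)))      # the NBSP run survives unchanged
```

This is why a run of NBSPs ends up being used as "extra spacing that doesn't disappear", as described in the message above.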
Re: acute-macron hybrid?
Does it also contrast with a circumflex? Historically, circumflexes were quite flexible in their graphical representation.

--Jörg Knappen

Sent: Tuesday, 30 April 2019, 09:45
From: "Julian Bradfield via Unicode"
To: unicode@unicode.org
Subject: acute-macron hybrid?

The celebrated Bosworth-Toller dictionary of Anglo-Saxon uses a curious diacritic to mark long vowels. It may be described as a long shallow acute with a small down-tick at the right. It contrasts with an acute (quite steep in this typeface) used to mark accented short vowels. Both can be seen in the fifth line of the scan at http://lexicon.ff.cuni.cz/png/oe_bosworthtoller/b0002.png

What is its appropriate Unicode representation? As a lumper, I would use a macron, but I wonder what a splitter would say.
Two more ellipsis-type punctuation marks: ?.. and !..
While working on a corpus of the Kyrgyz language, a Turkic language written in the Cyrillic script, I encountered two ellipsis-type punctuation marks, namely ?.. and !..

Note that this is not (yet) a proposal to encode them as single Unicode characters, although I would definitely use such characters if they were available, because they would make the text processing tool chain much simpler and more robust.

It is a survey question: Have you encountered ?.. or !.. in languages other than Kyrgyz?

--Jörg Knappen
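To show why these marks complicate a text processing tool chain, here is a minimal tokenizer sketch. It is an assumption about how such a tool might treat ?.. and !.. as single tokens, not a description of the tool chain actually used for the corpus; the pattern and the sample strings are illustrative only.

```python
import re

# Longest alternatives come first so that "?.." wins over "?" followed by "..".
# The token set is an illustrative assumption, not a complete tokenizer.
TOKEN = re.compile(r"\?\.\.|!\.\.|\.\.\.|\w+|[^\w\s]")


def tokenize(text: str) -> list:
    """Split text into words and punctuation, keeping ?.. and !.. together."""
    return TOKEN.findall(text)


print(tokenize("Really?.. No!.. Well..."))
```

Without the first two alternatives, each mark would fall apart into three separate punctuation tokens, and every later processing step would have to reassemble them.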
Re: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
Asmus,

I know your style of humor, but to keep it straight: All known human languages, even Piraha, have pronouns for "I" and "you".

--Jörg Knappen

Sent: Monday, 20 August 2018, 16:20
From: "Asmus Freytag via Unicode"
To: unicode@unicode.org
Subject: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

What about languages that don't have or don't use personal pronouns? Their speakers might find their use odd or awkward. The same goes for many other grammatical concepts: they work reasonably well if used by someone from a related language, or for linguists trained in general concepts, but languages differ so much in what they express explicitly that if any native speaker transcribes the features that are exposed (and not implied) in their native language it may not be what a reader used to a different language is expecting to see.

A./
Re: Re: IBM 1620 invalid character symbol
I found the character in question on p. 52; it is a picture of something handwritten, not a typeset character. "Clearly" means something different to me.

--Jörg Knappen

Sent: Tuesday, 26 September 2017, 15:03
From: "John W Kennedy via Unicode" <unicode@unicode.org>
To: "Leo Broukhis" <l...@mailcom.com>, unicode@unicode.org
Subject: Re: IBM 1620 invalid character symbol

I don’t know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct character, is another question. See http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf

On Sep 26, 2017, at 12:48 AM, Leo Broukhis via Unicode <unicode@unicode.org> wrote:

Wikipedia (https://en.wikipedia.org/wiki/IBM_1620#Invalid_character) describes the "invalid character" symbol (see attachment) as a Cyrillic Ж which it obviously is not. But what is it? Does it deserve encoding, or is it a glyph variation of an existing codepoint? The question is somewhat prompted by 2BFF 1 HELLSCHREIBER PAUSE SYMBOL in the pipeline, although I learned about both earlier today within a few minutes of one another.

Thanks,
Leo
Re: Re: LATIN CAPITAL LETTER SHARP S officially recognized
No, the hyphenation oddity of adding letters at a hyphenation point (or, to be more precise, of suppressing letters in unhyphenated words) never affected the letter s. It affected other letters (I know examples for f, l, m, n, p, r, and t) when followed by a vowel, as in Schiffahrt/Schiff-fahrt. It was always Sauerstoffflasche with three f's.

In the old (1910) spelling of German, ss at the word boundary obligatorily became ß. When the ß was replaced by ss (because of all caps or unavailability of the letter), all three s's were retained.

In the current orthography, the hyphenation oddity is removed completely.

--Jörg Knappen

Sent: Monday, 03 July 2017, 09:43
From: "Alastair Houghton" <alast...@alastairs-place.net>
To: "Jörg Knappen" <jknap...@web.de>
Cc: a.lukya...@yspu.org, unicode@unicode.org
Subject: Re: LATIN CAPITAL LETTER SHARP S officially recognized

On 2 Jul 2017, at 16:59, Jörg Knappen via Unicode <unicode@unicode.org> wrote:
>
> > Is it possible to design fonts that will render ẞ as SS?
>
> In fact, that has happened long before the capital letter sharp s was added to Unicode: The T1 encoding (aka Cork encoding) of LaTeX does this since 1990. The reason for this was correct hyphenation for German words rendered in all caps.

Wasn’t there also some oddity relating to hyphenation and “ss”/“SS” in general? I seem to recall that it used to be the case that you ended up with more “s”s than you started with when hyphenating a word containing “ss”…

Kind regards,
Alastair.

-- http://alastairs-place.net
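The thread discusses rendering ẞ as SS at the font level. As a text-level counterpart, the sketch below shows how Python's built-in string methods treat the two sharp s letters. It only illustrates default case mappings and says nothing about the T1 encoding or about hyphenation; the sample word is an arbitrary choice.

```python
SMALL_SHARP_S = "\u00df"    # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ

word = "Stra" + SMALL_SHARP_S + "e"

# Default full uppercasing expands ß to SS rather than producing ẞ.
print(word.upper())

# The capital letter lowercases back to the small sharp s.
print(CAPITAL_SHARP_S.lower() == SMALL_SHARP_S)

# Case folding maps ß, ẞ and ss to the same string, which helps caseless matching.
all_caps_ss = "STRASSE"
all_caps_sharp = "STRA" + CAPITAL_SHARP_S + "E"
print(word.casefold() == all_caps_ss.casefold() == all_caps_sharp.casefold())
```

Choosing between ẞ and SS in all-caps text therefore remains an explicit decision in the text or in the font, because the default uppercasing never yields ẞ on its own.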
Re: Re: LATIN CAPITAL LETTER SHARP S officially recognized
> Is it possible to design fonts that will render ẞ as SS?

In fact, that happened long before the capital letter sharp s was added to Unicode: The T1 encoding (aka Cork encoding) of LaTeX has done this since 1990. The reason for this was correct hyphenation for German words rendered in all caps.

--Jörg Knappen

Sent: Saturday, 01 July 2017, 08:51
From: "a.lukyanov via Unicode" <unicode@unicode.org>
To: unicode@unicode.org
Subject: Re: LATIN CAPITAL LETTER SHARP S officially recognized

Is it possible to design fonts that will render ẞ as SS? So we could choose between ẞ and SS by just selecting the proper font, without changing the text itself. Or perhaps there will be a "font feature" to select this rendering within the same font.
Re: Re: U+0261 LATIN SMALL LETTER SCRIPT G
This is a script capital G or, in TeX notation, {\cal G}. It reflects the use of multiple styles of the same underlying alphabet in mathematics and sciences. It is not a capital script g (note the different ordering of capital and script).

--Jörg Knappen

I had found in 2013 a GꞬ contrast in mathematical notations of an old (1952) physics book (see http://www.unicode.org/mail-arch/unicode-ml/y2013-m01/0092.html)

Frédéric
Re: The usage of Z WITH STROKE
Some anecdotal evidence: I was taught by my math teacher (Germany, 1970s) to put a stroke through all z's (upper or lowercase) in order to distinguish them from the digit "2".

--Jörg Knappen

P.S. As far as Pan-Turkic orthography is concerned, there were also a lot of Pan-Turkic Latin alphabets in the revolutionary Soviet Union (1920s) before Cyrillic alphabets were introduced in the Stalin era.

P.P.S. You are certainly aware of this article: https://en.wikipedia.org/wiki/Z_with_stroke

Sent: Friday, 25 November 2016, 15:38
From: "Janusz S. Bień" <jsb...@mimuw.edu.pl>
To: "unicode Unicode Discussion" <unicode@unicode.org>
Subject: The usage of Z WITH STROKE

Hi!

There are two comments to the character(s) in the U0180 chart:

1. Pan-Turkic Latin orthography
2. handwritten variant of Latin “z”

Ad 1. Do I understand correctly that the Pan-Turkic Latin orthography refers to the initiative described in the post to the Linguist list: https://linguistlist.org/issues/4/4-187.html If so, where can I find more information about it? I have already found another post to the Linguist list, https://linguistlist.org/issues/5/5-739.html, but it contains only very general information.

Ad 2. I'm curious how widespread, in time and space, this convention is or was. Can you suggest where to search for this information?

Best regards
Janusz

--
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
Re: Why incomplete subscript/superscript alphabet ?
Subscripts and superscripts are considered "higher-level markup" and not part of plain text in Unicode. You can easily get them using LaTeX notation or HTML tags for subscripts or superscripts.

So the question is: Why are there *some* subscript and superscript characters in Unicode? And the answer is: They were found in older character sets, and Unicode provides so-called "round-trip compatibility" with those older character sets. The relevant older character sets happen not to cover a sensible full range of subscripts and superscripts, hence the gaps in Unicode. It is very probable that those gaps will never be filled.

--Jörg Knappen

Sent: Friday, 30 September 2016, 11:57
From: "Gael Lorieul" <glori...@coanda-deviation.info>
To: "Unicode Discussion" <unicode@unicode.org>
Subject: Why incomplete subscript/superscript alphabet ?

Hello all,

I wonder why only a subset of the alphabet is available as subscript and/or superscript ? This is well illustrated on the table in the following Wikipedia page: https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Latin_and_Greek_tables

Is there a reason for this ? I would love to have these characters available because I often use Unicode to write equations as comments of a source code. For instance:

    class Term_diff_rotDivStressTensor_splitted
    /**
     * Computes:
     *
     *   μ        ⎛μ⎞        ⎡1               ⎤
     *   —.Δω + ∇⎜—⎟×Δu + ∇×⎢—.(∇u + ∇uᵀ)·∇μ⎥
     *   ρ        ⎝ρ⎠        ⎣ρ               ⎦
     */
    {
        [...] (class definition)
    }

or a more problematic example:

    /*
     *                ⌠tᵉⁿᵈ
     * q(tᴺ) ← q(t⁰) +⎮ rhs(q,t) dt + (tᵉⁿᵈ - tˢᵗᵃʳᵗ)
     *                ⌡tˢᵗᵃʳᵗ
     */

Here "end" and "start" would have been better as subscripts, but I could not do so because letter "d" is not available as a subscript…

As you can see, having only some letters available as subscript (& superscript) is sometimes a pain…

Gaël Lorieul
PhD student in Computational Fluid Dynamics at Université catholique de Louvain
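The gaps described in this thread can be inspected with Python's standard `unicodedata` module. The sketch below relies on the compatibility decomposition tags `<sub>` and `<super>` recorded in the Unicode Character Database; the helper name `scripted_letters` is hypothetical, and the scan is limited to the Basic Multilingual Plane for brevity.

```python
import unicodedata


def scripted_letters(tag: str) -> dict:
    """Map base characters to a compatibility variant carrying the given tag ('<sub>' or '<super>')."""
    found = {}
    for cp in range(0x10000):  # the BMP is enough for this illustration
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only single-character decompositions such as "<sub> 0061".
        if len(parts) == 2 and parts[0] == tag:
            base = chr(int(parts[1], 16))
            found.setdefault(base, chr(cp))
    return found


subs = scripted_letters("<sub>")
supers = scripted_letters("<super>")

for word in ("start", "end"):
    missing = [c for c in word if c not in subs]
    print(word, "letters without a subscript form:", missing)

print("superscript d available:", "d" in supers)
```

The result depends on the Unicode version bundled with the Python interpreter, so the exact set of available letters may differ slightly between installations.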
Re: Re: Adding half-star to Unicode?
Talking about fancy five-pointed stars, besides the vertically split ones there is the "Anarchist star" (a symbol for anarcho-syndicalism) with a diagonal split into an upper left red half and a lower right black half. Since there are political and ideological symbols encoded in Unicode, maybe this one is worth encoding as well (probably twice, once as a black and white plain symbol and once as a colourful emoji). See here: https://commons.wikimedia.org/wiki/Category:Anarcho-Syndicalism#/media/File:Anarchist_star.svg

FIVE POINTED STAR WITH BLACK LOWER RIGHT HALF
= anarchist star
ANARCHIST STAR EMOJI

--Jörg Knappen

Sent: Friday, 24 June 2016, 14:12
From: "Frédéric Grosshans" <frederic.grossh...@gmail.com>
To: unicode@unicode.org
Subject: Re: Adding half-star to Unicode?

On 24/06/2016 00:37, Leo Broukhis wrote:
> For a previous discussion on the topic, please see
> the thread "Missing geometric shapes" around 11/12/12

The thread starts here: http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0008.html

It contains an example of a half-filled star used in an RTL (Hebrew) context, in an advertisement in Haaretz, here: http://www.unicode.org/mail-arch/unicode-ml/y2012-m11/0024.html
Re: Joined "ti" coded as "Ɵ" in PDF
I inspected the PDF file, and its font encoding is termed "Identity-H". I couldn't find out much about this encoding, but it seems to be a private encoding of Adobe used especially for Asian fonts.

--Jörg Knappen

Sent: Thursday, 17 March 2016, 17:43
From: "Don Osborn" <d...@bisharat.net>
To: unicode@unicode.org
Subject: Joined "ti" coded as "Ɵ" in PDF

Odd result when copy/pasting text from a PDF: For some reason "ti" in the (English) text of the document at http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf is coded as "Ɵ". Looking more closely at the original text, it does appear that the glyph is a "ti" ligature (which afaik is not coded as such in Unicode).

Out of curiosity, did a web search on "internaƟonal" and got over 11k hits, apparently all PDFs.

Anyone have any idea what's going on? Am assuming this is not a deliberate choice by diverse people creating PDFs and wanting "ti" ligatures for stylistic reasons. Note the document linked above is current, so this is not (just) an issue with older documents.

Don Osborn
Re: Re: Enclosing BANKNOTE emoji?
For the pound emoji, throw in ~90M Egyptians.

--Jörg Knappen

Sent: Tuesday, 09 February 2016, 23:46
From: "Leo Broukhis" <l...@mailcom.com>
To: "Mark Davis ☕️" <m...@macchiato.com>
Cc: "unicode Unicode Discussion" <unicode@unicode.org>
Subject: Re: Enclosing BANKNOTE emoji?

The emojiexpress.com site is useful to check which new emoji or combinations people actually use, but the stats are likely skewed by only measuring input from one platform.

Another way to look at the emojitracker.com stats:

339M people in the Eurozone : 389K uses of Euro emoji
126M people in Japan : 354K uses of Yen emoji
140M people in UK + Turkey (likely users of the Pound emoji as a stand-in for Lira) : 515K uses of pound emoji

The total is 605M people : 1258K uses of non-dollar emoji

Assuming the same average frequency of use, 2933K uses of the dollar emoji would be produced by 1411M people, out of which US + Canada + Mexico + Australia (500M) + other countries using $ as (part of) the sign for their currency are way less than a half. This means that substantially more than 500M people are using the dollar emoji by default, instead of emoji of their national currencies. Assuming a lesser frequency of use will result in a greater estimate of the affected population.

Leo

On Tue, Feb 9, 2016 at 8:51 AM, Mark Davis ☕️ <m...@macchiato.com> wrote:

Look at http://www.emojixpress.com/stats/. The stats are different, since they collect data from keyboards, not Twitter posts, but they have a nice button to view only the new emoji. (The numbers on the new ones will be smaller, just because it takes time for systems to support them, and people to start using them. However, they bear out my prediction that the most popular would be the eyes-rolling face.)

Mark

On Tue, Feb 9, 2016 at 5:19 PM, Leo Broukhis <l...@mailcom.com> wrote:

A caveat about using emojitracker.com: it doesn't count newer emoji yet (e.g. U+1F37E bottle with popping cork is absent), thus, when they are added, their counts will be skewed.

Leo

On Tue, Feb 9, 2016 at 2:00 AM, Leo Broukhis <l...@mailcom.com> wrote:

Thank you for the links, quite mesmerizing! On emojitracker.com (cumulative counts, but only on Twitter, AFAICS), U+1F4B5 ($) had quite a respectable count of 2932622 (well above the middle of the page, around 70%ile), U+1F4B7 (pound) had 514536 (around 30%ile), and U+1F4B4 and U+1F4B6 had around 353K and 388K resp. (around 20%ile, but 10x more than the lowest counts, and about the same frequency as various individual clock faces).

It is quite evident that the dollar banknote emoji serves as a stand-in for at least half a dozen various currencies.

On Mon, Feb 8, 2016 at 10:25 PM, Mark Davis ☕️ <m...@macchiato.com> wrote:

I would suggest that you first gather and present statistics on how often the current combinations are used compared to other emoji, e.g. by consulting sources such as: http://www.emojixpress.com/stats/ or http://emojitracker.com/

Mark

On Mon, Feb 8, 2016 at 8:34 PM, Leo Broukhis <l...@mailcom.com> wrote:

There are

U+01F4B4 Banknote With Yen Sign
U+01F4B5 Banknote With Dollar Sign
U+01F4B6 Banknote With Euro Sign
U+01F4B7 Banknote With Pound Sign

This is clearly an incomplete set. It makes sense to have a generic "enclosing banknote" emoji character which, when combined with a currency sign, would produce the corresponding banknote, to forestall requests for individual emoji for banknotes with remaining currency signs.

Leo
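The back-of-the-envelope estimate in Leo's message can be retraced in a few lines of Python. The figures are the ones quoted in the thread; the only modelling assumption, also stated in the message, is that the dollar emoji is used at the same average per-person rate as the other three banknote emoji.

```python
# Figures quoted in the message: (population in millions, emoji uses in thousands).
non_dollar = {
    "euro": (339, 389),
    "yen": (126, 354),
    "pound": (140, 515),
}
dollar_uses_k = 2933

people_m = sum(p for p, _ in non_dollar.values())
uses_k = sum(u for _, u in non_dollar.values())

# Assume the dollar emoji is used at the same average per-person rate.
implied_dollar_people_m = dollar_uses_k * people_m / uses_k

print(people_m, uses_k, round(implied_dollar_people_m))
```

Adding further populations to the pound row, as suggested in the reply at the top of this message, would lower the average rate and so raise the implied number of dollar-emoji users.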
Re: Re: Turned Capital letter L (pointing to the left, with serifs)
I have looked up some printed sources and I agree with Michael Everson and Frédéric Grosshans that the beast in question is a variant of the Greek letter tau (capital or lowercase).

Here are the relevant sources I consulted:

Carl Faulmann: Das Buch der Schrift. Enthaltend die Schriftzeichen und Alphabete aller Zeiten und aller Völker des Erdkreises. Verlag der kaiserlich königlichen Staatsdruckerei. Wien 1878, 2. verm. und verb. Aufl. 1880, p. 171

Hans Jensen: Die Schrift in Vergangenheit und Gegenwart, 3. Auflage, p. 459

Here is a quote from Hans Jensen: Noch in modernen Drucken finden wir die Formen ϐθϖ3ϲ7, wo andere βϑπζςτ haben. (In English: "Even in modern prints we find the forms ϐθϖ3ϲ7 where others have βϑπζςτ.")

Note: I had to fake the zeta symbol with a digit 3 and the tau symbol with a digit 7 here. In German typesetting tradition the theta symbol ϑ is the preferred form, not the straight theta θ.

My opinion: The Greek zeta symbol and the Greek tau symbol are on the same footing as the "lunate sigma" already encoded in Unicode. They should be added in both lowercase and capital form.

--Jörg Knappen

Sent: Tuesday, 05 January 2016, 06:08
From: "Asmus Freytag (t)" <asmus-...@ix.netcom.com>
To: unicode@unicode.org
Subject: Re: Turned Capital letter L (pointing to the left, with serifs)

On 1/4/2016 1:33 PM, Frédéric Grosshans wrote:

I looked through all the pages of the 1809 edition of _Theoria motus corporum coelestium in sectionibus conicis solem ambientium_ https://archive.org/stream/bub_gb_ORUOQAAJ where Gauss used this notation on pages 80-81. Almost all notations are standard enough to be familiar to any modern (2015) mathematician or physicist, with two exceptions: this "7" symbol and ☊ U+260A ASCENDING NODE (which is still standard in astronomy). The Greek letters in particular have a pretty standard shape, and I don't see why this symbol would be the only Greek letter using a fancy cursive shape. Even the Latin letters used standard shapes (italic, roman, a few capital fraktur).

That said, I did not spot a tau in the text, while most of the Greek alphabet was used. Could "7" be a standard shape for tau in 1809 Hamburg ?

The problem is that he used capital Tau, which, in most fonts, looks precisely like capital Latin T. So, he used an alternate shape, the cursive one, which would have been familiar to him based on the fact that he probably studied Greek as part of his education, a pretty standard subject at the time and even a hundred years later in upper level schools in Hamburg and elsewhere in Germany (and he would have seen and reproduced handwritten forms, not just printed ones).

However, I still think it is a ⦢ U+29A2 TURNED ANGLE

No, an angle would have two straight lines. A Greek letter has, overall, a much higher probability of being used for a variable than almost any other symbol (the one non-letter symbol (Ascending node) is one that you say is still standard in astronomy - whereas any quick search of the literature of the 19th century shows that no symbol is consistently used for the "avery daily angle". For all of these reasons, I find the suggestion of U+29A2 unconvincing.

A./

Frédéric

On Mon, 4 Jan 2016, 21:38, Raymond Mercier <raym...@almanach.co.uk> wrote:

On further reflection I can well agree that it is tau. The attached images from R. Barbour, Greek Literary Hands, show clearly (scan 3) the large upper case tau in several lines, and in scan 4 in the first and other lines a hooked version of tau. So I withdraw my suggestion of pi.

Raymond

From: Asmus Freytag (t)
Sent: Monday, January 04, 2016 7:58 PM
To: unicode@unicode.org
Subject: Re: Turned Capital letter L (pointing to the left, with serifs)

On 1/4/2016 10:41 AM, Michael Everson wrote:

Certainly it does look more like a very common variant of “tau” than “pi”

Variant of uppercase tau?

A./
Re: Re: Turned Capital letter L (pointing to the left, with serifs)
Sigh, I have to correct the attribution of the character identification: I meant Raymond Mercier, and I should also mention Asmus Freytag in the place of Frédéric Grosshans.

--Jörg Knappen

Sent: Tuesday, 05 January 2016, 10:10
From: "Jörg Knappen" <jknap...@web.de>
To: "Asmus Freytag (t)" <asmus-...@ix.netcom.com>
Cc: unicode@unicode.org
Subject: Re: Re: Turned Capital letter L (pointing to the left, with serifs)

I have looked up some printed sources and I agree with Michael Everson and Frédéric Grosshans that the beast in question is a variant of the Greek letter tau (capital or lowercase).

Here are the relevant sources I consulted:

Carl Faulmann: Das Buch der Schrift. Enthaltend die Schriftzeichen und Alphabete aller Zeiten und aller Völker des Erdkreises. Verlag der kaiserlich königlichen Staatsdruckerei. Wien 1878, 2. verm. und verb. Aufl. 1880, p. 171

Hans Jensen: Die Schrift in Vergangenheit und Gegenwart, 3. Auflage, p. 459

Here is a quote from Hans Jensen: Noch in modernen Drucken finden wir die Formen ϐθϖ3ϲ7, wo andere βϑπζςτ haben. (In English: "Even in modern prints we find the forms ϐθϖ3ϲ7 where others have βϑπζςτ.")

Note: I had to fake the zeta symbol with a digit 3 and the tau symbol with a digit 7 here. In German typesetting tradition the theta symbol ϑ is the preferred form, not the straight theta θ.

My opinion: The Greek zeta symbol and the Greek tau symbol are on the same footing as the "lunate sigma" already encoded in Unicode. They should be added in both lowercase and capital form.

--Jörg Knappen
Re: Symbol for an upside down capital L, pointing to the right?
Err... in what respect would this symbol be different from a GREEK CAPITAL LETTER GAMMA?

--Jörg Knappen

Sent: Friday, 25 December 2015, 14:43
From: "Costello, Roger L." <coste...@mitre.org>
To: "unicode@unicode.org" <unicode@unicode.org>
Subject: Symbol for an upside down capital L, pointing to the right?

Hi Folks,

Here is the upside down capital L, pointing to the left: ⅂ - TURNED SANS-SERIF CAPITAL L (U+2142)

Is there a symbol for an upside down capital L, pointing to the right?

/Roger
Turned Capital letter L (pointing to the left, with serifs)
Here is a report of a rather strange beast occurring in historical math printing (work of C. F. Gauß) in the 19th century: http://tex.stackexchange.com/questions/284483/how-do-i-typeset-this-symbol-possibly-astronomical

Images are here: http://www.archive.org/stream/abhandlungenmet00gausrich#page/n129/mode/2up http://i.stack.imgur.com/57fN3.png

It looks like a big digit "7" or like a turned letter "L". In the accepted answer it was identified with the Tironian note et, an identification I'd dispute because the Tironian note et is usually smaller in size than a capital Latin letter.

Does anyone know what it is?

--Jörg Knappen
Re: Re: Proposal for German capital letter "ß"
Since the capital sharp s has become easily available to the public, I see it popping up everywhere in German publications, mostly in an all-caps environment. I have a small collection of examples (on paper). The use of the capital sharp s in German is not only a historical artefact; it is recent and modern.

--Jörg Knappen

Martin Dürst wrote:

However, the example is also somewhat misleading. The book in the picture is clearly quite old. The Duden that was cited is new. I checked with "Der Grosse Duden" on Amazon, but all the books I found had the officially correct spelling.

On the other hand, I remember that when the upper-case sharp s came up for discussion in Unicode, source material showed that it was somewhat popular quite some time ago (possibly close in age to the old Duden picture). So we would have to go back and check the book in the picture to see what it says about ß to be able to claim that Duden was (at some point in time) inconsistent with itself.

Regards, Martin.
Re: New Character Property for Prepended Concatenation Marks
I wonder how this concept relates to mathematical notation, especially the root sign.

--Jörg Knappen

Sent: Wednesday, 25 November 2015, 23:34
From: announceme...@unicode.org
To: announceme...@unicode.org
Subject: New Character Property for Prepended Concatenation Marks

The Unicode Technical Committee is seeking feedback on a proposal to define a new character property for the class of prepended concatenation marks, also referred to as prefixed format control characters or, more generically, as subtending marks. Characters in that class include U+0600 ARABIC NUMBER SIGN and U+06DD ARABIC END OF AYAH. The new property, named Prepended_Concatenation_Mark and targeted for Unicode 9.0, would provide a mechanism to handle subtending marks collectively via properties rather than by hardcoded enumeration.

A detailed description of the issue and how to provide feedback are given in Public Review Issue #310. http://blog.unicode.org/2015/11/new-character-property-for-prepended.html
Re: Re: Square Brackets with Tick
I must admit, although I have seen a great many mathematical notations, I have never encountered those particular brackets. I have no intuition how they should pair.

--Jörg Knappen

Sent: Saturday, 22 August 2015, 18:35
From: Julian Bradfield <jcb+unic...@inf.ed.ac.uk>
To: unicode@unicode.org
Subject: Re: Square Brackets with Tick

On 2015-08-22, Nigel Small <ni...@nigelsmall.com> wrote:

298D; 2990; o # LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298E; 298F; c # RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F; 298E; o # LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990; 298D; c # RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER

with several code points in between. According to the code point pairs in the first and second columns of this file, these particular brackets should be paired as the *first and fourth* and the *third and second*. Intuitively however, these would actually be *first and second* and *third and fourth* if one is to expect consistency.

That's a strange intuition! Mathematical brackets are expected to pair with left-right symmetry, not rotational symmetry. As in, for example, floor and ceiling brackets. The pairing in the file is the natural one.

1. The current pairing information is correct and the sequence is irregular for some historical reason

That will be the explanation. There is no inherent meaning to the order of codepoints, it's just convenience. One of the experts here can probably tell us why these four brackets happen to be coded in this order.

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
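The pairing discussed in this thread can be checked mechanically. The sketch below hardcodes the four entries quoted in the message (Python's `unicodedata` module does not expose the bracket pairing data, so the table is copied from the thread) and verifies that every opener points to a closer that points back. The helper name `pairing_is_symmetric` is hypothetical.

```python
import unicodedata

# The four entries quoted in the message:
# code point -> (paired code point, 'o' for opening / 'c' for closing).
BIDI_BRACKETS = {
    0x298D: (0x2990, "o"),
    0x298E: (0x298F, "c"),
    0x298F: (0x298E, "o"),
    0x2990: (0x298D, "c"),
}


def pairing_is_symmetric(table: dict) -> bool:
    """Each bracket must point to a partner of the opposite kind that points back to it."""
    for cp, (partner, kind) in table.items():
        back, partner_kind = table[partner]
        if back != cp or partner_kind == kind:
            return False
    return True


print(pairing_is_symmetric(BIDI_BRACKETS))
for cp, (partner, kind) in BIDI_BRACKETS.items():
    if kind == "o":
        print(unicodedata.name(chr(cp)), "<->", unicodedata.name(chr(partner)))
```

Printing the character names side by side makes the left-right symmetry visible: each pair shares the same tick position (top with top, bottom with bottom), which matches the explanation given in the reply.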
Aw: Re: Bunny hill symbol, used in America for signaling ski pistes for novices
From the description of the symbol it looks like a geometric shape. I think it is worth to be encoded as a geometric shape (TWO BLACK DIAMONDS VERTICALLY STACKED or something like this) with a note * bunny hill. It may have (r find in future) other uses. --Jrg Knappen Gesendet:Donnerstag, 28. Mai 2015 um 23:20 Uhr Von:Shervin Afshar shervinafs...@gmail.com An:Shawn Steele shawn.ste...@microsoft.com Cc:verd...@wanadoo.fr verd...@wanadoo.fr, unicode Unicode Discussion unicode@unicode.org, Jim Melton jim.mel...@oracle.com Betreff:Re: Bunny hill symbol, used in America for signaling ski pistes for novices Since the double-diamond has map and map legend usage, it might be a good idea to have it encoded separately. I know that Im stating the obvious here, but the important point is doing the research and showing that it has widespread usage. Shervin On Thu, May 28, 2015 at 2:15 PM, Shawn Steele shawn.ste...@microsoft.com wrote: Im used to them being next to each other. So the entire discussion seems to be about how to encode a concept vs how to get the shape you want with existing code points. If you just want the perfect shape, then maybe an svg is a better choice. If were talking about describing ski-run difficulty levels in plain-text, then the hodge-podge of glyphs being offered in this thread seems kinda hacky to me. -Shawn From: ver...@gmail.com [mailto:ver...@gmail.com] On Behalf Of Philippe Verdy Sent: Thursday, May 28, 2015 2:12 PM To: Jim Melton Cc: Shawn Steele; unicode Unicode Discussion Subject: Re: Bunny hill symbol, used in America for signaling ski pistes for novices Some documentations also suggest that the two diamonds are not stacked one above the other, but horizontally. Its a good point for using only one symbol, encoding it twice in plain-text if needed. 2015-05-28 22:15 GMT+02:00 Jim Melton jim.mel...@oracle.com: I no longer ski, but I did so for many years, mostly (but not exclusively) in the western United States. 
I never encountered, at any USA ski hill/mountain/resort, a special symbol for bunny hills, which are typically represented by the green circle meaning beginner. That's anecdotal evidence at best, but my observations cover numerous skiing sites. I have encountered such a symbol in Europe and in New Zealand, but not in the USA. (I have not had the pleasure of skiing in Canada and am thus unable to speak about ski areas in that country.) The double black diamond would appear to be a unique symbol worthy of encoding, simply because the only valid typographical representation (in the USA) is two single black diamonds stacked one above the other and touching at the points. Hope this helps, Jim On 5/28/2015 2:04 PM, Shawn Steele wrote: So is double black diamond a separate symbol? Or just two of the black diamond? And Blue-Black? I'm drawing a blank on a specific bunny sign, in my experience those are usually just green. Aren't there a lot of cartography symbols for various systems that aren't present in Unicode? From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy Sent: Thursday, May 28, 2015 12:47 PM To: unicode Unicode Discussion Subject: Bunny hill symbol, used in America for signaling ski pistes for novices Is there a symbol that can represent the Bunny hill symbol used in North America and some other American territories with mountains, to designate the ski pistes open to novice skiers (those pistes are signaled with green signs in Europe). I'm looking for the symbol itself, not the color, or the form of the sign. For example blue pistes in Europe are designated with a green circle in America, but we have a symbol for the circle; red pistes in Europe are signaled by a blue square in America, but we have a symbol for the square; black pistes in Europe are signaled by a black diamond in America, but we also have such black diamond in Unicode.
But I can't find an equivalent to the American Bunny hill signal, equivalent to green pistes in Europe (this is a problem for webpages related to skiing: do we have to embed an image?). -- Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144 Chair, ISO/IEC JTC1/SC32 and W3C XML Query WG Fax : +1.801.942.3345 Oracle Corporation Oracle Email: jim dot melton at oracle dot com 1930 Viscounti Drive Alternate email: jim dot melton at acm dot org Sandy, UT 84093-1063 USA Personal email: SheltieJim at xmission dot com = Facts are facts. But any opinions expressed are the opinions = = only of myself and may or may not reflect the opinions of anybody = = else with whom I may or may not have discussed the issues at hand. =
Aw: Combining character example
Hi Mark, the use of DOT BELOW and LINE BELOW is in fact consistent in the German Duden. The difference in the diacritics denotes the length of the stressed vowel: DOT BELOW marks a short vowel and LINE BELOW marks a long vowel. Diphthongs are always long, and there is a single line under the whole diphthong. Digraphs (e.g. the ou in words borrowed from French) also have either a single line under the whole digraph or (this happens rarely) a single dot in the middle of the digraph. --Jörg Knappen Gesendet: Donnerstag, 16. April 2015 um 10:01 Uhr Von: Mark Davis m...@macchiato.com An: Unicode Public unicode@unicode.org, Unicode Book b...@unicode.org Betreff: Combining character example I happened to run across a good example of productive use of combining marks, the Duden site (a great online dictionary for German). They use U+0323 ( ◌̣ ) COMBINING DOT BELOW to indicate the stress. Here is an example: unterbuttern http://www.duden.de/rechtschreibung/unterbuttern They aren't, however, consistent; you also see underlining for stress. einschränken But not, interestingly, with the HTML underline, but with U+0332 ( ◌̲ ) COMBINING LOW LINE. Mark
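The Duden convention described in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper mark_stress and its parameters are invented here, Duden's actual markup is not known from the thread, and only the two code points U+0323 and U+0332 come from the messages above.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW: short stressed vowel
LOW_LINE = "\u0332"   # COMBINING LOW LINE: long stressed vowel


def mark_stress(word, start, length=1, long_vowel=False):
    """Return word with a stress mark on word[start:start + length].

    Hypothetical helper. A long vowel, diphthong or digraph gets a low
    line under every letter of the span; a short vowel gets a dot below
    its first letter (the rare dot-under-digraph case is not modelled).
    """
    span = word[start:start + length]
    if long_vowel:
        marked = "".join(ch + LOW_LINE for ch in span)
    else:
        marked = span[0] + DOT_BELOW + span[1:]
    return word[:start] + marked + word[start + length:]


def mark_names(text):
    """List the Unicode names of all combining marks found in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]
```

Both marks have canonical combining class 220 (below), so a mark placed after a precomposed letter such as ä survives NFC normalization unchanged; mark_names(mark_stress(word, 7, long_vowel=True)) would list COMBINING LOW LINE for the long ä in einschränken.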
Looking for a standard on historical countries
Sorry for this off-topic question: Is anyone here aware of a standard or a de facto standard for names or codes of historical countries? For the requirement I have in mind, all countries where there was a printing press would be optimal coverage; anything going back beyond 1974 (ISO 3166-3) would be better than nothing. The Getty Thesaurus of Geographic Names (TGN; http://www.getty.edu/research/tools/vocabularies/tgn/ ) covers some historical countries (e.g., Preussen), but is far from complete (missing, e.g., Schaumburg-Lippe). The same holds for the MARC Code List for Countries ( http://www.loc.gov/marc/countries/countries_name.html ). Thanks, Jörg Knappen ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Aw: Ambiguous hyphenation cases with
With TeX and LaTeX there is an elegant solution. TeX has the primitive \discretionary{prebreak}{postbreak}{nobreak}, which is spelled out as \discretionary{t-}{}{} for the insertion of an additional t at hyphenation. It also handles cases like the traditional German hyphenation of ck as k-k, with \discretionary{k-}{}{c} placed in front of the k. The Babel system (inspired by german.sty) includes nifty shorthands like "t and "c for these cases. The semantics of U+00AD (SOFT HYPHEN) are too primitive to implement this kind of behaviour; the same is true for &shy; in HTML. --Jörg Knappen Gesendet: Dienstag, 22. Juli 2014 um 16:03 Uhr Von: fantasai fantasai.li...@inkedblade.net An: Håkan Save Hansson hakan.hans...@edison.se, www-st...@w3.org www-st...@w3.org, Unicode unicode@unicode.org Betreff: Ambiguous hyphenation cases with On 05/12/2014 12:43 AM, Håkan Save Hansson wrote: Hi fantasai, Regarding your answer to my second suggestion (if you are referring to James Clark's first answer): The problem is that the hyphenation system in itself can't decide how to change the spelling, without any dictionary functionality. It can't know if I meant mat-tjuv (food thief in Swedish) or matt-tjuv (carpet thief) when I wrote mat&shy;tjuv. So there has to be a way to tell the hyphenation system that. Hm. I don't think I have a solution for that problem. :/ Currently you'd just have to not hyphenate that word. CCing Unicode, in case anyone there has a solution Up-reference: http://lists.w3.org/Archives/Public/www-style/2014Feb/0739.html ~fantasai
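To make the three arguments of the TeX primitive concrete, here is a toy model in Python. It is not TeX and does not call any TeX API: the class Discretionary and the function typeset are names made up for this sketch, and the ck example assumes the pre-break text k- and the no-break text c in front of a literal k.

```python
from dataclasses import dataclass


@dataclass
class Discretionary:
    """Toy stand-in for the three arguments of TeX's discretionary primitive."""
    pre: str      # typeset before the line break, if the break is taken
    post: str     # typeset after the line break, if the break is taken
    nobreak: str  # typeset when no break occurs at this point


def typeset(parts, break_here):
    """Return the output lines for a word built from strings and one Discretionary."""
    lines = [""]
    for part in parts:
        if isinstance(part, Discretionary):
            if break_here:
                lines[-1] += part.pre
                lines.append(part.post)
            else:
                lines[-1] += part.nobreak
        else:
            lines[-1] += part
    return lines


# Swedish "mattjuv" read as matt + tjuv: an extra t appears only at the break.
carpet_thief = ["mat", Discretionary("t-", "", ""), "tjuv"]
# Traditional German ck: "Zucker" breaks as "Zuk-" / "ker".
zucker = ["Zu", Discretionary("k-", "", "c"), "ker"]
```

The point of the model is that a soft hyphen corresponds to the single fixed case Discretionary("-", "", ""), which is why it cannot express either of the two words above.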
Aw: New symbol to denote true open access (e.g. to scholarly literature), analogous to the copyright symbol
Even if this symbol really catches on (which I doubt, because it is too close to the @ sign in the first place), chances are low that it will be encoded in Unicode. Precedents like the Creative Commons sign or the Copyleft sign have been discussed on this mailing list (search the archives for the relevant threads) but were never encoded in Unicode. If the symbol does not catch on, why should it be encoded in Unicode? --Jörg Knappen Gesendet: Freitag, 21. März 2014 um 12:14 Uhr Von: Jan Velterop velte...@gmail.com An: unicode@unicode.org Betreff: New symbol to denote true open access (e.g. to scholarly literature), analogous to the copyright symbol May I propose a new Unicode symbol to denote true open access, for instance applied to scholarly literature, in a similar way that © and ® denote copyright and registered trademarks respectively? The proposed symbol is an encircled lower case letter a, in particular in a font where the a has a tail, as in a font like Arial, for instance, and not as in a font like Century Gothic. A sketch of what I have in mind is here: http://theparachute.blogspot.co.uk/2014/03/proposed-open-access-symbol.html The intended use would be for documents and images that have been published with so-called BOAI-compliant open access (http://www.budapestopenaccessinitiative.org/read), meaning that all reuse is permitted, with the only permissible condition that the author(s) should be acknowledged (CC-BY licence: http://creativecommons.org/licenses/by/4.0/). This condition would not be mandatory, and also public domain, CC-0 licences would be denoted by the proposed symbol (http://creativecommons.org/publicdomain/zero/1.0/). I am seeking comments and support for this proposal. Jan Velterop
Aw: Re: [Unicode] two Hanzi
Just to keep the chemistry straight: Nebulium and Coronium were conjectured elements at the end of the 19th century and beginning of the 20th century, when the atomic number that classifies the elements was not yet known. They have their place in spectroscopic literature and astrophysics, but the spectral lines associated with them are now identified as so-called forbidden lines of well-known elements. Nevertheless, the characters for them are certainly used (as are the English names; Nebulium even has a Wikipedia and a Britannica entry) and would be a legitimate addition to Unicode. Who will write a proposal? --Jörg Knappen Gesendet: Donnerstag, 20. März 2014 um 14:50 Uhr Von: suzuki toshiya mpsuz...@hiroshima-u.ac.jp An: shi zhao shiz...@gmail.com Cc: unicode@unicode.org Betreff: Re: [Unicode] two Hanzi If they are officially standardized characters for the elements by PRC government, China NB will submit them to ISO/IEC 10646 via Urgently Needed Characters process. They are official? Regards, mpsuzuki On 03/20/2014 10:36 PM, shi zhao wrote: please add two Hanzi (up + down ) and (up + down ) see http://www.term.org.cn/CN/abstract/abstract9314.shtml# included in: * Zhonghua Zihai, 1994: 1770. * Lu Gusun, The English-Chinese Dictionary (), 1991: 701, 2219. (up + down ) = nebulium (see http://yedict.com/zslistbs.asp?word=%C6%F84 ) (up + down ) = coronium = newtonium (see http://yedict.com/zslistbs.asp?word=%C6%F87 ) My blog: http://shizhao.org twitter: https://twitter.com/shizhao [[zh:User:Shizhao]]
Aw: Astrological symbol for Pluto?
Unfortunately, this astrological symbol is given in the Wikipedia article, but not sourced. So I think further evidence for its usage is needed. --Jörg Knappen Gesendet: Sonntag, 02. Februar 2014 um 05:20 Uhr Von: Shriramana Sharma samj...@gmail.com An: UnicoDe List unicode@unicode.org Betreff: Astrological symbol for Pluto? Currently Unicode encodes a distinct astrological symbol for Uranus 2645 vs an astronomical symbol 26E2. However the only symbol encoded for Pluto is the astronomical one: 2647. Just now I learnt from https://en.wikipedia.org/wiki/Pluto#Name that there is a distinct astrological symbol: Has there been any proposal to encode this? (I'm guessing Michael might be interested...) -- Shriramana Sharma
Aw: Re: Astrological symbol for Pluto?
In fact, several dwarf planets have _astronomical_ symbols that were published together with their official names in the relevant astronomical journals. When it became clear that there are too many minor planets around, the assignment of symbols was halted. (37) Fides was the last minor planet to receive a symbol. Most of the symbols are already available in Unicode. For a quick reference, see http://en.wikipedia.org/wiki/Astronomical_symbols --Jörg Knappen P.S. I have also seen astro_l_ogical symbols for some of the Kuiper belt objects, but there seems to be no agreement between different authors. Gesendet: Montag, 03. Februar 2014 um 14:14 Uhr Von: Shriramana Sharma samj...@gmail.com An: Frédéric Grosshans frederic.grossh...@gmail.com Cc: UnicoDe List unicode@unicode.org Betreff: Re: Aw: Astrological symbol for Pluto? On Mon, Feb 3, 2014 at 4:15 PM, Frédéric Grosshans frederic.grossh...@gmail.com wrote: Actually, it is sourced (with the other symbols) to http://www.uranian-institute.org/bfglyphs.htm , which lists no less than 4 symbols for Pluto... In any case, it seems its astronomical symbol was encoded quite early (DerivedAge = 1.1) which was before the 2006 IAU decision to demote it to dwarf planet status. Of course, even if it were encoded today I'm sure it would be the only dwarf planet to have a symbol encoded since no other dwarf planet has captured the common man's imagination (and basic knowledge) like Pluto, and I have not heard of any of the other dwarf planets (Ceres, Haumea, Makemake and Eris) having any symbols... -- Shriramana Sharma
Aw: Re: Re: Re: Re: Re: Re: Do you know a tool to decode UTF-8 twice
When you are looking for a *new* name for that encoding, why don't you just adopt the pythonese precedent mysql-latin1? It is as good or as bad as any other name, but has some footing just now. --Jörg Knappen Gesendet: Mittwoch, 29. Januar 2014 um 21:12 Uhr Von: Anne van Kesteren ann...@annevk.nl An: Buck Golemon b...@yelp.com Cc: Markus Scherer markus@gmail.com, Jörg Knappen jknap...@web.de, Frédéric Grosshans frederic.grossh...@gmail.com, unicode unicode@unicode.org, unic...@norbertlindenberg.com Betreff: Re: Re: Re: Re: Re: Re: Do you know a tool to decode UTF-8 twice On Wed, Jan 29, 2014 at 11:57 AM, Buck Golemon b...@yelp.com wrote: Anne: Given that the intent is to implement exactly the whatwg spec, and the group is currently called whatwg (even though it may eventually become a historical artifact), is whatwg-1252 most appropriate? It's up to you I suppose, but whatwg-1252 just seems like long term it will lose its meaning. For the web windows-1252 will always have this meaning due to deployed content, so web-windows-1252 if you need to disambiguate from a different implementation of windows-1252 makes sense to me. -- http://annevankesteren.nl/
Aw: Re: Re: Re: Re: Re: Do you know a tool to decode UTF-8 twice
A little postscriptum to this old thread: On PyPI, there is now a codec available that handles the peculiar definition of latin1 inside MySQL. The package is called mysql-latin1-codec and features an encoding consisting of cp1252 plus 0x81, 0x8D, 0x8F, 0x90, 0x9D (the latter five bytes are undefined in the Python codec for cp1252). https://pypi.python.org/pypi/mysql-latin1-codec/1.0 --Jörg Knappen Gesendet: Mittwoch, 30. Oktober 2013 um 19:14 Uhr Von: Buck Golemon b...@yelp.com An: Frédéric Grosshans frederic.grossh...@gmail.com Cc: Jörg Knappen jknap...@web.de, unicode unicode@unicode.org Betreff: Re: Aw: Re: Re: Re: Re: Do you know a tool to decode UTF-8 twice On Wed, Oct 30, 2013 at 9:56 AM, Frédéric Grosshans frederic.grossh...@gmail.com wrote: Le 30/10/2013 17:32, Jörg Knappen a écrit : The data did not only contain latin-1 type mangling for the non-existent Windows characters, but also sequences with the raw C1 control characters for all of latin-1. So I had to do them, too. The data weren't consistent at all, not even in their errors. --Jörg Knappen Your question helped me dust off and repair a non-working Python snippet I wrote for a similar problem. I was stuck with the mixing of windows-1252 and latin1 controls (linked with Chinese characters). I write it below for reference. The Python snippet below does not need sed and defines a function (unscramble(S)) which works on strings. The extension to files should be easy. Frédéric Grosshans

    def Step1Filter(S):
        for c in S:
            # works character by character because of the cp1252/latin1 ambiguity
            try:
                yield c.encode('cp1252')
            except UnicodeEncodeError:
                # useful where cp1252 is undefined (81, 8D, 8F, 90, 9D)
                yield c.encode('latin1')

    def unscramble(S):
        return b''.join(c for c in Step1Filter(S)).decode('utf8')

PS: If anyone is interested in a licence, I consider this simple enough to be in the public domain and uncopyrightable. This encoding you've implemented above is known as windows-1252 by the whatwg and all browsers [1][2].
The implementation of cp1252 in Python is instead a direct consequence of the unicode.org definition [3]. [1] http://encoding.spec.whatwg.org/index-windows-1252.txt [2] http://bukzor.github.io/encodings/cp1252.html [3] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
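The five undefined bytes mentioned in this message can be checked directly against a Python installation. The following is a minimal sketch for Python 3; the function name undefined_bytes is made up for illustration, and the result reflects CPython's bundled cp1252 codec rather than the WHATWG table.

```python
def undefined_bytes(codec):
    """Return the byte values that a single-byte codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


def hex_list(values):
    """Format byte values the way they are written in the thread (81, 8D, ...)."""
    return ", ".join("{:02X}".format(v) for v in values)
```

With CPython's codec, hex_list(undefined_bytes("cp1252")) gives 81, 8D, 8F, 90, 9D, while latin-1 decodes every byte. That difference is why a fallback to latin1 is needed when reversing data that passed through a WHATWG-style windows-1252 decoder.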
Aw: Commercial minus as italic variant of division sign in German and Scandinavian context
The most important word in the comment on 00F7 DIVISION SIGN is occasionally. In fact, the occasions are so rare that you can live a whole life in Germany without encountering one of them. On the other hand, 00F7 DIVISION SIGN is used _frequently_ in German schoolbooks to denote ... division (books aimed at professionals doing math prefer : (COLON) or / (SLASH) for this purpose, but schoolbooks don't). 2052 COMMERCIAL MINUS SIGN _always_ means subtraction and it has this shape (or the alternate shape ./.) in all contexts, roman or italic. It is not the italic version of some other symbol. Hope this helps, Jörg Knappen Gesendet: Donnerstag, 16. Januar 2014 um 04:43 Uhr Von: Leif Halvard Silli xn--mlform-iua@mlform.no An: unicode@unicode.org Betreff: Commercial minus as italic variant of division sign in German and Scandinavian context Thanks to our discussion in July 2012,[1] the Unicode code charts now say, about 00F7 DIVISION SIGN, this: occasionally used as an alternate, more visually distinct version of 2212 {MINUS SIGN} or 2011 {NON-BREAKING HYPHEN} in some contexts [ snip ] 2052 commercial minus sign However, I think it can also be added somewhere that commercial minus is just the italic variant of division minus. I'll hereby argue for this based on an old German book on commercial arithmetics I have come across, plus the July 2012 discussion and what Unicode already says about the commercial sign: FIRST: IDENTICAL CONTEXTS. German language is an important locale for the Commercial Minus. In German, the Commercial minus is both referred to as kaufmännische Minus(zeichen) and as buchhalterische Minus (Commercial Minus Character and Bookkeeper Minus). And, speaking of division minus in the context I know best, Norway, we find it in advertising (commercial context) and in book keeping documentation and taxation forms.
Simply put, what the Unicode 6.2 General Punctuation section says about Commercial Minus can also be said about DIVISION SIGN used as minus: U+2052 ⁒ commercial minus sign is used in commercial or tax related forms or publications in several European countries, including Germany and Scandinavia. So, basically and for the most part, the commercial minus and the division sign minus occur in the very same contexts, with very much the same meaning. This is a strong hint that they are the same character. SECOND: GERMAN USE OF DIVISION SIGN FOR MINUS IN COMMERCIAL CONTEXT. Is there any proof that German used both an italic variant and a non-italic variant of the division minus? Seemingly yes: the book Kaufmännische Arithmetik (Commercial arithmetics) from 1825 by Johann Philipp Schellenberg. By reading section 118, Anhang zur Addition und Subtraction der Brüche [Appendix about the addition and subtraction of fractions], at page 213 and onwards,[2] we can conclude that he describes a commercial use of the division minus, where the ÷ signifies a _negative remainder_ of a division (while the plus sign is used to signify a positive remainder). Or to quote, from page 214: so wird das Fehlende durch das [Zei]chen ÷ (minus) bemerkt, und bei Berechn[ung] der Preis der Waare abgezogen [then the lacking remainder is marked with the ÷ (minus) and withdrawn when the price of the commodity is calculated]. {Note that some bits of the text are lacking; I marked my guesses in square brackets.} I did not find (yet) that he used the italic commercial minus; however, the context is correct. (My guess is that the italic variant has been put to more use in the computer age, partly to separate it from the DIVISION SIGN, or maybe simply because people started to see it often in handwriting but seldom in print, and so would not have recognized it in the form of the non-italic division sign.)
THIRD: IDENTICAL INTERPRETATION. The word abgezogen in the above quote is interesting since the Code Charts for 2052 COMMERCIAL MINUS cite the related German word abzüglich. And from the Swedish context, the charts quote the expression med avdrag. An English translation might be to be withdrawn or with subtraction/rebate [for]. Simply put, we here see the commercial meaning. WHAT ABOUT COMMERCIAL MINUS AS CORRECT SIGN IN SCANDINAVIAN SCHOOLS? Unicode 6.3 notes that in some European (e.g. Finnish, Swedish and perhaps Norwegian) traditions, teachers use the Commercial Minus Sign to signify that something is correct (whereas a red check mark is used to signify error). If my theory is right, that commercial minus and division sign minus are the same signs, how on earth is that possible? How can a minus sign count as positive for the student? The answer is, I think, to be found in the Code Charts' Swedish description (med avdrag/with subtraction/rebate). Because I think that the correct understanding is not that it means correct or OK. Rather, it denotes something that is counted in the customer's/student's favor. So, you could say it really means slack, or rebate. So it really means
Aw: Re: Re: Re: Do you know a tool to decode UTF-8 twice
Thanks again! My updated sed pattern generator now looks like:

    r = range(0xa0, 0x170)
    file = open("fixu8.sed", "w")
    for i in r:
        pat1 = "s/" + unichr(i).encode("utf-8").decode("latin-1").encode("utf-8") + "/" + unichr(i).encode("utf-8") + "/g"
        print >>file, pat1
        try:
            pat2 = "s/" + unichr(i).encode("utf-8").decode("windows-1252").encode("utf-8") + "/" + unichr(i).encode("utf-8") + "/g"
        except UnicodeDecodeError:
            pat2 = pat1
        if (pat1 != pat2):
            print >>file, pat2

doing both latin-1 and windows-1252 mangled double utf-8. This is probably enough for now; the rate of errors is low enough for practical purposes (i.e., lower than the natural error rate introduced by typing errors). --Jörg Knappen Gesendet: Mittwoch, 30. Oktober 2013 um 15:34 Uhr Von: Frédéric Grosshans frederic.grossh...@gmail.com An: unicode@unicode.org Betreff: Re: Aw: Re: Re: Do you know a tool to decode UTF-8 twice Le 29/10/2013 17:15, Jörg Knappen a écrit : After running this script, a few more things were there: Non-normalised accents and some really strange encodings I could not really explain but rather guess their meanings, like s///g s///g s/A//g s/a//g s/E//g s/e//g s///g s///g s///g s///g s///g It was probably not utf8 read as latin 1 and reencoded in utf8, but utf-8 encoding read as Windows 1252 ( http://en.wikipedia.org/wiki/Windows-1252 ) and reencoded as utf-8. Each of the combinations above contains a character absent in latin-1 (), and some of them are only present in Windows-1252 () and not in Latin-15, the other possible mistake. I've checked that this is consistent with and but not with your . This double encoding would give: Ã„ = Win1252(C3 84) = 110.00011 10.000100 = UTF8(00011 000100) = Unicode 00C4 = Ä (and not ) Frédéric
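The generator above relies on unichr and the print >>file statement and therefore runs only under Python 2. A possible Python 3 rendering is sketched below; it is meant to produce the same rules, but the function names sed_rules and write_rules are introduced here for illustration and are not from the thread.

```python
def sed_rules(first=0xA0, last=0x170):
    """Build sed substitutions that undo doubly encoded UTF-8.

    For every code point in range(first, last), the UTF-8 bytes are
    misread once as latin-1 and once as windows-1252; each distinct
    misreading yields one rule mapping it back to the real character.
    """
    rules = []
    for code in range(first, last):
        good = chr(code)
        raw = good.encode("utf-8")
        for codec in ("latin-1", "windows-1252"):
            try:
                mangled = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # byte undefined in this codec (81, 8D, 8F, 90, 9D)
            rule = "s/{}/{}/g".format(mangled, good)
            if rule not in rules:
                rules.append(rule)
    return rules


def write_rules(path="fixu8.sed"):
    """Write the rules as a UTF-8 encoded sed script."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(sed_rules()) + "\n")
```

For U+00C4 this yields two rules, one for the latin-1 misreading (Ã followed by the C1 control U+0084) and one for the windows-1252 misreading Ã„, which matches the byte-level analysis quoted in the message.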
Aw: Re: Re: Re: Re: Do you know a tool to decode UTF-8 twice
The data did not only contain latin-1 type mangling for the non-existent Windows characters, but also sequences with the raw C1 control characters for all of latin-1. So I had to do them, too. The data weren't consistent at all, not even in their errors. --Jörg Knappen Gesendet: Mittwoch, 30. Oktober 2013 um 16:58 Uhr Von: Frédéric Grosshans frederic.grossh...@gmail.com An: Jörg Knappen jknap...@web.de Cc: unicode@unicode.org Betreff: Re: Aw: Re: Re: Re: Do you know a tool to decode UTF-8 twice Le 30/10/2013 16:13, Jörg Knappen a écrit : Thanks again! My updated sed pattern generator now looks like:

    r = range(0xa0, 0x170)
    file = open("fixu8.sed", "w")
    for i in r:
        pat1 = "s/" + unichr(i).encode("utf-8").decode("latin-1").encode("utf-8") + "/" + unichr(i).encode("utf-8") + "/g"
        print >>file, pat1
        try:
            pat2 = "s/" + unichr(i).encode("utf-8").decode("windows-1252").encode("utf-8") + "/" + unichr(i).encode("utf-8") + "/g"
        except UnicodeDecodeError:
            pat2 = pat1
        if (pat1 != pat2):
            print >>file, pat2

doing both latin-1 and windows-1252 mangled double utf-8. This is probably enough for now; the rate of errors is low enough for practical purposes (i.e., lower than the natural error rate introduced by typing errors). Why do you do both latin1 and windows-1252? Windows-1252 is supposed to be a superset of latin1, so it should be enough. Or is there a problem with the few undefined bytes of windows-1252 (81, 8D, 8F, 90, 9D)? Frédéric
Do you know a tool to decode UTF-8 twice
I have a database with broken encoding, containing a lot of UTF-8 twice (that infamous encoding that arises when UTF-8 is interpreted as latin-1 and converted to UTF-8 again) besides ASCII and UTF-8 proper. Is there a ready-made tool that decodes UTF-8 twice while keeping UTF-8 proper in place? --Jörg Knappen
Aw: Re: Do you know a tool to decode UTF-8 twice
Hi Steffen, the data aren't that easy. There are non-latin1 characters encoded in the UTF-8 part. I expect, among others, typographic apostrophes, Polish characters, and some mediaevalist characters like ũ (u with tilde). Maybe there is also some Greek inside, but I am not sure about that. --Jörg Knappen Gesendet: Montag, 28. Oktober 2013 um 12:34 Uhr Von: Steffen Daode Nurpmeso sdao...@gmail.com An: Jörg Knappen jknap...@web.de Cc: unicode@unicode.org Betreff: Re: Do you know a tool to decode UTF-8 twice Jörg Knappen jknap...@web.de wrote: Is there a ready made tool that decodes UTF-8 twice while keeping UTF-8 proper in place? Isn't a shell script with a truly validating iconv(1) enough? This works for me if in utf8.1 there is EI in UTF-8 and I run

    ?0[steffen@sherwood tmp]$ iconv -f latin1 -t utf8 < utf8.1 > utf8.2

As in

    for i in utf8.1 utf8.2; do
        if iconv -f utf8 -t latin1 < ${i} | iconv -f utf8 -t utf8 > /dev/null 2>&1; then
            echo "${i}: bummer, going home by one"
            iconv -f utf8 -t latin1 < ${i} > ${i}.new 2>&1
        else
            echo "${i}: valid UTF-8"
        fi
    done

I'll end up as

    ?0[steffen@sherwood tmp]$ sh utf8dec.sh
    utf8.1: valid UTF-8
    utf8.2: bummer, going home by one
    ?0[steffen@sherwood tmp]$

Ciao, --Jörg Knappen --steffen
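Because the data mix proper UTF-8 with doubly encoded runs, a whole-file test of the kind shown above cannot repair them; a per-run repair is one alternative. The following Python 3 sketch is an assumption-laden illustration, not a tool from the thread: the names fix_double_utf8 and _MANGLED are made up, and it covers only the latin-1 flavour of the mangling (a windows-1252 variant would need the fallback logic discussed elsewhere in this thread).

```python
import re

# A UTF-8 lead byte followed by its continuation bytes, each misread as
# one latin-1 character. The three alternatives fix the sequence length.
_MANGLED = re.compile(
    "[\u00c2-\u00df][\u0080-\u00bf]"        # 2-byte sequence
    "|[\u00e0-\u00ef][\u0080-\u00bf]{2}"    # 3-byte sequence
    "|[\u00f0-\u00f4][\u0080-\u00bf]{3}"    # 4-byte sequence
)


def _repair(match):
    run = match.group(0)
    try:
        # Undo the wrong latin-1 reading, then decode the bytes properly.
        return run.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        return run  # not a valid UTF-8 sequence after all: leave untouched


def fix_double_utf8(text):
    """Replace doubly encoded runs in text and keep everything else."""
    return _MANGLED.sub(_repair, text)
```

The approach is heuristic: text that legitimately contains a sequence such as Ã followed by © would be altered too, so the result should be spot-checked, much as the sed rules in this thread were.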
Aw: Re: symbols/codepoints for necessity and possibility in modal logic
I think U+25C7 WHITE DIAMOND is the best choice, followed by U+27E1 WHITE CONCAVE-SIDED DIAMOND = never (modal operator). The latter has a fancier shape and might not be the one the reader expects. As a plus, it also comes in versions with right and left ticks, needed in some extensions of modal logic. I couldn't locate WHITE DIAMOND WITH LEFTWARDS TICK in Unicode. (U+2662 WHITE DIAMOND SUIT would also look OK, but I think this is symbol abuse. It can be used as a fallback when the font of choice has this one, but none of the two above.) For the properties of mathematical symbols, see also http://www.unicode.org/reports/tr25/ , but I have to admit that the report does not answer the specific question posed here. Maybe this mapping table is more useful (but harder to read): http://www.w3.org/Math/characters/unicode.xml --Jörg Knappen P.S. I'd consider U+22C4 DIAMOND OPERATOR as wrong because it is used as a binary operator, which has a very different spacing than the unary modal operator needed here. Gesendet: Freitag, 19. Juli 2013 um 09:43 Uhr Von: Stephan Stiller stephan.stil...@gmail.com An: Unicode Public unicode@unicode.org Betreff: Re: symbols/codepoints for necessity and possibility in modal logic What is wrong with using DIAMOND OPERATOR? wrong is strong wording and goes beyond what I suggested or implied, but it's not clear to a user of Unicode that it's the right fit either. There are a couple of indicators factoring in: The charts mention modal logic in conjunction with ◻ (U+25FB) and ⟠ (U+27E0) but not with ⋄ (U+22C4). The glyph in the code charts is tiny (and that of Cambria Math is tiny as well). Typographically you see various things (a lozenge, fallback to letter-M) in esp. older books, but it feels like it's meant to be an orthogonal diamond of perhaps slightly less area than the box but descending a little above and below the box, which is somewhat taller than x-height. The book by {Blackburn, de Rijke, Venema} has glyphs that look right.
This is more than a guess: it makes sense if they have similar visual weight, as they are literally defined to be duals of one another; but whether you can make them geometrically congruent symbols of equal area I haven't tested (this might have the diamond ascend too far). The vague notion of operator (a word with different meanings in math, from logical relation to [non-logical/non-relational] mapping of type A×A→A or perhaps A×A→B to (linear) map (between say vector spaces) in linear algebra) in this context (in the code charts) seems to refer to something like my middle meaning, which is likely to use a smaller symbol around x-height in placement and dimensions. The glyph of ⬦ (U+2B26) seems to have a more appropriate name, but in the charts I like U+25C7. The differently sized square-like symbols are hard to semantically tell apart in/from the charts anyway. These symbols are the first two visually distinct ones you define in modal logic, so they're well-known and standardized in meaning for anyone who has had contact with the field. It's surprising they're not explicitly named in the charts. (There's stuff like the outdated horseshoe for logical implication popping up in the relevant books, but that is a leftover or outdated logic notation in general.) So for box and diamond it's quite reasonable to be expecting a standard math font to provide them just right out of the box; for whatever commonly used box-like symbols in math there are, one would assume that there are corresponding codepoints; otherwise you'd have to choose a different font. Stephan
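When comparing the candidate code points named in this thread, it can help to print their formal names and general categories from the Unicode Character Database. Below is a small sketch using Python's standard unicodedata module; the list CANDIDATES and the function describe are names chosen here, and the output depends on the Unicode version shipped with the interpreter.

```python
import unicodedata

# Code points discussed in the thread as possible "diamond" operators.
CANDIDATES = ["\u25c7", "\u27e1", "\u2662", "\u22c4"]


def describe(chars):
    """Return (code point, character name, general category) for each character."""
    return [
        ("U+{:04X}".format(ord(ch)), unicodedata.name(ch), unicodedata.category(ch))
        for ch in chars
    ]


if __name__ == "__main__":
    for code, name, category in describe(CANDIDATES):
        print(code, category, name)
```

The general category (Sm for math symbols, So for other symbols) gives a first hint of how a character is classified, although it says nothing about the spacing a math font will apply.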
Aw: Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
My opinion on the cedilla mess is the following:
* Add preemptively LATIN [CAPITAL|LOWERCASE] LETTER * WITH CEDILLA ATTACHED for every Latvian/Livonian character currently in Unicode. (Don't use terms like MARSHALLESE [CAPITAL|LOWERCASE] LETTER [M|N] -- such entities don't exist from a character encoding point of view.)
* Declare the list of exceptions to Cedilla rendering officially closed. Whenever another such thing (say, LATIN CAPITAL LETTER P WITH COMMA BELOW / LATIN LOWERCASE LETTER P WITH TURNED COMMA ABOVE) occurs in real life, it will be encoded ... WITH COMMA BELOW.
* Font design is an entirely different field. Original German font designs differ from French or Anglo-American ones in several aspects, and original Marshallese font designs will be different, too. I see no problem here. I doubt that one size fits all is the right way to tackle font design.
--Jörg Knappen
Aw: Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
Michael Everson schrieb: My opinion on the cedilla mess is the following: * Add preemptively LATIN [CAPITAL|LOWERCASE] LETTER * WITH CEDILLA ATTACHED for every Latvian/Livonian character currently in Unicode. Why? Latvian and Livonian don't use letters with proper cedilla attached. Maybe my English wasn't perfect here; of course I think that for writing Latvian the existing characters shall be used. I meant for in the sense of foreach or a for loop in programming languages. And yes, I think not only the four characters required for Marshallese, but also the other ones (g, k, and r). (Don't use terms like MARSHALLESE [CAPITAL|LOWERCASE] LETTER [M|N] -- such entities don't exist from a character encoding point of view.) Yes they do. Cf. U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I. The character name exists to distinguish it from other characters and to guide the user in the character's use. But that character exists as a base letter with a distinct shape. There is no distinct base letter Marshallese m or n. * Declare the list of exceptions to Cedilla rendering officially closed. Whenever another such thing (say, LATIN CAPITAL LETTER P WITH COMMA BELOW / LATIN LOWERCASE LETTER P WITH TURNED COMMA ABOVE) occurs in real life, it will be encoded ... WITH COMMA BELOW. I think that is understood, but where would you declare this? In the explanatory notes in the introduction to the standard. I don't have the book here to suggest a more exact location at the moment. --Jörg Knappen Michael Everson * http://www.evertype.com/
Aw: Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
Michael Everson wrote: * Add preemptively LATIN [CAPITAL|LOWERCASE] LETTER * WITH CEDILLA ATTACHED for every Latvian/Livonian character currently in Unicode. Why? Latvian and Livonian don't use letters with proper cedilla attached. Maybe my English wasn't perfect here; of course I think that the existing characters shall be used for writing Latvian. I meant "for" in the sense of foreach or a for loop in programming languages. I have no idea what that means. You want to add a bunch of new non-decomposed characters with a proper cedilla… why? And yes, I mean not only the four characters required for Marshallese, but also the other ones (g, k, and r). Why? The first reason is to solve this problem completely, rather than only resolving a Latvian-Marshallese conflict and leaving some other exceptions for later. The second reason is that the letters g, k, l, m, r with proper cedillas are currently not encodable in Unicode (because of the Latvian exceptions and canonical composition/decomposition), but they should *obviously* be encodable. (Don't use terms like MARSHALLESE [CAPITAL|LOWERCASE] LETTER [M|N] -- such entities don't exist from a character encoding point of view.) Yes they do. Cf. U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I. The character name exists to distinguish it from other characters and to guide the user in the character's use. But that character exists as a base letter with a distinct shape. There is no distinct base letter Marshallese m or n. There is no decomposition. There is no base character + diacritic. The whole thing is a letter used in Marshallese. (It's just a name.) Although there is the famous Goethe quote "Namen sind Schall und Rauch" (names are but sound and smoke), I think good naming style matters, and I prefer the descriptive style LATIN CAPITAL LETTER L WITH PROPER CEDILLA (Marshallese) to the ad-hoc style LATIN CAPITAL LETTER MARSHALLESE LETTER L WITH CEDILLA. But this is a question of style and can be debated endlessly without consensus.
--Jörg Knappen Michael Everson * http://www.evertype.com/
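The point about canonical composition/decomposition can be checked with a small sketch. It assumes Python's standard `unicodedata` module; the helper name `nfd_code_points` and the choice of sample letters are illustrative only and not part of the original discussion.

```python
import unicodedata

# Precomposed letters named "... WITH CEDILLA": g, k, l, n, r (Latvian set) and c.
LETTERS = ["\u0123", "\u0137", "\u013C", "\u0146", "\u0157", "\u00E7"]

def nfd_code_points(text):
    """Return the canonical decomposition (NFD) as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]

for ch in LETTERS:
    # Each of these decomposes to a base letter plus U+0327 COMBINING CEDILLA.
    print(unicodedata.name(ch), nfd_code_points(ch))

# Conversely, base letter + U+0327 recomposes to the precomposed letter under NFC,
# so "l with a proper cedilla" cannot be kept distinct from U+013C this way.
recomposed = unicodedata.normalize("NFC", "l\u0327")

# No precomposed m with cedilla exists, so this sequence stays as two code points.
m_cedilla = unicodedata.normalize("NFC", "m\u0327")
```

Because normalization treats the precomposed letter and the combining sequence as equivalent, a font that draws the Latvian letters with a comma below will draw the sequence the same way, which seems to be the difficulty the message describes.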
Aw: Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
Dominikus Dittes Scherkl wrote: Why not instead encode a new combining MARSHALLESE CEDILLA that ought to be used with g, k, l, m, r and their uppercase counterparts? This is not a good idea, because the combining MARSHALLESE CEDILLA can be combined with the letter C, too. This creates all kinds of havoc with the Ç (including fake internationalised domain names). The remaining letters with cedilla need to be precomposed and non-decomposable. --Jörg Knappen
Aw: Re: Missing geometric shapes
Also, the asymmetric geometric shapes don't have the mirror property (it is restricted to parentheses and mathematical operators). That's the reason why I have proposed two characters instead of only one. Adding the mirror property to the bicolor star only would violate the principle of least surprise. --Jörg Knappen Sent: Thursday, 08 November 2012, 11:15 From: Michael Everson ever...@evertype.com To: Unicode Discussion unicode@unicode.org Subject: Re: Missing geometric shapes On 8 Nov 2012, at 09:59, Simon Montagu smont...@smontagu.org wrote: Please take into account that the half-stars should be symmetric-swapped in RTL text. I attach an example from an advertisement for a movie published in Haaretz 2 November 2012 I don't think Geometric Shapes have the mirror property. 2605;BLACK STAR;So;0;ON;N; 2606;WHITE STAR;So;0;ON;N; In a Hebrew context you'd just choose the star you wanted (black-white vs white-black) and use it. Michael Everson * http://www.evertype.com/
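The UnicodeData fields quoted above can be looked up programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_info` and the comparison characters (a parenthesis and a relational operator) are chosen here for illustration only.

```python
import unicodedata

def bidi_info(ch):
    """Return (name, general category, bidi class, mirrored flag) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.mirrored(ch),  # 1 if Bidi_Mirrored=Y, else 0
    )

# The two stars from the quoted data lines, plus two mirrored characters for contrast.
for ch in ["\u2605", "\u2606", "(", "\u2264"]:
    print(f"U+{ord(ch):04X}", bidi_info(ch))
```

The stars report a mirrored flag of 0, while the parenthesis and the operator report 1, which matches the `N` in the last field of the quoted records.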
Missing geometric shapes
Hi, after a long time of absence I drop in again. The reason is that I was just trying to show a rating on a webpage using the popular scale of 1 to 5 stars, including half-coloured stars, using only Unicode characters. But: There is no character BLACK AND WHITE STAR in Unicode yet. So should the following two characters be added to the Geometric Shapes block: BLACK AND WHITE STAR, WHITE AND BLACK STAR? For the purpose I have in mind, it is not really crucial whether the stars (five-pointed, of course) are divided vertically or diagonally; I suggest vertical division as the standard representation. For an example of use look here: http://xkcd.com/1098/ -- Jörg Knappen
Re: Sample of german -burg abbreviature
Michael Everson wrote: I assumed that the curly thing used over the letter u in German handwriting was a breve (not a combining u superimposed over a u), and so in these examples though the u is deleted, its breve is not. I agree with Michael that the thing is a breve -- however with an unusual placement. To me, there are three resolutions to the burg-abbreviature problem: 1) Add one new character, ZERO WIDTH INVISIBLE LETTER, to the UCS. Encode the burg-abbreviature as b <zwil> <comb. breve above> g. 2) Add one new character, COMBINING RIGHT SHIFTED BREVE ABOVE, to the UCS. Encode the burg-abbreviature as b <comb. right shifted breve above> g. 3) Add two new characters, LATIN SMALL ABBREVIATURE BURG and LATIN CAPITAL ABBREVIATURE BURG, to the UCS. Then, the burg-abbreviature is one Unicode character. [Note: The burg-abbreviature can occur in an all-caps context with the breve placed in the middle between capital B and G.] I strongly prefer solution 1 because it is fully general with a minimum of added effort. It can also handle TeX's tie accent. TeX's tie accent is an inverted right shifted breve above -- that's how it is implemented in TeX and METAFONT by Donald Knuth. It has the width of a normal accent, but the glyph hangs out of its bounding box such that it is placed between two letters. The thing is used in some transliterations of Russian, where the letter ya is transcribed as \t{\ia}, i.e. an inverted breve placed between a dotless i (\i) and a. A sample can be found in Donald E. Knuth, The TeXbook. Solution 2) is also a good one, and it can be extended easily to the case of TeX's tie accent by adding a second character, COMBINING RIGHT SHIFTED INVERTED BREVE ABOVE, to the UCS. Solution 3) is ad hoc and will probably open the door for dozens of other candidates (like the tied ia). --Jorg Knappen P.S. The thing in the burg-abbreviature is clearly *not* a raised u: a raised small u has a right stem, which I have never seen in the burg-abbreviature.
The breve is a mnemonic hint to the u, since it was once obligatory to mark all u's with a breve in German handwriting (Suetterlin) -- and it is still widespread practice.
Sample of german -burg abbreviature
I have scanned a sample of the German -burg abbreviature. It is from Diercke Weltatlas, 165. Auflage, Georg Westermann Verlag, Braunschweig 1972, map page 14. In the north you can find the -berg abbreviature twice, in Herrenbg. [Herrenberg] and Brombg. [Bromberg]. SW of Tuebingen you find Rottenb[U]g. [Rottenburg] and south of it there's Weilerb[U]g. [Weilerburg]. Note the fancy semi-cyrillic shape of the breve between the letters b and g -- it is quite typical for this cartographic font. I don't know what they do with a true breve (like in Romanian) since this atlas transcribes all names into German. The symbol fans may also note the circle with upright flag beside Hohenentringen and Roseck (denoting a castle) and the circle with slanted flag (denoting the ruins of a castle) beside Weilerburg. IMHO, the set of cartographic symbols is another one to be checked against Unicode. --Jorg Knappen
Re: Sample of german -burg abbreviature
On Sun, 26 Sep 2004, Adam Twardoch wrote: From: Jörg Knappen [EMAIL PROTECTED] I have scanned a sample of the German -burg abbreviature. It is from Diercke Weltatlas, 165. Auflage, Georg Westermann Verlag, Braunschweig 1972, map page 14. Very interesting! It would be even more interesting if you told us the URL so we can actually look at it! :) Oh ... the locator is http://www.uni-mainz.de/~knappen/diercke.jpg --Jorg Knappen
RE: Saudi-Arabian Copyright sign
Michael Everson wrote: At 13:07 -0700 2004-09-20, Kenneth Whistler wrote: ARABIC HAH COPYRIGHT SIGN * used in Saudi Arabia or even: CIRCLED ARABIC LETTER HAH * a copyright sign used in Saudi Arabia Both naming suggestions are fine with me. An aside: The Arabic word for "right" is haqq -- starting with the letter in the circle. The second would be better. And is the circled C used in Saudi Arabia for copyright as well? I will try and gather more information at the Frankfurt book fair (beginning of October) where the Arabic world is guest of honour. --Jorg Knappen
Re: Saudi-Arabian Copyright sign
Doug Ewell wrote: I'm not aware of any, but I see this U+20DD solution mentioned from time to time, as though it were a well-known alternative to encoding things like Warenzeichen or Geschützte Sorte. I see a precedent in Unicode for treating copyright-like signs differently from simple encircled letters: Unicode takes precautions not to encode the same character twice. Therefore, superscript digits 2 and 3 are absent from the superscript block U+2070 ff. However, the Enclosed Alphanumerics block (U+2460 ff.) includes encircled capital Latin letters C, P, and R in addition to the copyright-like signs encoded elsewhere. --Jorg Knappen
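The two observations above (the gap in the superscript block and the separately encoded circled letters) can be inspected with a short sketch. It assumes Python's standard `unicodedata` module; the helper name `name_or_none` and the variable names are illustrative.

```python
import unicodedata

def name_or_none(cp):
    """Return the character name for a code point, or None if it has none."""
    return unicodedata.name(chr(cp), None)

# U+2070..U+2079: the positions for superscript 2 and 3 carry no character name here.
superscripts = {f"U+{cp:04X}": name_or_none(cp) for cp in range(0x2070, 0x207A)}

# Superscript two and three live in the Latin-1 range instead.
latin1_superscripts = [unicodedata.name(c) for c in "\u00B2\u00B3"]

# Circled C, P, R carry a <circle> compatibility decomposition to the plain letter ...
circled = {unicodedata.name(c): unicodedata.decomposition(c) for c in "\u24B8\u24C5\u24C7"}

# ... whereas the signs with a legal meaning have no decomposition at all.
legal = {unicodedata.name(c): unicodedata.decomposition(c) for c in "\u00A9\u2117\u00AE"}

print(superscripts, latin1_superscripts, circled, legal, sep="\n")
```

The empty decomposition for the copyright-like signs is one way to see that the standard treats them as entities of their own, not as a letter plus a circle.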
Saudi-Arabian Copyright sign
Scanning through some Arabic books, the following sign attracted my attention: It looks like ARABIC LETTER HAH (isolated form) in a circle. It obviously denotes copyright. It is used consistently in books printed in Saudi Arabia, but I have never seen it in a book from any other country (including Yemen, UAE, Bahrain, Kuwait, Jordan, Egypt, Libya and Morocco). Therefore I suggest the name SAUDI-ARABIAN COPYRIGHT SIGN for this one. Since the block for letterlike symbols is already almost full, but there are gaps in the primary Arabic block (U+0600-06FF), it is IMO well placed there. For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif --Jorg Knappen P.S. Other interesting symbols on my home page: WARENZEICHEN (encircled Wz) http://www.uni-mainz.de/~knappen/fremd_p7.jpg and http://www.uni-mainz.de/~knappen/frem_p17.jpg GESCHUETZTE SORTE (encircled S, like REGISTERED) http://www.uni-mainz.de/~knappen/gp_p159.jpg http://www.uni-mainz.de/~knappen/gp_p159a.jpg http://www.uni-mainz.de/~knappen/asi_p58.jpg Some phonetic symbols with strikethrough: http://www.uni-mainz.de/~knappen/fremd_p8.jpg
RE: Saudi-Arabian Copyright sign
On Sun, 19 Sep 2004, Jon Hanna wrote: For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif Looks like {U+062D, U+20DD} Yes, it does look like that. But it forms a separate entity, just like its precedents COPYRIGHT SIGN or SOUND RECORDING COPYRIGHT SIGN or REGISTERED. GESCHUETZTE SORTE is a letterlike symbol of the same kind. --Jorg Knappen
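To make the distinction concrete, here is a minimal sketch of the sequence {U+062D, U+20DD} next to an existing precedent. It assumes Python's standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

# The suggested sequence: ARABIC LETTER HAH followed by an enclosing circle.
hah_circled = "\u062D\u20DD"
names = [unicodedata.name(c) for c in hah_circled]

# U+20DD is an enclosing mark (general category Me), not a letterlike symbol.
circle_category = unicodedata.category("\u20DD")

# Normalization never turns "letter + enclosing circle" into a dedicated sign:
# R + U+20DD stays a two-code-point sequence and does not equal U+00AE.
r_circled = unicodedata.normalize("NFC", "R\u20DD")
same_as_registered = (r_circled == "\u00AE")

print(names, circle_category, len(r_circled), same_as_registered)
```

So the sequence may look alike in a capable font, but for text processing it remains a letter with a mark, which is the difference between "looks like" and "is a separate entity" drawn in the reply.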
Re: Questions about diacritics
In LaTeX2e with the Cork encoding (for TeXnicians: \usepackage[T1]{fontenc}) there is a so-called compound word mark. It has the functions of the ZERO WIDTH NON JOINER in the UCS: It breaks ligatures, and it can be used to produce a final s in the middle of a word. By design, it has zero width but x-height. So it can be used to carry accents to be placed in the middle between two characters. My classic for this situation is the German -burg abbreviature often seen in cartography: It is -bg. with a breve between b and g. The abbreviature -bg. without accent means -berg. --Jorg Knappen
Re: Public Review Issues Update
40 Encoding of Latin Capital and Small Letter At: LATIN CAPITAL LETTER AT and LATIN SMALL LETTER AT are used as orthographic characters in the Koalib language of Sudan. Although similar in appearance to COMMERCIAL AT, LATIN SMALL LETTER AT should have different character properties. The main concern is the similarity in appearance of LATIN SMALL LETTER AT to COMMERCIAL AT. There are potential implications for Internet protocols that use @. I have read the short proposal and my answer is YES, of course, the UTC should accept these two characters. Keeping LATIN SMALL LETTER AT and COMMERCIAL AT separate will keep internet protocols sane. Unifying the two will cause potential damage depending on the locale (think of @ being capitalized and mapped to something strange ...) --Jorg Knappen
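The property difference at stake can be illustrated with the character that already exists. This is a minimal sketch assuming Python's standard `unicodedata` module and built-in string methods; the sample address is made up for illustration.

```python
import unicodedata

AT = "@"

# COMMERCIAL AT is punctuation (Po) and has no case mappings.
info = (unicodedata.name(AT), unicodedata.category(AT))
caseless = AT.upper() == AT.lower() == AT.casefold() == AT

# Because of that, case-normalizing an address leaves the delimiter intact.
address = "User@Example.ORG"  # hypothetical address
local, _, domain = address.lower().partition("@")

print(info, caseless, local, domain)
```

A cased letter unified with this code point would need upper/lower mappings, and any software that case-folds addresses could then alter the delimiter, which appears to be the risk the message warns about.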
Re: Script l (U+2113)
On Mon, 23 Aug 2004, Kevin Brown wrote: I've just noticed that the script l character (U+2113) is one of only two apparently mandatory characters (the other being estimated U+212E) included in addition to the MacOS Roman character set in a collection of recently released Linotype fonts. Is there any other common usage for U+2113 apart from as the liter/litre symbol that would explain its apparently mandatory inclusion in these fonts? It is used as a mathematical symbol. It started as a way to make the letter l visibly distinct from the digit 1 but has taken on a life of its own since then. Also, does this symbol usually occur in only one style/weight, namely italic regular? Or does it also appear in upright regular, upright bold, and italic bold depending on the typographic context? I have never seen anything but italic regular in serious use, but TeX also has a bold italic version of it available, and because it is easily available someone will have found a clever use for it. --Jorg Knappen
Re: Mystery of Circled S solved
On Mon, 23 Aug 2004, António Martins-Tuválkin wrote: It is identified as a letterlike symbol still missing from Unicode: GESCHUETZTE SORTE looks like: S in a circle U+0053 U+20DD looks very good when set in Code2000. But it isn't GESCHUETZTE SORTE in its specific meaning. Neither is U+24C8. The difference is the same as the difference between U+0052 U+20DD or U+24C7 and U+00AE REGISTERED SIGN. GESCHUETZTE SORTE belongs to a class of special characters with a legal meaning (like COPYRIGHT SIGN and SOUND RECORDING COPYRIGHT SIGN, to name two others of this class). http://www.uni-mainz.de/~knappen/gp_p159.jpg Hm, the bug in the bottom line could be also included -- it would be of great use in computer programming literature. (Nah, two great alternatives are already encoded: U+2F8D and U+BD81... ;-) Indeed -- but this is another theme. There are about a dozen common gardening symbols that have been used in German publications for decades now, and they are worth a proposal. --Jorg Knappen
Mystery of Circled S solved
Dear Unicoders, hello Barbara, I finally solved the mystery of the circled S which has found its way into the AMS math fonts. It is identified as a letterlike symbol still missing from Unicode: GESCHUETZTE SORTE looks like: S in a circle meaning: A protected crop variety (there is a special protection of crop varieties in Germany and now also in the EU. In German it is called Sortenschutz and the registration agency is the Bundessortenamt) usage: current, in mail order garden catalogues. It is often used together with the registered sign. In the following links you can see it: http://www.uni-mainz.de/~knappen/asi_p58.jpg (from the catalogue of Ahrens + Sieberz, Spring 2004, page 58) http://www.uni-mainz.de/~knappen/gp_p159.jpg http://www.uni-mainz.de/~knappen/gp_p159a.jpg (from the catalogue of Gaertner Poetschke, Wundervolle Gartenwelt, Autumn 2003) The latter example shows a variety which is both GESCHUETZTE SORTE and REGISTERED. For the design, I suggest using a non-superscript version, following the design of the registered sign. How it came to be included in the AMS fonts is still a mystery, since no mathematical use of it is known to me. Yours, Jorg Knappen
Some letters with strikethrough
To my surprise I saw that LATIN SMALL LETTER TH WITH STRIKETHROUGH has been accepted into Unicode. There are four more letters of the same type used in a popular German phonetic transcription, namely LATIN SMALL LETTER CH WITH STRIKETHROUGH, LATIN SMALL LETTER DH WITH STRIKETHROUGH, LATIN SMALL LETTER NG WITH STRIKETHROUGH, LATIN SMALL LETTER SCH WITH STRIKETHROUGH. This list is complete. For a reference see http://www.uni-mainz.de/~knappen/fremd_p8.jpg where this phonetic alphabet is explained. The scan is from Der Kleine Duden Fremdwoerterbuch, 3. Auflage 1991, Dudenverlag Mannheim Wien Zuerich, pages 8 and 9. Yours, Jorg Knappen
Warenzeichen
The following letterlike symbol is still missing from Unicode: Warenzeichen looks like: Encircled Wz -- the circle is actually an ellipse. Usage: In German dictionaries and encyclopedias it is used to denote words which are registered trademarks. This usage is still flourishing. I remember that the symbol was also used in advertisements once, but it has been replaced by the more fashionable REGISTERED SIGN or TRADEMARK SIGN in this field. The symbol occurs in the private use area of MS Reference Sans Serif and MS Reference Serif at the PUA position U+F7BF. --Jorg Knappen
Looking for transcription or transliteration standards latin-arabic
Are there standards for transcribing or transliterating western languages written in Latin script into Arabic script? I am specifically interested in German-Arabic, but English-Arabic and French-Arabic are of interest, too. --Jorg Knappen
Filzlaus
Browsing through my printed version of Unicode 4.0 I discovered the following annotations to U+00A4 (the currency sign): Filzlaus, Ricardi-Sonne (German names). May I request that those names be dropped? Both are jargon at best and not widespread. Filzlaus means crab louse, and it is well known (to German speakers) where those animals live and how they spread. Ricardi-Sonne is probably misspelled (it should be Ricardo-Sonne, alluding to the famous economist David Ricardo). The name is not in wide use anyway. Even Sputnik is better known... --Jorg Knappen
Re: Game pieces proposal
Antonio Martins-Tuvalkin wrote: And the special sign printed on Joker cards (a five pointed star in a circle) Would it be suitable to use U+235F and/or U+272A? Hmm... U+235F has the right graphical representation, but as a character specific to the APL programming language it is probably unsuitable. U+272A has a different look; it does not fit. --Jorg Knappen
Re: Game pieces proposal
Antonio Martins-Tuvalkin wrote: Hm, au contraire. Michael's quote above hints precisely that the goal of encoding cards as separate individual characters is to overcome that handicap. Unfortunately, this does not reflect the literature about card games. The literature (and almost every German local daily has a weekly column about Skat) just uses the rather simple notation <heartsuit> letter K to denote the King of Hearts, if it does not resort to Herz-K altogether. The same is true for web sites about card gaming. Of course one could encode instead a generic play card king character, which English fonts would render K etc, and still have each card as a pair of characters. If the card gamers' community agrees on generic symbols for the ranks of cards (like the chess players have already done, abandoning the letters in chess literature) they are worth encoding. But Unicode should not try to invent something which does not already exist in the world and has a solid standing there. --Jorg Knappen, who still thinks that playing cards aren't characters of plain text.
Re: Game pieces proposal
Thinking again about chess: While I think that the encoding of the chess pieces in Unicode is just right, I wonder about the other symbols used in chess notation, known as the Informant code system. Many of them may already be there, scattered in the mathematical and technical blocks. But some might still be absent, like WHITE STANDS SLIGHTLY BETTER (the glyph looks like a plus over an equals sign). I wish there were a stroke index to the mathematical symbols in Unicode. I did such a thing for the symbols in TeX, LaTeX and AMS-LaTeX; it is published in my book Schnell ans Ziel mit LaTeX2e, Oldenbourg-Verlag, 2nd extended and revised printing 2004. A reference for the Informant code system: ftp://ftp.ctan.de/tex-archive/fonts/chess/chess.zip (You need LaTeX to create the documentation; no ready ps or pdf file is included) --Jorg Knappen
Re: Game pieces proposal
Antonio Martins-Tuvalkin wrote: chess notation, known as informant code system. ... some might be still absent, like WHITE STANDS SLIGHTLY BETTER (glyph looks like plus over equals sign). What about U+272A PLUS SIGN ABOVE EQUALS SIGN? [should read 2A72] Hey, there it is! And BLACK STANDS SLIGHTLY BETTER is at 2A71. --Jorg Knappen. P.S. I found a readily printable version of the chess informator signs on ftp://ftp.dante.de/tex-archive/info/symbols/comprensive (there you can find ps files prepared for A4 paper and letter paper). In my printed version, it is table 200. The whole document contains lots of symbols people found worth doing in LaTeX. Probably raw material for half a dozen proposals.
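Lacking a stroke index, one can at least search character names by keyword. The following is a rough sketch assuming Python's standard `unicodedata` module; the function `find_by_words` and its default search range are assumptions made for this illustration, and hyphenated name parts such as LESS-THAN are matched only as whole tokens.

```python
import unicodedata

def find_by_words(*words, start=0x2000, end=0x2BFF):
    """List (code point, name) pairs whose name contains all given words."""
    wanted = set(words)
    hits = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if wanted <= set(name.split()):
            hits.append((f"U+{cp:04X}", name))
    return hits

# Both chess-evaluation glyphs mentioned above turn up under PLUS + EQUALS.
plus_equals = find_by_words("PLUS", "EQUALS")

# Exact lookups confirm the corrected code point and show what U+272A really is.
plus_above_equals = unicodedata.lookup("PLUS SIGN ABOVE EQUALS SIGN")
u272a_name = unicodedata.name("\u272A")

print(plus_equals, f"U+{ord(plus_above_equals):04X}", u272a_name, sep="\n")
```

A keyword search of this kind only helps when the glyph's description happens to match the wording of the character name, so it is a weak substitute for a true index by shape.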
Re: Game pieces proposal
I think playing cards are not characters (for use in plain text). Usually, one says (in German) Herz-A, Herz-K, Herz-10 or one uses the suit characters already in Unicode: <heartsuit>10 etc. Only the French suits are currently in; one can consider encoding the suits of Mahjongg and of German, Swiss, Italian and Spanish cards, too. And the special sign printed on Joker cards (a five pointed star in a circle). But not more. For information on card games and cards, you may want to consult http://www.pagat.com or Detlef Hoffmann: Kultur- und Kunstgeschichte der Spielkarte, Jonas-Verlag, Marburg, 1995. --Jorg Knappen
Re: Game pieces proposal
There's another point about playing cards: The letters for the figures are language-dependent. While English has AKQJ, German has AKDB, and other languages have still other letters (all for French style cards here; German suits are different again, with DKOU in German). Once one starts to encode whole playing cards, one has to do it for all local letters... --Jorg Knappen
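The suit-plus-letter notation described in this thread can be sketched as follows. The table layout and the function name `card_notation` are hypothetical; only the suit code points and the rank letters AKQJ / AKDB come from existing Unicode and from the messages above.

```python
# French suits already encoded in Unicode (Miscellaneous Symbols block).
SUITS = {
    "spades": "\u2660",
    "hearts": "\u2665",
    "diamonds": "\u2666",
    "clubs": "\u2663",
}

# Rank letters are language-dependent: English AKQJ versus German AKDB.
RANK_LETTERS = {
    "en": {"ace": "A", "king": "K", "queen": "Q", "jack": "J"},
    "de": {"ace": "A", "king": "K", "queen": "D", "jack": "B"},
}

def card_notation(suit, rank, lang):
    """Return the two-character plain-text notation: suit symbol + local rank letter."""
    return SUITS[suit] + RANK_LETTERS[lang][rank]

queen_en = card_notation("hearts", "queen", "en")
queen_de = card_notation("hearts", "queen", "de")
print(queen_en, queen_de)
```

The same card yields different text per language, which is why a single precomposed playing-card character would either have to pick one language's letter or be multiplied per locale.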