Re: s-j combination in Unicode?
On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote: On Feb 16, 2013 02:13, Andries Brouwer wrote: The fragment of text I showed was not from dialectology, but just from a novel written in Elfdalian. The symbols are meant to be those of ordinary orthography. Does that mean there's also a capital S-J? Probably, in entirely capitalized text. At sentence start I see capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text. I have only seen the s-j following d or t, not word-initially. Andries
Re: s-j combination in Unicode?
On 2/15/2013 11:59 PM, Andries Brouwer wrote: On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote: On Feb 16, 2013 02:13, Andries Brouwer wrote: The fragment of text I showed was not from dialectology, but just from a novel written in Elfdalian. The symbols are meant to be those of ordinary orthography. Does that mean there's also a capital S-J? Probably, in entirely capitalized text. At sentence start I see capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text. I have only seen the s-j following d or t, not word-initially. Andries That would make it analogous in a way to German ß. The minute things show up in real orthographies the pressure to handle ALL CAPS exists. The wider the use an orthography has, the stronger that pressure is, of course. A./
Re: s-j combination in Unicode?
That would make it analogous in a way to German ß. The minute things show up in real orthographies the pressure to handle ALL CAPS exists. The question then is whether you'll find SJ or overlaid S/J. Or how a Swede would instinctively handle this, in the absence of an example of a consistently applied rule. (By the way, for those finding the German rule to write SS unsatisfactory: It's hard to come by an actual minimal pair. And it's not like capitalization is otherwise invertible – the capitalization bits contain information as well, after all.) Stephan
Re: s-j combination in Unicode?
2013-02-16 11:38, Stephan Stiller wrote: (By the way, for those finding the German rule to write SS unsatisfactory: It's hard to come by an actual minimal pair. Example: Strauss vs. Strauß. Originally the same name, but two spellings make them two names that may need to be distinguished from each other. And it's not like capitalization is otherwise invertible – the capitalization bits contain information as well, after all.) That is correct in general. But for German personal names, I would expect capitalization to be invertible, provided that “ß” has been mapped to “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S. Yucca
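To make the invertibility point concrete, here is a minimal Python sketch. It assumes Python's built-in case mappings, which uppercase ß to SS and lowercase ẞ (U+1E9E) to ß. The helper names `upper_default`, `upper_preserving` and `restore_name` are hypothetical, chosen only for this illustration, and `restore_name` handles single-word names only.

```python
# Sketch: round-tripping German personal names through all-caps.
SHARP_S = "\u00df"          # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S


def upper_default(name: str) -> str:
    """Default uppercasing: ß is expanded to SS."""
    return name.upper()


def upper_preserving(name: str) -> str:
    """Hypothetical helper: map ß to ẞ first, then uppercase the rest."""
    return name.replace(SHARP_S, CAPITAL_SHARP_S).upper()


def restore_name(caps: str) -> str:
    """Undo all-caps for a single-word name: lowercase, capitalize the initial."""
    return caps.lower().capitalize()


for name in ("Strauss", "Strauß"):
    lossy = upper_default(name)
    kept = upper_preserving(name)
    print(name, "->", lossy, "->", restore_name(lossy))
    print(name, "->", kept, "->", restore_name(kept))
```

With the default mapping both names collapse to the same all-caps string, so the round trip cannot tell them apart. With the ẞ mapping each name comes back unchanged.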
Re: s-j combination in Unicode?
[...] an actual minimal pair. Example: Strauss vs. Strauß. Originally the same name, but two spellings make them two names that may need to be distinguished from each other. True for Wei{ß/ss} as well. Or a non-name example: Buße (repentance) vs Busse (buses). But then, non-name examples are far less likely to remain ambiguous in context. [...] it's not like capitalization is otherwise invertible [...] [...] for German personal names, I would expect capitalization to be invertible, provided that “ß” has been mapped to “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S. Yes, that would be better. Presently, some official documents retain (lowercase) ß within all-caps writing in some places where it really matters, but it's rare to see such a style elsewhere, and it's technically not permitted by our official orthography, fwiw. Which amounts to a weird situation, because during the debates surrounding German orthographic reform 1-2 decades ago, one argument presented to those who were against was that the official body of rules (amtliche Regelung) was binding only for schoolchildren and civil servants within government offices anyways :-) (See also this German supreme court decision http://www.bverfg.de/pressemitteilungen/bvg06-042.html.) Stephan
German »ß« (was: s-j combination in Unicode?)
Hello, On 16.02.2013 11:48, Stephan Stiller wrote: Or a non-name example: Buße (repentance) vs Busse (buses). But then, non-name examples are far less likely to remain ambiguous in context. Years ago, I saw with my own eyes, in a Swiss magazine (where they consistently replace “ß” with “ss”), the following amusing example: … Brigitte Bardot mit ihren beachtlichen Körpermassen … which translates to: “BB, and her considerable bodily masses”, whilst the author probably wanted to say: “BB, and her remarkable physical measurements (=body shape)”. During the discussion on the German spelling reform, in the 1990s, the same minimal pair was used in the following context: Es ist ein Unterschied, ob ich Bier in Maßen trinke oder in Massen. meaning: “It makes a difference, whether I drink beer in moderation, or in masses”. Minimal pairs for “ß” vs. “ss”, not involving proper names, are extremely rare; in fact, I only know the two mentioned in this very note. Between ordinary words and proper names (or place names), you can, of course, find more minimal pairs, e. g., “Füßen” (a declension form of “Fuß” = foot) and “Füssen” (a town in Bavaria). Cheers, Otto
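As a rough illustration of how such pairs could be found mechanically, the sketch below groups a word list by its all-caps form and reports the groups with more than one member. The function name `caps_collisions` and the tiny word list are assumptions for this example. A real search would run over a corpus or dictionary.

```python
from collections import defaultdict


def caps_collisions(words):
    """Return the all-caps forms that are shared by more than one distinct word."""
    groups = defaultdict(set)
    for word in words:
        # Python's default uppercasing expands ß to SS.
        groups[word.upper()].add(word)
    return {caps: sorted(group) for caps, group in groups.items() if len(group) > 1}


# Illustrative word list taken from the examples in this thread.
words = ["Maßen", "Massen", "Buße", "Busse", "Füßen", "Füssen", "Bier"]

for caps, group in caps_collisions(words).items():
    print(caps, "<-", ", ".join(group))
```

This only finds spelling collisions. Whether the two members are also pronounced alike, as discussed elsewhere in the thread, would need a pronunciation lexicon.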
Re: s-j combination in Unicode?
On Sat, Feb 16, 2013 at 12:22:08AM -0800, Asmus Freytag wrote: On 2/15/2013 11:59 PM, Andries Brouwer wrote: On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote: Does that mean there's also a capital S-J? Probably, in entirely capitalized text. At sentence start I see capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text. I have only seen the s-j following d or t, not word-initially. That would make it analogous in a way to German ß. The minute things show up in real orthographies the pressure to handle ALL CAPS exists. I found Diauni.ttf at http://www.thesauruslex.com/typo/dialekt.htm (swedish) http://www.thesauruslex.com/typo/engdial.htm (english) It has landmålsalfabetet at E100-E197 (lower case only) and s-j at E19F, S-J at E1A5, with Y-ogonek, Å-ogonek, G-slash, R-slash, Ð-slash nearby. Andries [BTW Is the fact that o-slash is not decomposed not entirely analogous to the fact that i is not decomposed? I would say that neither gives an indication of how symbols involving a combining dot or combining slash are handled in general.]
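The decomposition question in the BTW can be checked against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module. The choice of sample characters is mine, and the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> str:
    """Report a character's code point and its decomposition mapping, if any."""
    mapping = unicodedata.decomposition(ch)  # empty string when there is none
    return f"U+{ord(ch):04X} {ch}: {mapping or 'no decomposition'}"


# o-slash and plain i have no decomposition mapping;
# the ogonek letters decompose into base letter + U+0328 COMBINING OGONEK.
for ch in ("\u00f8", "i", "\u012f", "\u01eb"):
    print(describe(ch))
```

This shows only what the data says about these particular characters. It does not settle how a newly proposed letter with an overlaid stroke would be treated.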
Re: German »ß«
Or a non-name example: Buße (repentance) vs Busse (buses). But then, non-name examples are far less likely to remain ambiguous in context. A reason why Jukka's original example – like most proper name examples – was better than mine is that it's truly minimal in that context will really not help /and/ the pronunciation is identical. I wrote most because there's gotta be variation in the pronunciation of a surname like Rößler. Note though that those names would, if invented today, all get an unambiguous spelling with either ss (preceding short vowel) or ß (preceding long vowel or diphthong); you have a lot of unusual_{from today's point of view} letter doubling in names. Minimal pairs for “ß” vs. “ss”, not involving proper names, are extremely rare; in fact, I only know the two mentioned in this very note. Between ordinary words and proper names (or place names), you can, of course, find more minimal pairs, e. g., “Füßen” (a declension form of “Fuß” = foot) and “Füssen” (a town in Bavaria). Interesting would be a non-name minimal pair whose components have identical pronunciation. Perhaps none is in ordinary use (otherwise we'd know?), though one can come up with compound words like Maißtier/Maisstier. Not that they make much sense, plus this one actually fails the pronunciation test. But some construction along those lines ought to be possible (and easy, with a corpus). And there probably /is/ an uncontrived example if we allow a pairing with a geographical name, similar to what you're mentioning. Of course in my worldview, all-caps writing is deprecated :-) I might have never written a ß-SS outside of this exchange. Which isn't to say that I don't laud efforts at precision, at popularizing a capital ẞ, or at orthographic reform in general :-) Stephan
Re: s-j combination in Unicode?
On 2/16/2013 1:38 AM, Stephan Stiller wrote: That would make it analogous in a way to German ß. The minute things show up in real orthographies the pressure to handle ALL CAPS exists. The question then is whether you'll find SJ or overlaid S/J. Or how a Swede would instinctively handle this, in the absence of an example of a consistently applied rule. There's a question first, of whether there's a difference between s+j and simple sj. Is it just to mark a different pronunciation of what would be sj in standard Swedish, or are these contrasting in Elfdalian as well? I suspect that the fallback would be SJ, if nothing else is available, but currently, anybody using s+j would use private fonts and thus there's not necessarily a need to use a fallback. This is different from German use where telegraphs and typewriters were instrumental in creating and cementing the need for a fallback. The German-style fallback is painful enough as it is to make sure it's not Unicode creating the bottleneck. (By the way, for those finding the German rule to write SS unsatisfactory: It's hard to come by an actual minimal pair. MASSE - mass or measurements? See, not hard at all. With the new orthography, ss vs. ß affects the pronunciation of the preceding vowel. It's irritating to see SS because you have to override that rule when you know that the word in lowercase was pronounced differently. And, as Andreas had painstakingly done, you can collect a nearly infinite array of examples where users, in rule-bound Germany(!), simply continue to ignore that rule. A./ PS: And it's not like capitalization is otherwise invertible – the capitalization bits contain information as well, after all.) Beside the point a bit. Even though it's true that mixed case carries information that's lost in all upper or all lowercase, the issue is a bit different, as not focused on one letter.
Re: s-j combination in Unicode?
On 2/16/2013 7:04 AM, Andries Brouwer wrote: [BTW Is the fact that o-slash is not decomposed not entirely analogous to the fact that i is not decomposed? I would say that neither gives an indication of how symbols involving a combining dot or combining slash are handled in general.] Why don't you just take the precedent as what it is and make your proposal accordingly? Some decisions that went into Unicode could have come out differently perhaps, but history says they didn't, and we are stuck with them. Changing horses in mid-stream helps nobody. A./
Re: s-j combination in Unicode?
On 2/16/2013 7:04 AM, Andries Brouwer wrote: I found Diauni.ttf at http://www.thesauruslex.com/typo/dialekt.htm (swedish) http://www.thesauruslex.com/typo/engdial.htm (english) It has landmålsalfabetet at E100-E197 (lower case only) and s-j at E19F, S-J at E1A5, with Y-ogonek, Å-ogonek, G-slash, R-slash, Ð-slash nearby. So you have evidence that the uppercase form is implemented, if not yet a citation of actual use. Since the latter is expected to be rare, I personally would be comfortable with making a code point for it, so that fonts like this, which are actually used, can be mapped to Unicode w/o forcing people into weird fallbacks over a rare character. A./
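Until such a code point exists, text set in a font like this stores the letter as a Private Use Area character, which has no meaning outside that font. A small Python sketch of how one might flag such characters before interchange follows. The helper name `private_use_chars` is hypothetical, and E19F is used only because it is the position reported above for Diauni.ttf.

```python
import unicodedata


def private_use_chars(text: str):
    """List (index, code point) for every private-use character in text."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = Other, private use
    ]


# "d" followed by the font-specific s-j position reported for Diauni.ttf.
sample = "d\ue19f"
print(private_use_chars(sample))

# Private-use characters carry no standard name, hence the default value.
print(unicodedata.name("\ue19f", "<no standard name>"))
```

A check like this only detects the problem. Mapping the character to something interoperable needs an encoded target, which is the point of the proposal.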
Re: s-j combination in Unicode?
It's hard to come by an actual minimal pair. MASSE - mass or measurements? See, not hard at all. [and] With the new orthography, ss vs. ß affects the pronunciation of the preceding vowel. It's irritating to see SS because you have to override that rule when you know that the word in lowercase was pronounced differently. Well, this pretty much summarizes why I think SS-for-ß looks distracting. So since I very much agree with such sentiment, I should probably not have given a mild defense of this practice in the first place. But where are all those other examples? Now – aside from come by meaning come across :-) (which is, in all fairness, not what I meant earlier), let's now ask how frequent this is, really. I don't encounter that much all-caps text in the first place (which to me looks stupid, independently), and MASSE is variant #3 in this thread of the double example that Otto Scholz just gave (Körpermassen (Switzerland), IN MASSEN), obviously terribly likely to appear in an all-caps context. Remind me real quick, I must have forgotten about all those popular, bestselling all-caps physics books teaching about mass and measurements – the comparative discussion of beer and female bodies was probably in the appendix about SI units :-) which I must have skipped. And it's not like capitalization is otherwise invertible – the capitalization bits contain information as well, after all.) Beside the point a bit. Even though it's true that mixed case carries information that's lost in all upper or all lowercase, the issue is a bit different, as not focused on one letter. Text being all-caps is a property applied to the word level (for emphasis) or to the paragraph level. The minimal unit it applies to is (normally) the word. (@normally: What to do with word-internal capital letters, as e.g. in certain Gaelic names, is another question.)
You're right to point this out, but SS-as-capital-ß really only occurs in an all-caps writing context, which has the capitalization property applied to entire words. Stephan
Re: s-j combination in Unicode?
the issue is a bit different, as not focused on one letter While we're splitting hairs: Word- or larger-level all-caps /does/ normally make a one-letter difference. When we undo all-caps, one can /normally/ lowercase everything of the word except the first letter. The capitalization bit of that one letter is sometimes unclear. S
Re: s-j combination in Unicode?
On 2/16/2013 10:48 AM, Stephan Stiller wrote: the issue is a bit different, as not focused on one letter While we're splitting hairs: Word- or larger-level all-caps /does/ normally make a one-letter difference. When we undo all-caps, one can /normally/ lowercase everything of the word except the first letter. The capitalization bit of that one letter is sometimes unclear. And usually not totally sense-destroying to a human reader with context available. But these fallbacks allow clear misspelled words to appear, not just miscapitalized ones. That's huge. A./
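The two kinds of information loss can be put side by side in a short Python sketch. It enumerates the spellings an all-caps word could have come from, under the simplifying assumptions that only the first letter may have been capitalized and that every SS may stand for either ss or ß. The function name `undo_caps_candidates` is hypothetical.

```python
from itertools import product


def undo_caps_candidates(word: str):
    """Enumerate plausible mixed-case sources of an all-caps word.

    Simplifying assumptions: at most the first letter was capitalized,
    and each non-overlapping "SS" may have been "ss" or "ß".
    """
    parts = word.lower().split("ss")
    candidates = set()
    for choice in product(("ss", "\u00df"), repeat=len(parts) - 1):
        spelling = parts[0]
        for filler, part in zip(choice, parts[1:]):
            spelling += filler + part
        candidates.add(spelling)                             # initial lowercase
        candidates.add(spelling[:1].upper() + spelling[1:])  # initial capital
    return sorted(candidates)


print(undo_caps_candidates("BIER"))    # differs only in the initial's case
print(undo_caps_candidates("MASSEN"))  # also differs in spelling
```

For a word without SS the candidates differ only in the case of the initial. With SS they differ in spelling too, which is the misspelling problem described above.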
Re: German »ß«
2013/2/16 Stephan Stiller stephan.stil...@gmail.com: Of course in my worldview, all-caps writing is deprecated :-) This is a presentation style which makes words more readable in some conditions, notably on plates displayed on roads (cities are extremely rarely written in lowercase, as this is more difficult to read from far away when driving). Capitals anyway do not exclude preserving distinctions (so there's a capital Ess-Tsett which preserves the distinction with SS, and accents are still present, even if they are difficult to distinguish from far away on roads) Deprecation only concerns long texts, presented in multiline paragraphs, for which capitals make the text less easy to read.
Re: German »ß«
On 2013-02-16, Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/16 Stephan Stiller stephan.stil...@gmail.com: Of course in my worldview, all-caps writing is deprecated :-) This is a presentation style which makes words more readable in some conditions, notably on plates displayed on roads (cities are extremely rarely written in lowercase, as this is more difficult to read from far away when driving). Half a century ago, the UK, after extensive empirical testing, mandated mixed case for road signs because it is significantly easier to read at speed. Our cousins across the Atlantic have finally caught on, and the U.S. Federal Highway Administration now mandates mixed case for place names, while leaving fixed wording in all caps. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: s-j combination in Unicode?
On 2/16/2013 10:48 AM, Stephan Stiller wrote: the issue is a bit different, as not focused on one letter While we're splitting hairs: Word- or larger-level all-caps /does/ normally make a one-letter difference. When we undo all-caps, one can /normally/ lowercase everything of the word except the first letter. The capitalization bit of that one letter is sometimes unclear. Sorry, not what I meant. It can hit any letter of the alphabet. The ß issue hits only one specific letter. A./
Re: German »ß«
Another solution is also used: Capitals written as Big capitals, and lowercase written as small capitals (i.e. just a minor font size reduction). True lowercase letters are causing problems on road sign indicators on roads with high speed : they are hard to read and if the driver has to look at them for one more second, he does not look at the road. There's a safety concern and it's not a minor problem. Font styles are also studied to use the simplest glyphs without any extra decoration which would distract the driver, especially on dynamic displays. For static displays, only legal forms and colors are admitted, so that these displays don't need to be completely deciphered and are immediately recognized. It is then safer to just show what is essential. And on dynamic display indicators (whose content is displayed but changes according to current conditions), there are laws that prohibit using anything else than just big capitals, and prohibit any sort of color enhancement or decoration, and require strong contrast, i.e. either white or yellow on black or dark color, or black on white or yellow (the red color may be used on the icon to signal a danger). All text effects are prohibited (including italics, underlining, boldness, narrowing or widening). But some font size adjustments are possible for less essential information. There are also required icons for signaling dangers, but these icons must also be followed by what they mean, i.e. DANGER, which cannot be smaller or larger than the essential message. E.g. for signaling dangers of wind, the DANGER icon is displayed followed by DANGER : VENT VIOLENT (violent wind), but the indication of the effective speed of wind (in km/h) being less essential may be smaller. But it still has to avoid all sorts of decorations, and letters remain in capitals (e.g. RAFALES À 80 KM/H instead of Rafales à 80 km/h).
Information like the time to reach a destination, or the length of traffic jams, is less essential than the distance at which the traffic jam is expected to occur after the indicator. These are all cases of very short texts, without true sentences; they are expected to be read very fast and understood immediately without distracting the driver. They are not advertisements... Only local place names may be lowercased (on minor roads or within cities to give names of streets or names of touristic points of interest, or the direction of some local services), but not on high speed roads or motorways (for example to signal exits on motorways or high speed roads, or the lane to keep to follow a direction before a branch). 2013/2/16 Julian Bradfield jcb+unic...@inf.ed.ac.uk: On 2013-02-16, Philippe Verdy verd...@wanadoo.fr wrote: 2013/2/16 Stephan Stiller stephan.stil...@gmail.com: Of course in my worldview, all-caps writing is deprecated :-) This is a presentation style which makes words more readable in some conditions, notably on plates displayed on roads (cities are extremely rarely written in lowercase, as this is more difficult to read from far away when driving). Half a century ago, the UK, after extensive empirical testing, mandated mixed case for road signs because it is significantly easier to read at speed. Our cousins across the Atlantic have finally caught on, and the U.S. Federal Highway Administration now mandates mixed case for place names, while leaving fixed wording in all caps.
Re: German »ß«
On 2/16/2013 12:06 PM, Philippe Verdy wrote: 2013/2/16 Stephan Stiller stephan.stil...@gmail.com: Of course in my worldview, all-caps writing is deprecated :-) This is a presentation style which makes words more readable in some conditions, notably on plates displayed on roads (cities are extremely rarely written in lowercase, as this is more difficult to read from far away when driving). Capitals anyway do not exclude preserving distinctions (so there's a capital Ess-Tsett which preserves the distinction with SS, and accents are still present, even if they are difficult to distinguish from far away on roads) This may be a French thing. A./ For US, see discussion here: http://www.studio360.org/2011/jan/21/design-real-world/ For Germany, look at http://www.ace-online.de/fileadmin/user_uploads/Der_Club/Presse-Archiv/Bilder/Verkehr/Autobahn/Autobahn_01.jpg or google Autobahnschilder for more PPS: Sweden has quite a bit of UPPERCASE, but seems to use mixed case for some purposes (such as legends on warning signs and minor destinations on road signs). Deprecation only concerns long texts, presented in multiline paragraphs, for which capitals make the text less easy to read.
Re: s-j combination in Unicode?
from earlier: Otto Scholz Oops, sorry. Otto Stolz. And usually not totally sense-destroying to a human reader with context available. But these fallbacks allow clear misspelled words to appear, not just miscapitalized ones. That's huge. I'm all for a capital version of ß and other such letters, but you may be talking in extremes too much. As far as real ambiguities are introduced, the loss of capitalization on the first letter introduces far more, impressionistically speaking, and they might be legally subtle; though those very sporadically occurring ones coming from SS are more likely to be totally sense-destroying, yes. What I'm also saying is that it's a minor issue compared to the destruction of readability by usage of all-caps in the first place. I'd rather focus on avoiding ambiguity in written language otherwise. Those concerning syntactic structure are troublesome, for example. S
Re: s-j combination in Unicode?
On 2/16/2013 9:55 PM, Stephan Stiller wrote: from earlier: Otto Scholz Oops, sorry. Otto Stolz. And usually not totally sense-destroying to a human reader with context available. But these fallbacks allow clear misspelled words to appear, not just miscapitalized ones. That's huge. I'm all for a capital version of ß and other such letters, but you may be talking in extremes too much. Never! ;) Actually, the question that started this particular discussion is most likely moot, because Andries has located, at the minimum, an existing font implementation of capital S+J. That seems to indicate that, again at the bare minimum, there are other people who think that SJ is not the way to render this. A./
Re: German »ß«
On 2013-02-17, Philippe Verdy verd...@wanadoo.fr wrote: True lowercase letters are causing problems on road sign indicators on roads with high speed : they are hard to read and if the driver has to look at them for one more second, he does not look at the road. AS I SAID, empirical evaluation by those who had good cause to care about the issue indicates the opposite, that people take longer to read all caps (as is also the case in normal text). This evaluation was done specifically for high speed roads. It included live testing on one motorway.