Re: s-j combination in Unicode?

2013-02-16 Thread Andries Brouwer
On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote:
 On Feb 16, 2013 02:13, Andries Brouwer wrote:
  The fragment of text I showed
  was not from dialectology, but just from a novel written in Elfdalian.
  The symbols are meant to be those of ordinary orthography.
 
 Does that mean there's also a capital S-J?

Probably, in entirely capitalized text. At sentence start I see
capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text.
I have only seen the s-j following d or t, not word-initially.

Andries



Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/15/2013 11:59 PM, Andries Brouwer wrote:

On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote:

On Feb 16, 2013 02:13, Andries Brouwer wrote:

The fragment of text I showed
was not from dialectology, but just from a novel written in Elfdalian.
The symbols are meant to be those of ordinary orthography.

Does that mean there's also a capital S-J?

Probably, in entirely capitalized text. At sentence start I see
capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text.
I have only seen the s-j following d or t, not word-initially.

Andries



That would make it analogous in a way to German ß.

The minute things show up in real orthographies the pressure to handle 
ALL CAPS exists.


The wider use an orthography has, the stronger that pressure is, of course.

A./


Re: s-j combination in Unicode?

2013-02-16 Thread Stephan Stiller



That would make it analogous in a way to German ß.

The minute things show up in real orthographies the pressure to handle 
ALL CAPS exists.


The question then is whether you'll find SJ or overlaid S/J. Or 
how a Swede would instinctively handle this, in the absence of an 
example of a consistently applied rule.


(By the way, for those finding the German rule to write SS 
unsatisfactory: It's hard to come by an actual minimal pair. And it's 
not like capitalization is otherwise invertible – the capitalization 
bits contain information as well, after all.)


Stephan



Re: s-j combination in Unicode?

2013-02-16 Thread Jukka K. Korpela

2013-02-16 11:38, Stephan Stiller wrote:


(By the way, for those finding the German rule to write SS
unsatisfactory: It's hard to come by an actual minimal pair.


Example: Strauss vs. Strauß. Originally the same name, but two spellings 
make them two names that may need to be distinguished from each other.



 And it's
not like capitalization is otherwise invertible – the capitalization
bits contain information as well, after all.)


That is correct in general. But for German personal names, I would 
expect capitalization to be invertible, provided that “ß” has been 
mapped to “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S.
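A minimal Python sketch of the point, assuming Python 3's built-in str.upper(), which applies the Unicode full case mapping (ß becomes "SS"). The helper name upper_with_capital_sharp_s is hypothetical. It simply substitutes U+1E9E before uppercasing, so that the two spellings of the name stay distinct:

```python
def upper_with_capital_sharp_s(text: str) -> str:
    """Uppercase text, but map ß (U+00DF) to ẞ (U+1E9E) instead of "SS"."""
    # U+1E9E is already uppercase, so str.upper() leaves it unchanged.
    return text.replace("\u00df", "\u1e9e").upper()


for name in ("Strauß", "Strauss"):
    # Default full case mapping: both spellings collapse to STRAUSS.
    # With the substitution: STRAUẞ vs. STRAUSS remain distinguishable.
    print(name, name.upper(), upper_with_capital_sharp_s(name))
```

Lowercasing "STRAUẞ" gives "strauß", because U+1E9E lowercases to U+00DF; restoring the initial capital of the name is a separate question.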


Yucca






Re: s-j combination in Unicode?

2013-02-16 Thread Stephan Stiller



[...] an actual minimal pair.


Example: Strauss vs. Strauß. Originally the same name, but two 
spellings make them two names that may need to be distinguished from 
each other.


True for Wei{ß/ss} as well. Or a non-name example: Buße (repentance) 
vs Busse (buses). But then, non-name examples are far less likely to 
remain ambiguous in context.



[...] it's not like capitalization is otherwise invertible [...]


[...] for German personal names, I would expect capitalization to be 
invertible, provided that “ß” has been mapped to “ẞ” U+1E9E LATIN 
CAPITAL LETTER SHARP S.


Yes, that would be better. Presently, some official documents retain 
(lowercase) ß within all-caps writing in some places where it really 
matters, but it's rare to see such a style elsewhere, and it's 
technically not permitted by our official orthography, fwiw.


Which amounts to a weird situation, because during the debates 
surrounding German orthographic reform 1-2 decades ago, one argument 
presented to those who were against was that the official body of rules 
(amtliche Regelung) was binding only for schoolchildren and civil 
servants within government offices anyways :-) (See also this German 
supreme court decision 
http://www.bverfg.de/pressemitteilungen/bvg06-042.html.)


Stephan



German »ß« (was: s-j combination in Unicode?)

2013-02-16 Thread Otto Stolz

Hello,

Am 16.02.2013 11:48, schrieb Stephan Stiller:

Or a non-name example: Buße (repentance)
vs Busse (buses). But then, non-name examples are far less likely to
remain ambiguous in context.


Years ago, I saw with my own eyes, in a Swiss magazine
(where they consistently replace “ß” with “ss”), the following
amusing example:
  … Brigitte Bardot mit ihren beachtlichen Körpermassen …
which translates to: “BB, and her considerable bodily masses”,
whilst the author probably wanted to say: “BB, and her
remarkable physical measurements (=body shape)”.

During the discussion on the German spelling reform, in the 1990s,
the same minimal pair was used in the following context:
  Es ist ein Unterschied, ob ich Bier in Maßen trinke oder in Massen.
meaning: “It makes a difference whether I drink beer in moderation
or in masses”.

Minimal pairs for “ß” vs. “ss”, not involving proper names,
are extremely rare; in fact, I only know the two mentioned
in this very note. Between ordinary words and proper names
(or place names), you can, of course, find more minimal pairs,
e. g., “Füßen” (a declension form of “Fuß” = foot) and “Füssen”
(a town in Bavaria).
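To see how a ß-to-SS rule merges such pairs, one can group words by their all-caps form and report the groups with more than one member. A small Python sketch; the word list is just the examples from this thread, and all_caps_collisions is a hypothetical helper name:

```python
from collections import defaultdict


def all_caps_collisions(words):
    """Group words by str.upper() and keep only groups that merge distinct spellings."""
    groups = defaultdict(set)
    for word in words:
        # str.upper() maps ß to "SS", so Maßen and Massen share a key.
        groups[word.upper()].add(word)
    return {caps: sorted(forms) for caps, forms in groups.items() if len(forms) > 1}


words = ["Maßen", "Massen", "Buße", "Busse", "Füßen", "Füssen", "Bier"]
for caps, forms in all_caps_collisions(words).items():
    print(caps, forms)
```

Run over a larger word list, the same grouping would list every ß/ss pair that all-caps writing cannot tell apart.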

Cheers,
  Otto







Re: s-j combination in Unicode?

2013-02-16 Thread Andries Brouwer
On Sat, Feb 16, 2013 at 12:22:08AM -0800, Asmus Freytag wrote:
 On 2/15/2013 11:59 PM, Andries Brouwer wrote:
 On Fri, Feb 15, 2013 at 10:56:17PM -0600, Ben Scarborough wrote:

 Does that mean there's also a capital S-J?

 Probably, in entirely capitalized text. At sentence start I see
 capitalized I-ogonek, O-ogonek, U-ogonek, Å-ogonek in ordinary text.
 I have only seen the s-j following d or t, not word-initially.
 
 That would make it analogous in a way to German ß.
 The minute things show up in real orthographies the pressure to
 handle ALL CAPS exists.

I found Diauni.ttf at
http://www.thesauruslex.com/typo/dialekt.htm (swedish)
http://www.thesauruslex.com/typo/engdial.htm (english)

It has landmålsalfabetet at E100-E197 (lower case only)
and s-j at E19F, S-J at E1A5, with Y-ogonek, Å-ogonek,
G-slash, R-slash, Ð-slash nearby.

Andries


[BTW Is the fact that o-slash is not decomposed not entirely
analogous to the fact that i is not decomposed? I would say
that neither gives an indication of how symbols involving
a combining dot or combining slash are handled in general.]
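The decomposition facts referred to here can be checked in the Unicode Character Database, which Python exposes through the standard unicodedata module. A small sketch: ø (U+00F8) and i (U+0069) have no decomposition, while İ (U+0130) does decompose to I plus U+0307 COMBINING DOT ABOVE. Consequently NFC does not turn o + U+0338 COMBINING LONG SOLIDUS OVERLAY into ø, although it does compose o + U+0308 into ö:

```python
import unicodedata

for ch in ("\u00f8", "i", "\u0130", "\u00f6"):
    # decomposition() returns "" when the character has no decomposition mapping.
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")

# NFC only recomposes canonical decompositions, so o + slash overlay stays two code points.
print(len(unicodedata.normalize("NFC", "o\u0338")))  # 2
print(unicodedata.normalize("NFC", "o\u0308") == "\u00f6")  # True
```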



Re: German »ß«

2013-02-16 Thread Stephan Stiller



Or a non-name example: Buße (repentance)
vs Busse (buses). But then, non-name examples are far less likely to
remain ambiguous in context.
A reason why Jukka's original example – like most proper name examples – 
was better than mine is that it's truly minimal: context will 
really not help /and/ the pronunciation is identical. I wrote “most” 
because there's gotta be variation in the pronunciation of a surname 
like Rößler. Note though that those names would, if invented today, all 
get an unambiguous spelling with either ss (after a short vowel) or 
ß (after a long vowel or diphthong); you have a lot of unusual (from 
today's point of view) letter doubling in names.



Minimal pairs for “ß” vs. “ss”, not involving proper names,
are extremely rare; in fact, I only know the two mentioned
in this very note. Between ordinary words and proper names
(or place names), you can, of course, find more minimal pairs,
e. g., “Füßen” (a declension form of “Fuß” = foot) and “Füssen”
(a town in Bavaria).


Interesting would be a non-name minimal pair whose components have 
identical pronunciation. Perhaps none is in ordinary use (otherwise we'd 
know?), though one can come up with compound words like 
Maißtier/Maisstier. Not that they make much sense, plus this one 
actually fails the pronunciation test. But some construction along those 
lines ought to be possible (and easy, with a corpus). And there probably 
/is/ an uncontrived example if we allow a pairing with a geographical 
name, similar to what you're mentioning.


Of course in my worldview, all-caps writing is deprecated :-) I might 
have never written a ß-SS outside of this exchange. Which isn't to say 
that I don't laud efforts at precision, at popularizing a capital ẞ, or 
at orthographic reform in general :-)


Stephan



Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 1:38 AM, Stephan Stiller wrote:



That would make it analogous in a way to German ß.

The minute things show up in real orthographies the pressure to 
handle ALL CAPS exists.


The question then is whether you'll find SJ or overlaid S/J. Or 
how a Swede would instinctively handle this, in the absence of an 
example of a consistently applied rule.


There's a question first of whether there's a difference between s+j 
and simple sj. Is it just to mark a different pronunciation of what 
would be sj in standard Swedish, or are these contrasting in 
Elfdalian as well?


I suspect that the fallback would be SJ, if nothing else is available, 
but currently, anybody using s+j would use private fonts and thus 
there's not necessarily a need to use a fallback.


This is different from German use where telegraphs and typewriters were 
instrumental in creating and cementing the need for a fallback.


The German-style fallback is painful enough as it is; we should make 
sure it's not Unicode that creates the bottleneck here.




(By the way, for those finding the German rule to write SS 
unsatisfactory: It's hard to come by an actual minimal pair. 


MASSE - mass or measurements? See, not hard at all.

With the new orthography, ss vs. ß affects the pronunciation of the 
preceding vowel. It's irritating to see SS because you have to 
override that rule when you know that the word in lowercase was 
pronounced differently.


And, as Andreas has painstakingly done, you can collect a nearly infinite 
array of examples where users, in rule-bound Germany(!), simply continue 
to ignore that rule.


A./

PS:
And it's not like capitalization is otherwise invertible – the 
capitalization bits contain information as well, after all.)


Beside the point a bit. Even though it's true that mixed case carries 
information that's lost in all upper or all lowercase, the issue is a 
bit different, as it is not focused on one letter.





Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 7:04 AM, Andries Brouwer wrote:

[BTW Is the fact that o-slash is not decomposed not entirely
analogous to the fact that i is not decomposed? I would say
that neither gives an indication of how symbols involving
a combining dot or combining slash are handled in general.]


Why don't you just take the precedent as what it is and make your 
proposal accordingly. Some decisions that went into Unicode could have 
come out differently perhaps, but history says they didn't, and we are 
stuck with them. Changing horses in mid-stream helps nobody.


A./


Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 7:04 AM, Andries Brouwer wrote:

I found Diauni.ttf at
http://www.thesauruslex.com/typo/dialekt.htm  (swedish)
http://www.thesauruslex.com/typo/engdial.htm  (english)

It has landmålsalfabetet at E100-E197 (lower case only)
and s-j at E19F, S-J at E1A5, with Y-ogonek, Å-ogonek,
G-slash, R-slash, Ð-slash nearby.
So you have evidence that the uppercase form is implemented, if not yet 
a citation of actual use.


Since the latter is expected to be rare, I personally would be 
comfortable with making a code point for it, so that fonts like this, 
which are actually used, can be mapped to Unicode w/o forcing people 
into weird fallbacks over a rare character.
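Until such a code point exists, text typed with a font like Diauni.ttf carries private-use code points. A minimal Python sketch of a fallback conversion, assuming only the two positions Andries reported (E19F for s-j, E1A5 for S-J) and the plain SJ fallback suggested earlier in the thread; the table DIAUNI_FALLBACK and the helper names are hypothetical, and the rest of the font's layout is deliberately left unmapped:

```python
import unicodedata

# Hypothetical table: private-use code points reported for Diauni.ttf -> plain-letter fallback.
DIAUNI_FALLBACK = {
    0xE19F: "sj",  # s-j
    0xE1A5: "SJ",  # S-J
}


def is_private_use(ch: str) -> bool:
    # General category "Co" means private use.
    return unicodedata.category(ch) == "Co"


def diauni_to_fallback(text: str) -> str:
    # str.translate accepts a dict from code point (int) to replacement string.
    return text.translate(DIAUNI_FALLBACK)


def unmapped_private_use(text: str) -> list[str]:
    # Private-use characters the table does not cover, e.g. the rest of E100-E197.
    return sorted({ch for ch in text if is_private_use(ch) and ord(ch) not in DIAUNI_FALLBACK})


sample = "d\ue19f \ue1a5 \ue100"
print(diauni_to_fallback(sample))
print([f"U+{ord(ch):04X}" for ch in unmapped_private_use(sample)])
```

The conversion is lossy by design: once s-j has become sj, it can no longer be told apart from an ordinary s followed by j.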


A./


Re: s-j combination in Unicode?

2013-02-16 Thread Stephan Stiller


It's hard to come by an actual minimal pair. 


MASSE - mass or measurements? See, not hard at all.

[and]
With the new orthography, ss vs. ß affects the pronunciation of 
the preceding vowel. It's irritating to see SS because you have to 
override that rule when you know that the word in lowercase was 
pronounced differently.


Well, this pretty much summarizes why I think SS-for-ß looks 
distracting. So since I very much agree with such sentiment, I should 
probably not have given a mild defense of this practice in the first place.


But where are all those other examples? Now – aside from come by 
meaning come across :-) (which is, in all fairness, not what I meant 
earlier), let's now ask how frequent this is, really. I don't encounter 
that much all-caps text in the first place (which to me looks stupid, 
independently), and MASSE is variant #3 in this thread of the double 
example that Otto Scholz just gave (Körpermassen (Switzerland), IN 
MASSEN), obviously terribly likely to appear in an all-caps context. 
Remind me real quick, I must have forgotten about all those popular, 
bestselling all-caps physics books teaching about mass and measurements 
– the comparative discussion of beer and female bodies was probably in 
the appendix about SI units :-) which I must have skipped.


And it's not like capitalization is otherwise invertible – the 
capitalization bits contain information as well, after all.)
Beside the point a bit. Even though it's true that mixed case 
carries information that's lost in all upper or all lowercase, the 
issue is a bit different, as it is not focused on one letter.


Text being all-caps is a property applied to the word level (for 
emphasis) or to the paragraph level. The minimal unit it applies to is 
(normally) the word. (@normally: what to do with word-internal capital 
letters, as e.g. in certain Gaelic names, is another question.) You're
right to point this out, but SS-as-capital-ß really only occurs in an 
all-caps writing context, which has the capitalization property applied 
to entire words.


Stephan




Re: s-j combination in Unicode?

2013-02-16 Thread Stephan Stiller



the issue is a bit different, as not focused on one letter
While we're splitting hairs: Word- or larger-level all-caps /does/ 
normally make a one-letter difference. When we undo all-caps, one can 
/normally/ lowercase everything of the word except the first letter. The 
capitalization bit of that one letter is sometimes unclear.
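Both sources of ambiguity can be made concrete by enumerating the mixed-case spellings an all-caps word could have come from, under two simplifying assumptions: only the first letter may have been capitalized, and every "SS" may stand for either ss or ß. A Python sketch with a hypothetical helper name; it does not handle cases such as "SSS":

```python
from itertools import product


def uncaps_candidates(word: str) -> set[str]:
    """Spellings that could have produced the all-caps `word` (simplified model)."""
    parts = word.lower().split("ss")
    candidates = set()
    # Each "ss" in the lowercased word may originally have been "ss" or "ß".
    for choice in product(("ss", "\u00df"), repeat=len(parts) - 1):
        restored = parts[0] + "".join(c + p for c, p in zip(choice, parts[1:]))
        candidates.add(restored)  # lowercase initial
        candidates.add(restored[:1].upper() + restored[1:])  # capitalized initial
    return candidates


print(sorted(uncaps_candidates("MASSE")))
print(sorted(uncaps_candidates("BIER")))
```

For MASSE this yields four candidates (masse, Masse, maße, Maße): the first-letter question arises for any word, while the ss/ß doubling arises only where SS appears.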


S



Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 10:48 AM, Stephan Stiller wrote:



the issue is a bit different, as not focused on one letter
While we're splitting hairs: Word- or larger-level all-caps /does/ 
normally make a one-letter difference. When we undo all-caps, one can 
/normally/ lowercase everything of the word except the first letter. 
The capitalization bit of that one letter is sometimes unclear.


And usually not totally sense-destroying to a human reader with context 
available. But these fallbacks allow clearly misspelled words to appear, 
not just miscapitalized ones. That's huge.


A./





Re: German »ß«

2013-02-16 Thread Philippe Verdy
2013/2/16 Stephan Stiller stephan.stil...@gmail.com:
 Of course in my worldview, all-caps writing is deprecated :-)

This is a presentation style which makes words more readable in some
conditions, notably on plates displayed on roads (cities are extremely
rarely written in lowercase, as this is more difficult to read from
far away when driving). Capitals anyway do not exclude preserving
distinctions (so there's a capital Ess-Tsett which preserves the
distinction with SS, and accents are still present, even if they are
difficult to distinguish from far away on roads).

Deprecation only concerns long texts, presented in multiline
paragraphs, for which capitals make the text less easy to read.



Re: German »ß«

2013-02-16 Thread Julian Bradfield
On 2013-02-16, Philippe Verdy verd...@wanadoo.fr wrote:
 2013/2/16 Stephan Stiller stephan.stil...@gmail.com:
 Of course in my worldview, all-caps writing is deprecated :-)

 This is a presentation style which makes words more readable in some
 conditions, notably on plates displayed on roads (cities are extremely
 rarely written in lowercase, as this is more difficult to read from
 far away when driving). 

Half a century ago, the UK, after extensive empirical testing,
mandated mixed case for road signs because it is significantly easier
to read at speed.
Our cousins across the Atlantic have finally caught on, and 
the U.S. Federal Highway Administration now mandates mixed case for
place names, while leaving fixed wording in all caps.



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 10:48 AM, Stephan Stiller wrote:



the issue is a bit different, as not focused on one letter
While we're splitting hairs: Word- or larger-level all-caps /does/ 
normally make a one-letter difference. When we undo all-caps, one can 
/normally/ lowercase everything of the word except the first letter. 
The capitalization bit of that one letter is sometimes unclear.


Sorry, not what I meant. It can hit any letter of the alphabet. The ß 
issue hits only one specific letter.


A./





Re: German »ß«

2013-02-16 Thread Philippe Verdy
Another solution is also used: Capitals written as Big capitals, and
lowercase written as small capitals (i.e. just a minor font size
reduction).
True lowercase letters cause problems on road sign indicators on
high-speed roads: they are hard to read, and if the driver has to
look at them for one more second, he does not look at the road.
This is a safety concern, and not a minor problem. Font styles
are also studied to use the simplest glyphs without any extra
decoration which would distract the driver, especially on dynamic
displays. For static displays, only legal forms and colors are
admitted, so that these displays don't need to be completely
deciphered and are immediately recognized. It is then safer to just
show what is essential.

And on dynamic display indicators (whose content changes according to
current conditions), there are laws that prohibit using anything other
than just big capitals, prohibit any sort of color enhancement or
decoration, and require strong contrast, i.e. either white or yellow
on black or a dark color, or black on white or yellow (red may be used
on the icon to signal a danger).
All text effects are prohibited (including italics, underlining,
boldness, narrowing or widening). But some font size adjustments are
possible for less essential information. There are also required icons
for signaling dangers, but these icons must also be followed by what
they mean, i.e. DANGER, which cannot be smaller or larger than the
essential message.
E.g. for signaling dangers of wind, the DANGER icon is displayed,
followed by DANGER : VENT VIOLENT (violent wind), but the indication
of the effective wind speed (in km/h), being less essential, may be
smaller. But it still has to avoid all sorts of decorations, and
letters remain in capitals (e.g. RAFALES À 80 KM/H, i.e. gusts of
80 km/h, instead of Rafales à 80 km/h).
Information such as the time to reach a destination, or the length of
a traffic jam, is less essential than the distance from the indicator
to where the traffic jam is expected to occur.
These are all cases of very short texts, without true sentences, they
are expected to be read very fast and understood immediately without
distracting the driver. They are not advertisements...

Only local place names may be lowercased (on minor roads, or within
cities to give names of streets or names of touristic points of
interest, or the direction of some local services), but not on high
speed roads or motorways (for example to signal exits on motorways or
high speed roads, or the lane to keep to follow a direction before a
branch).

2013/2/16 Julian Bradfield jcb+unic...@inf.ed.ac.uk:
 On 2013-02-16, Philippe Verdy verd...@wanadoo.fr wrote:
 2013/2/16 Stephan Stiller stephan.stil...@gmail.com:
 Of course in my worldview, all-caps writing is deprecated :-)

 This is a presentation style which makes words more readable in some
 conditions, notably on plates displayed on roads (cities are extremely
 rarely written in lowercase, as this is more difficult to read from
 far away when driving).

 Half a century ago, the UK, after extensive empirical testing,
 mandated mixed case for road signs because it is significantly easier
 to read at speed.
 Our cousins across the Atlantic have finally caught on, and
 the U.S. Federal Highway Administration now mandates mixed case for
 place names, while leaving fixed wording in all caps.




Re: German »ß«

2013-02-16 Thread Asmus Freytag

On 2/16/2013 12:06 PM, Philippe Verdy wrote:

2013/2/16 Stephan Stiller stephan.stil...@gmail.com:

Of course in my worldview, all-caps writing is deprecated :-)

This is a presentation style which makes words more readable in some
conditions, notably on plates displayed on roads (cities are extremely
rarely written in lowercase, as this is more difficult to read from
far away when driving). Capitals anyway do not exclude preserving
distinctions (so there's a capital Ess-Tsett which preserves the
distinction with SS, and accents are still present, even if they are
difficult to distinguish from far away on roads).


This may be a French thing.

A./

For US, see discussion here: 
http://www.studio360.org/2011/jan/21/design-real-world/


For Germany, look at 
http://www.ace-online.de/fileadmin/user_uploads/Der_Club/Presse-Archiv/Bilder/Verkehr/Autobahn/Autobahn_01.jpg

or google Autobahnschilder for more.

PPS: Sweden has quite a bit of UPPERCASE, but seems to use mixed case 
for some purposes (such as legends on warning signs and minor 
destinations on road signs).


Deprecation only concerns long texts, presented in multiline
paragraphs, for which capitals make the text less easy to read.







Re: s-j combination in Unicode?

2013-02-16 Thread Stephan Stiller

from earlier:

Otto Scholz

Oops, sorry. Otto Stolz.

And usually not totally sense-destroying to a human reader with 
context available. But these fallbacks allow clearly misspelled words 
to appear, not just miscapitalized ones. That's huge.


I'm all for a capital version of ß and other such letters, but you may 
be talking in extremes too much. As far as real ambiguities are 
introduced, the loss of capitalization on the first letter introduces 
far more, impressionistically speaking, and they might be legally 
subtle; though those very sporadically occurring ones coming from SS 
are more likely to be totally sense-destroying, yes. What I'm also 
saying is that it's a minor issue compared to the destruction of 
readability by usage of all-caps in the first place. I'd rather focus on 
avoiding ambiguity in written language otherwise. Those concerning 
syntactic structure are troublesome, for example.


S




Re: s-j combination in Unicode?

2013-02-16 Thread Asmus Freytag

On 2/16/2013 9:55 PM, Stephan Stiller wrote:

from earlier:

Otto Scholz

Oops, sorry. Otto Stolz.

And usually not totally sense-destroying to a human reader with 
context available. But these fallbacks allow clearly misspelled words 
to appear, not just miscapitalized ones. That's huge.


I'm all for a capital version of ß and other such letters, but you may 
be talking in extremes too much.


Never!

;)

Actually, the question that started this particular discussion is most 
likely moot, because Andries has located, at the very least, an 
existing font implementation of capital S+J. That seems to indicate 
that, again at the bare minimum, there are other people who think that 
SJ is not the way to render this.


A./




Re: German »ß«

2013-02-16 Thread Julian Bradfield
On 2013-02-17, Philippe Verdy verd...@wanadoo.fr wrote:
 True lowercase letters are causing problems on road sign indicators on
 roads with high speed : they are hard to read and if the driver has to
 look at them for one more second, he does not look at the road.

AS I SAID, empirical evaluation by those who had good cause to care
about the issue indicates the opposite, that people take longer to
read all caps (as is also the case in normal text).
This evaluation was done specifically for high speed roads. It
included live testing on one motorway.
