Re: Standaridized variation sequences for the Desert alphabet?

Michael Everson Wed, 29 Mar 2017 06:16:30 -0700

Martin,

It’s as though you’d not participated in this work for many years, really.

> On 29 Mar 2017, at 11:12, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
> 
> Hello everybody,
> 
> Let me start with a short summary of where I think we are at, and how we got 
> there.
> 
> - The discussion started out with two letters, with two letter forms each. 
> There is explicit talk of the 40-letter alphabet and glyphs in the Wikipedia 
> page, not of two different letters.

SO WHAT? Alphabets have “letters” in them. “Letters” are not “characters”. In 
Welsh, “ch” and “dd” and “ll” are “letters”. 

> - That suggests that IF this script is in current use,

You don’t even know? You’re kidding, right?

> and the shapes for these diphthongs are interchangeable

It does NOT “suggest” that at all. 

> (for those who use the script day-to-day, not for meta-purposes such as 
> historic and typographic texts), keeping things unified is preferable.

Deseret was a spelling reform replacement alphabet used for a period of time by 
the Mormons in what is now Utah. It is structurally very similar to Pitman’s 
Phonotypic alphabets. Alphabets. There were many revisions of those. Some of 
them used letterforms we have encoded today, for IPA for instance. Some used 
letterforms we’d hardly recognize, and we’d never, ever consider them to be 
glyph variants of the IPA letters. 

> - As far as we have heard (in the course of the discussion, after questioning 
> claims made without such information), it seems that:

Yeah, it doesn’t “seem” anything but a whole lot of special pleading to bolster 
your rigid view that the glyphs in question can be interchangeable because of 
the sounds they may represent. 

>  - There may not be enough information to understand how the creators and 
> early users of the script saw this issue, 

Um, yeah. As if there were for Phoenician, or Luwian hieroglyphs, right?

> on a scale that may range between "everybody knows these are the same, and 
> nobody cares too much who uses which, even if individual people may have 
> their preferences in their handwriting" to something like "these are 
> different choices, and people wouldn't want their texts to be changed in any 
> way when published".

We know what the diphthongs were. We know that the script had a spelling reform 
where some characters were abandoned in favour of other characters. There was 
at least one font wh

And there is lots of handwriting in which people write what they want to write, 
in the non-Latin alphabet they learned. 

As far as your guessing what people had in their minds about what they were 
writing, and as to your speculation about what the very few printers who had 
Deseret type might have done with such manuscripts, well, it is all reine 
Phantasie on your part. 

Oh! Look! There was a spelling reform. I should write “Fantasie”, shouldn’t I? 
Wait! I can have spell-check dictionaries suit my preference! Wow! That’s 
amazing!

>  - Similarly, there seem to be not enough modern practitioners of the script 
> using the ligatures that could shed any light on the question asked in the 
> previous item in a historical context,

Completely irrelevant. Nobody worried about the number of modern users of the 
Insular letters we encoded. Why put such constraints on users of Deseret? Ꝺꝺ 
Ꝼꝼ Ᵹᵹ Ꝿ Ꞃꞃ Ꞅꞅ Ꞇꞇ. 

> first apparently because there are not that many modern practitioners at all, 
> and second because modern practitioners seem to prefer spelling with 
> individual letters rather than using the ligatures.

This is equally ridiculous. John Jenkins chooses not to write the digraphs in the 
works which he transcribed, because that’s what *he* chooses. He doesn’t speak 
for anyone else who may choose to write in Deseret, and your assumption that 
“modern practitioners” do this is groundless. 

It also ignores the fact that the script had a reform and that separate 
encodings for the various characters are of value to those studying the 
provenance and orthographic practices of those who wrote Deseret when it was in 
active use. 

This is exactly the same thing as the medievalist Latin abbreviation and other 
characters we encoded. There is neither sense nor logic nor utility in trying 
to argue for why editors of Deseret documents shouldn’t have the same kinds of 
tools that medievalists have. And as far as medievalist concerns go, many of 
the characters are used by relatively few researchers. Some of the characters 
we encoded are used all over Europe at many times. Some are used only by 
Nordicists, some by Celticists, and some by subsets within the Nordicist and 
Celticist communities. 

> - IF the above is true, then it may be that these ligatures are mostly used 
> for historic purposes only, in which case it wouldn't do any harm to 
> present-day users if they were separated.

Harm? What harm? Recently the UTC looked at a proposal for capital letters for 
ʂ and ʐ. Evidence for their existence was shown. One person on the call to the 
UTC said he didn’t think anyone needed them. Two of us do need them. I needed 
them last weekend and I had to use awkward workarounds. They weren’t accepted. 
There wasn’t any good rationale for the rejection. I mean, the letters exist. 
Case is a normal function of the script. But they weren’t accepted. For the guy 
who didn’t think he needed them, well, so what? If they’re encoded, he doesn’t 
have to use them. 

Harm to present-day users? I agree with you. Any modern-day user creating new 
texts who doesn’t like to use the diphthong letters doesn’t have to use them. 
Any modern-day user trying to represent historic texts accurately, however, 
can’t, because not all the letters are encoded. 

> If the above is roughly correct, then it's important that we reached that 
> conclusion after explicitly considering the potential of a split to create 
> inconvenience and confusion for modern practitioners,

People who use Deseret use it for historical purposes and for cultural 
reasons. Everybody in Utah reads English in standard Latin orthography. 

> not after just looking at the shapes only, coming up with separate historical 
> derivations for each of them, and deciding to split because history is way 
> more important than modern practice.

I didn’t “come up” with separate historical derivations for the four characters 
in question. It is entirely obvious that LONG AH, SHORT AH, LONG OO, and SHORT 
OO are variously combined with the stroke of SHORT I. 

Entirely obvious. There is no other interpretation. 

> In that light, some more comments lower down.
> 
> On 2017/03/28 22:56, Michael Everson wrote:
>> On 28 Mar 2017, at 11:39, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
> 
>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
> 
> Yes. But it's important that we know that because we have been faced with 
> many cases where "æ" and "ae" were used interchangeably.

Irrelevant. This is just spelling. It’s no different than colour/color or 
maximize/maximise or aluminium/aluminum. 

> For somebody not knowing the (extended) Latin alphabet and its usages, they 
> might easily see more of a pretzel and less of 'a' and 'e'. I might try some 
> experiments with some of my students (although I'm using "formulæ" in my 
> lecture notes, and so they might already be too familiar with the "æ").

You have missed the point fabulously. The point was that the æ ligature can be 
easily identified as being made of A and of E. And the four Deseret characters 
can easily be identified as being made of LONG AH, SHORT AH, LONG OO, and SHORT 
OO with the stroke of SHORT I. 

> Also, if it were the case that shapes like "æ" and "œ" were used 
> interchangeably across all uses of the Latin alphabet, I'm quite sure we 
> would encode it with one code point rather than two, even if some researchers 
> might claim that the latter was derived from an "o" rather than an "ɑ", or 
> even if we knew it was derived from an "o" (as we know for the ß).

I don’t agree, and there are hundreds of 

>> What Deseret has is this:
>> 
>> 10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
>>      * officially named “ew” in the code chart
>>      * used for ew in earlier texts
>> 10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
>>      * officially named “oi” in the code chart
>>      * used for oi in earlier texts
>> 1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
>>      * used for oi in later texts
>> 1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
>>      * used for ew in later texts
> 
> Currently, it has this:
> 
> 10426 𐐦 DESERET CAPITAL LETTER OI
> 
> 10427 𐐧 DESERET CAPITAL LETTER EW

You are being deliberately obtuse. Note that I stated clearly “officially named 
‘ew/oi’ in the code chart”. 
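
The names the standard currently assigns to these two code points can be checked directly. What follows is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any proposal.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the names Python ships."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


# The two encoded Deseret capitals under discussion.
for cp in (0x10426, 0x10427):
    print(describe(cp))
```

The lookup reports only the normative character names; it says nothing about which glyph a given font draws at those code points, which is the matter in dispute here.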

> My personal opinion is that names are mostly hints, and not too much should 
> be read into them, 

I do not share this opinion.

> but if anything, the names in the current charts would suggest that the 
> encoding is for the 39th/40th letter of the Deseret alphabet, whatever its 
> shape, not for some particular shape.

You make too much of these numbers. There are charts of the 38-letter alphabet 
and charts of the 40-letter alphabet, but those numbers have to do with the 
number of English phonemes represented in Phonotypy and in Deseret, and with 
the augmentation of that by the addition of letters which represent phonemes. 

> And you know as well as I do that we can't change names. So if we split, we 
> might end up with something like:
> 
> 10426 𐐦 DESERET CAPITAL LETTER OI
> 
> 10427 𐐧 DESERET CAPITAL LETTER EW
> 
> 1xxxx <𐐃𐐆> DESERET CAPITAL LETTER VARIANT OI
> 
> 1xxxx <𐐆𐐋> DESERET CAPITAL LETTER VARIANT EW

I’m pretty sure we will propose the names LONG AH WITH STROKE and SHORT OO WITH 
STROKE. The two un-encoded characters are used for the *diphthongs* oi and ew 
but they are not “variants” of the other letters. 

We do not require matching names here. Compare LATIN LETTER YR and LATIN LETTER 
SMALL CAPITAL R. Compare LATIN CAPITAL LETTER HWAIR and LATIN SMALL LETTER HV. 
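
That these are case pairs despite their unmatched names can be verified mechanically. This is a small sketch, assuming Python 3's `unicodedata` module and built-in case mappings; the `PAIRS` list and the `case_pair_report` helper are illustrative names only.

```python
import unicodedata

# (uppercase, lowercase) pairs whose character names do not mirror each other.
PAIRS = [
    ("\u01A6", "\u0280"),  # LATIN LETTER YR / LATIN LETTER SMALL CAPITAL R
    ("\u01F6", "\u0195"),  # LATIN CAPITAL LETTER HWAIR / LATIN SMALL LETTER HV
]


def case_pair_report(upper: str, lower: str) -> tuple[str, str, bool]:
    """Return both names and whether the standard case mappings link them."""
    linked = upper.lower() == lower and lower.upper() == upper
    return unicodedata.name(upper), unicodedata.name(lower), linked


for upper, lower in PAIRS:
    print(case_pair_report(upper, lower))
```

In both pairs the case mapping lives in the character properties, not in the names, so new Deseret letters would not need names patterned after OI and EW in order to behave properly.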

>> Don’t go trying to tell me that LONG OO WITH STROKE and SHORT OO WITH STROKE 
>> are glyph variants of the same character.
>> 
>> Don’t go trying to tell me that LONG AH WITH STROKE and SHORT AH WITH STROKE 
>> are glyph variants of the same character.
> 
> We have just established that there are no characters with such names in the 
> standard. It's not the names or the history that I'm arguing.

You’re being obtuse again. Fine. 

Don’t go trying to tell me that EW and SHORT OO WITH STROKE are glyph variants 
of the same character.

Don’t go trying to tell me that LONG AH WITH STROKE and OI are glyph variants 
of the same character.

They’re not. The origin of all those letterforms is obvious, and we do not 
encode sounds, we encode the elements of writing systems. 

>> To do so is to show no understanding of the history of writing systems at 
>> all.
> 
> What I'd agree to is that cases where shapes with different historical 
> origins merge and get treated as one and the same character are quite a lot 
> rarer than cases where they don't merge. 

They didn’t merge in Deseret. They had a reform, removing some characters and 
adding some other characters. 

> But we have seen cases where such a merge happens. ß is one of them.

That’s even arguable because ſʒ only really occurs in the whole-font Fraktur 
style. It’s pretty rare to see it in Antiqua. Of course it must be attested 
there, but it’s by no means common. 

> There are quite a few in Han (not surprising because there are tons of 
> ideographs there to begin with).
> 
> But that experience doesn't mean that we have to rush to a conclusion without 
> examining as much of the evidence as we can get hold of.

I haven’t rushed to a conclusion. I’ve made a thorough analysis. 

>> You’re smarter than that. So are Asmus and Mark and Erkki and any of the 
>> other sceptics who have chimed in here.
> 
> Skepticism when presented with options without background facts is a 
> virtue in my opinion.

Your argument seemed to be based solely on the use of the letters for the 
sounds, ignoring the historical derivation and the facts of the spelling reform 
in Deseret. 

>> The UTC encodes a great many characters without checking them at all, or 
>> even offering documentation on them to SC2. Don’t think we haven’t observed 
>> this.
> 
> As for BROCCOLI that you mention later and other emoji, first I would like to 
> make clear that I don't use emoji personally nor do I push for their encoding.

I *do* use emoji and I have devised many emoji which are now in current use. I 
do find that the process for adding symbols to the UCS (which is not the same 
thing as giving symbols the emoji property) is not functioning particularly 
well at present. 

> But what's important for the discussion at hand is that when it comes to 
> emoji, the question of whether we should unify or disunify BROCCOLI and 
> CAULIFLOWER (just a hypothetical example) isn't as important.

Eventually we will have CABBAGE, and then some people will need to use ZWJ to 
join CABBAGE and KNIFE so that sauerkraut can be represented, and then other 
people will need to use ZWJ to join CABBAGE and HOT PEPPER for kimchi, and in 
Ireland we’ve got bacon and cabbage of course, and...

Heh. 

> That's because there is no preexisting user community that would be seriously 
> inconvenienced the way it would happen if we suddenly disunified the ſs/ſz 
> ligature, or suddenly unified "æ" and "œ". Emoji are a hopeless hodgepodge, 
> where users click on what they see, and hope that it shows close enough to 
> what they meant at the other end or after a few years.

No one using Deseret will be inconvenienced by adding additional historical 
characters for the already historical script. Anyone using modern Deseret fonts 
*would* be inconvenienced by unifying the LONG-AH-WITH-STROKE and 
SHORT-AH-WITH-STROKE characters and the LONG-OO-WITH-STROKE and 
SHORT-OO-WITH-STROKE characters, I think. No current fonts that I know of have 
the 1859 glyphs, apart from private fonts Ken Beesley used for his own work. 

>>> Of course they will easily see different shapes, but what's important isn't 
>>> the shapes, it's what they associate it with. If for them, it's just two 
>>> shapes for one and the same 40th letter of the Deseret alphabet, then that 
>>> is a strong suggestion for not encoding separately, even if the shapes look 
>>> really different.
>> 
>> Martin, there is no answer to this unless you can read the minds of people 
>> who are dead a century or more.
> 
> Thanks for telling us, finally.

What on earth do you mean? I have withheld no secrets. I’ve objected to your 
wilful unification of characters with obviously different origins. 

>>> To use another analogy, many people these days (me included) would have 
>>> difficulties identifying Fraktur letters, in particular if they show up 
>>> just as individual letters.
>> 
>> I do not believe you.
> 
> It's true. When younger, I tried to read some old books written in Fraktur. 
> It was hard work. Most of the lower letters were okay, but the ſ and the f 
> were easy to confuse, and the k is also confusing. A lot of guessing was 
> needed for upper case. I'm quite sure most people these days couldn't easily 
> identify upper case letters in isolation. Of course, context helps a lot.

It’s not the easiest thing but it does not take all that much to accustom 
oneself to it. 

>> If this were true menus in restaurants and public signage on shops wouldn’t 
>> have Fraktur at all. It’s true that sometimes the orthography on such things 
>> is bad, as where they don’t use ligatures correctly or the ſ at all.
> 
> Shops and newspapers (e.g. NYT) and the like rely a lot on a logo effect. And 
> the situation may be slightly different in Germany and in Switzerland.

People can read the menus and the public signage nevertheless. Fraktur is not 
so unbelievably different that it’s entirely opaque. 

>> I’ll stipulate that few Germans can read Sütterlin or similar hands. :-)
> 
> Definitely agreed!

I learned to write Sütterlin. Going back and reading something written takes 
work too… 

> 
> 
>> On 28 Mar 2017, at 11:59, Mark Davis ☕️ <m...@macchiato.com> wrote:
>> 
>>> I agree with Martin.
> 
>>> Simply because someone used a particular shape at some time to mean a 
>>> letter doesn't mean that Unicode should encode a letter for that shape.
>> 
>> Coming to a forum like this out of a concern for the corpus of Deseret 
>> literature is not some sort of attempt to encode things for encoding’s sake.
> 
> And coming to a discussion like this out of a concern for modern 
> practitioners of the script (even if it seems, after a lot of discussion, 
> that there aren't that many of these, and the issue at hand may indeed not 
> concern them that much) is not some sort of attempt to unify things for 
> unification's sake.

I think you made a lot of assumptions about “modern practitioners” which you 
didn’t disclose.

A proposal will be forthcoming. I want to thank several people who have written 
to me privately supporting my position with regard to this topic on this list. 
I can only say that supporting me in public is more useful than supporting me 
in private. 

Michael
