Re: Standaridized variation sequences for the Desert alphabet?

Michael Everson Tue, 28 Mar 2017 07:02:58 -0700

On 28 Mar 2017, at 11:39, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:

>> And what would the value of this be? Why should I (who have been doing this 
>> for two decades) not be able to use the word “character” when I believe it 
>> correct? Sometimes you people who have been here for a long time behave as 
>> though we had no precedent, as though every time a character were proposed 
>> for encoding it’s as though nothing had ever been encoded before.
> 
> I didn't say that you have to change words. I just said that I could agree to 
> a slightly differently worded phrase.

An æ ligature is a ligature of a and of e. It is not some sort of pretzel. What 
Deseret has is this:

10426 DESERET CAPITAL LETTER SHORT AH WITH STROKE
        * officially named “oi” in the code chart
        * used for oi in earlier texts
10427 DESERET CAPITAL LETTER LONG OO WITH STROKE
        * officially named “ew” in the code chart
        * used for ew in earlier texts
1xxxx DESERET CAPITAL LETTER LONG AH WITH STROKE
        * used for oi in later texts
1xxxx DESERET CAPITAL LETTER SHORT OO WITH STROKE
        * used for ew in later texts

Don’t go trying to tell me that LONG OO WITH STROKE and SHORT OO WITH STROKE 
are glyph variants of the same character. 

Don’t go trying to tell me that LONG AH WITH STROKE and SHORT AH WITH STROKE 
are glyph variants of the same character. 

To do so is to show no understanding of the history of writing systems at all. 
You’re smarter than that. So are Asmus and Mark and Erkki and any of the other 
sceptics who have chimed in here. 

> And as for precedent, the fact that we have encoded a lot of characters in 
> Unicode doesn't mean that we can encode more characters without checking each 
> and every single case very carefully, as we are doing in this discussion.

The UTC encodes a great many characters without checking them at all, or even 
offering documentation on them to SC2. Don’t think we haven’t observed this. 

>> The sharp s analogy wasn’t useful because whether ſs or ſz users can’t tell 
>> either and don’t care.
> 
> Sorry, but that was exactly the point of this analogy. As to "can't tell", 
> it's easy to ask somebody to look at an actual ß letter and say whether the 
> right part looks more like an s or like a z.

By “can’t tell” I mean “recognize as essentially the same letterform”. The 
streetsigns in some German cities use a very ſʒ if you look at it and know 
anything about typography. Most people probably don’t notice. They see ß and 
that’s precisely because ſs and ſʒ look very much alike. 

> On the other hand, users of Deseret may or may not ignore the difference 
> between the 1855 and 1859 shapes when they read.

The people who wrote the manuscripts are dead. Most readers and writers of 
Deseret today use the shapes that are in their fonts, which are those in the 
Unicode charts, and most texts published today don’t use the EW and OI 
ligatures at all, because that’s John Jenkins’ editorial practice. The need to 
distinguish these letters (which are distinguished because of their history as 
letterforms, not because of the diphthong) is no different from the reason we 
encoded these Ꜩ Ꜫ Ꜭ Ꜯ Ꜳ Ꜵ Ꜷ Ꜹ Ꜻ Ꜽ Ꜿ Ꝃ Ꝁ Ꝅ Ꝇ Ꝉ Ꝋ Ꝍ Ꝏ Ꝑ Ꝓ Ꝕ Ꝗ Ꝙ Ꝛ Ꝝ Ꝟ Ꝡ Ꝣ Ꝥ Ꝧ Ꝩ Ꝫ 
Ꝭ Ꝯ Ꝺ Ꝼ Ᵹ Ꝿ Ꞁ Ꞃ Ꞅ Ꞇ. Scholars required those. Manuscripts may contain them side 
by side. Or their usage may be separated by hundreds of kilometres or hundreds 
of years. There is no difference. There were pages of discussion as to WHY 
scholars needed the medievalist characters. The counter argument was “Why not 
normalize?” We had similar pages of discussion as to WHY Uralicists needed the 
great many characters we encoded for them. 

Why is it that you people can encode BROCCOLI on the basis of nothing but 
“people might like it” but we cannot use sound existing precedent to encode 
characters which (while similar in use to other characters) are an index of 
orthographic change in a historical script and orthography? There are plenty of 
“glyph variations” in early Deseret texts vis à vis which I’d ignore. 

This isn’t one of them. 

> Of course they will easily see different shapes, but what's important isn't 
> the shapes, it's what they associate it with. If for them, it's just two 
> shapes for one and the same 40th letter of the Deseret alphabet, then that is 
> a strong suggestion for not encoding separately, even if the shapes look 
> really different.

Martin, there is no answer to this unless you can read the minds of people who 
are dead a century or more. Therefore it is not a useful criterion, and the 
other criteria (letter origin, spelling choice) are the indices which must 
guide our understanding. The result of those criteria is that there are four 
characters here, not two. 

> No Fraktur fonts, for instance, offer a shape for U+00DF that looks like an 
> ſs. And what Antiqua fonts do, well, you get this:
>> 
>> https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg
> 
> Yes. And we are just starting to collect evidence for Deseret fonts.

Well you aren’t going to get full repertoires from the 19th-century lead type 
because they don’t exist. We have what we have of them, and we have the 
manuscripts. As to modern digital typefaces, there are NONE which support the 
1859 letters. And I’ve seen most of them. 

>> And there’s nothing unrecognizable about the ſɜ (< ſꝫ (= ſz)) ligature there.
> 
> Well, not to somebody used to it. But non-German users quite often use a 
> Greek β where they should use a ß, so it's no surprise people don't 
> distinguish the ſs and ſz derived glyphs.

I’ve received German texts which used Greek β. But that’s not the point. People 
don’t distinguish the ſs and ſʒ glyphs because they look pretty much the same 
AND there’s no reason to distinguish them. A world of difference between that 
and the Deseret LETTERs WITH STROKE.

>> The situation in Deseret is different.
> 
> The graphic difference is definitely bigger,

For pity’s sake, Martin. 𐐉 𐐃 look NOTHING ALIKE. And 𐐅 and 𐐋 look NOTHING 
ALIKE. This isn’t anything like ſs and ſʒ and ſz and ß. 

> so to an outsider, it's definitely quite impossible to identify the pairs of 
> shapes. But that does in no way mean that these have to be seen as different 
> characters (rather than just different glyphs) by insiders (actual users).

They had a script reform and they cut new type. They did this on purpose. Note 
that in their ligatures they shifted from SHORT AH and LONG OO to LONG AH and 
SHORT OO. 
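
A minimal sketch in Python, assuming only the standard unicodedata module: the 
four base letters involved in that shift are separately encoded characters, and 
their code points can be looked up by name. The dictionary keys are 
illustrative labels, not Unicode terminology.

```python
import unicodedata

# Base letters named in the descriptions of the four ligatures.
# The keys are illustrative labels only.
BASES = {
    "earlier oi": "DESERET CAPITAL LETTER SHORT AH",
    "later oi": "DESERET CAPITAL LETTER LONG AH",
    "earlier ew": "DESERET CAPITAL LETTER LONG OO",
    "later ew": "DESERET CAPITAL LETTER SHORT OO",
}


def base_code_points():
    """Map each label to the code point of its base letter, as 'U+XXXXX'."""
    return {
        use: f"U+{ord(unicodedata.lookup(name)):04X}"
        for use, name in BASES.items()
    }


if __name__ == "__main__":
    for use, code_point in base_code_points().items():
        print(f"{use}: {code_point}")
```

The earlier and later ligatures are therefore built on different encoded 
letters, each paired with SHORT I.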

> To use another analogy, many people these days (me included) would have 
> difficulties identifying Fraktur letters, in particular if they show up just 
> as individual letters.

I do not believe you. If this were true menus in restaurants and public signage 
on shops wouldn’t have Fraktur at all. It’s true that sometimes the orthography 
on such things is bad, as where they don’t use ligatures correctly or the ſ at 
all.

I’ll stipulate that few Germans can read Sütterlin or similar hands. :-)

> Similar for many fantasy fonts, and for people not very familiar with the 
> Latin script.

What’s a fantasy font? And what does this have to do with supporting the 
encoding in plain text of historical documents in the Deseret script?

>> The lower two letterforms are in no way “glyph variants” of the upper two 
>> letterforms. Apart from the stroke of the SHORT I 𐐆 they share nothing in 
>> common — because they come from different sources and are therefore 
>> different characters.
> 
> The range of what can be a glyph variant is quite wide across scripts and 
> font styles. Just that the shapes differ widely, or that the origin is 
> different, doesn't make this conclusive.

LONG OO WITH STROKE is not a glyph variant of SHORT OO WITH STROKE. LONG AH 
WITH STROKE is not a glyph variant of SHORT AH WITH STROKE. 

>> I don’t think that ANY user of Deseret is all that “average”. Certainly some 
>> users of Deseret are experts interested in the script origin, dating, 
>> variation, and so on — just as we have medievalists who do the same kind of 
>> work. I’m about to publish a volume full of characters from Latin 
>> Extended-D. My work would have been impossible had we not encoded those 
>> characters.
> 
> No, your work wouldn't be impossible. It might be quite a bit more difficult, 
> but not impossible.

No. Wrong. Wrong, wrong, wrong. No, Martin. We encoded the Latin characters on 
the basis of good arguments. You do NOT get to invalidate that, or to pretend 
that the encoding of those characters was a mistake, or anything like it. Many 
scholars — including myself — use these characters, and that is what the 
Universal Character Set is for. 

Also, apparently, it is for pictures of BROCCOLI. 

> I have written papers about Han ideographs and Japanese text processing where 
> I had to create my own fonts (8-bit, with mostly random assignments of 
> characters because these were one-off jobs), or fake things with inline 
> bitmap images (trying to get information on the final printer resolution and 
> how many black pixels wide a stem or crossbar would have to be to avoid 
> dropouts, and not being very successful).

All of us make use of nonce glyphs for examples. That’s not the same as making 
an edition of a medieval Cornish text, or of a Mormon diary. We do NOT want to 
have to use font trickery. 

> I have heard the argument that some character variant is needed because of 
> research, history,... quite a few times. If a character has indeed been 
> historically used in a contrasting way,

Contrast may be geographical or temporal. 

> this is definitely a good argument for encoding. But if a character just 
> looked somewhat different a few (hundreds of) years ago,

Also, LATIN LETTER D WITH STROKE is a different letter from LATIN LETTER T WITH 
STROKE. Why? Because the underlying letters are different. And it’s no 
different for Deseret. 

Your suggestion that LONG AH WITH STROKE and SHORT AH WITH STROKE are the same 
character is unsupportable. 

> that doesn't make such a good argument. Otherwise, somebody may want to 
> propose new codepoints for Bodoni and Helvetica,…

This suggestion is nonsense. 

On 28 Mar 2017, at 11:59, Mark Davis ☕️ <m...@macchiato.com> wrote:

> I agree with Martin.
> 
> Moreover, his last paragraphs are getting at the crux of the matter. Unicode 
> is not a registry of glyphs for letters, nor should try to be. 

DESERET LETTER LONG AH WITH STROKE is not a glyph variant of DESERET LETTER 
SHORT AH WITH STROKE. 

> Simply because someone used a particular shape at some time to mean a letter 
> doesn't mean that Unicode should encode a letter for that shape.

Coming to a forum like this out of a concern for the corpus of Deseret 
literature is not some sort of attempt to encode things for encoding’s sake. 

> We do not need to capture all of the shapes in 
> https://upload.wikimedia.org/wikipedia/commons/f/fc/Gebrochene_Schriften.png 
> simply because somebody is going to "publish a volume full of" those shapes.

That analogy has nothing to do with the discussion about the Deseret letters. 

On 28 Mar 2017, at 12:33, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:

> Do you think that the 1855/1859 distinction is needed in file names? In text 
> messages? It may help in some kinds of databases, but it may also be possible 
> to just tag each piece of text in the database with "1855" or "1859" if that 
> distinction is important (e.g. for historical documents). As far as I 
> understand, we are still looking for actual texts that use both shapes of the 
> same ligature concurrently.

I think that this is the sort of distinction that should be made in plain text, 
yes. The 1859 letters are not “glyph variants” of the 1855 letters by any 
criterion in the history of writing systems that I recognize. 

On 2017/03/28 01:20, Michael Everson wrote:

>> Ken transcribes into modern type a letter by Shelton dated 1859, in which 
>> “boy” is written 𐐒<𐐃𐐆>, “few” as 𐐙<𐐆𐐋>, “truefully” [sic] as 𐐓𐐡<𐐆𐐋>𐐙𐐋𐐢𐐆, and 
>> “you” as 𐐏<𐐆𐐋>.
> 
> These are all 1859 variants, yes?

Yes, it was one letter written by one person at one sitting and he used one 
orthography and he didn’t mix it with the other orthography. 

> That would just show that these variants existed (which I think nobody in 
> this discussion has doubted), but not that there was contrasting use. And is 
> that letter hand-written or printed?

They had a script reform. At first Mormons used the letter SHORT AH WITH STROKE 
[ɒɪ] for /ɔɪ/ and then later they used LONG AH WITH STROKE [ɔːɪ] for /ɔɪ/. And 
at first Mormons used the letter LONG OO WITH STROKE [ɪuː] for /juː/ and then 
later they used SHORT OO WITH STROKE [ɪʊ] for /juː/. And some Mormons didn’t 
use either, they just wrote the diphthongs with digraphs of other letters. 

On 28 Mar 2017, at 13:10, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:

>> And the same goes for the /juː/ ligatures. The word tube /tjuːb/ can be 
>> written TYŪB 𐐓𐐏𐐅𐐒 or 𐐓𐐧𐐒 or 𐐓<𐐆𐐋>𐐒. But the unligated sequences would be 
>> pronounced differently: 𐐓𐐏𐐅𐐒 /tjuːb/ and 𐐓𐐆𐐅𐐒 /tɪuːb/ and 𐐓𐐆𐐋𐐒 /tɪʊb/.
> 
> Ah, I see. So we seem to have five different ways (counting the two ligature 
> variants) of writing the same word,

That’s called spelling.

> with three different pronunciations.

No, that’s wrong. I give those transcriptions to show the usual meanings of the 
Deseret letters. So if you were going to write “tube” /tjuːb/ you would write 
𐐓𐐏𐐅𐐒 or 𐐓𐐧𐐒 or 𐐓<𐐆𐐋>𐐒. In the second sentence I show that while the ligated 
letters 𐐧 and <𐐆𐐋> can be used for /juː/ the unligated sequences 𐐆𐐅 and 𐐆𐐋 
would in principle be pronounced /ɪuː/ and /ɪʊ/ respectively.
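
The letter-by-letter readings above can be checked with a short Python sketch, 
assuming the standard unicodedata module. The 1859 ligature has no code point 
of its own, so the sketch covers only the spelling with YEE, the spelling with 
the encoded EW ligature, and the unligated sequence. The dictionary keys and 
the helper name letter_names are hypothetical.

```python
import unicodedata

PREFIX = "DESERET CAPITAL LETTER "

# Three plain-text spellings discussed above, written as escapes so the
# code points are explicit. The keys are illustrative labels only.
SPELLINGS = {
    "with YEE": "\U00010413\U0001040F\U00010405\U00010412",
    "with encoded EW": "\U00010413\U00010427\U00010412",
    "unligated SHORT I + SHORT OO": "\U00010413\U00010406\U0001040B\U00010412",
}


def letter_names(text):
    """Return the Unicode name of each letter, without the script prefix."""
    return [unicodedata.name(ch).replace(PREFIX, "") for ch in text]


if __name__ == "__main__":
    for label, spelling in SPELLINGS.items():
        print(f"{label}: {' + '.join(letter_names(spelling))}")
```

In plain text today the third spelling can only be stored as the unligated 
sequence, which is the one that would in principle be read differently.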

Obviously the pronunciation of the word “tube” would not have changed for 
speakers of English in Mormon territories in the middle of the 19th century. 
(Of course many dialects of English in North America now have /tuːb/ rather 
than /tjuːb/ but that is not relevant here.) 

> The important question is whether the two ligatures do imply any difference 
> in pronunciation (as opposed to time of writing or author/printer 
> preference), i.e. whether the ligated sequences 𐐓𐐧𐐒 or 𐐓<𐐆𐐋>𐐒 are pronounced 
> differently (not by a phonologist but by an average user).

No, it’s spelling.

Michael Everson
