Re: Major Defect in Combining Classes of Tibetan Vowels
At 12:15 -0700 2003-06-25, John Hudson wrote: In this case, any existing normalisation for Hebrew is already broken -- in the sense of destroying Biblical Hebrew text -- but still the argument from the UTC seems to be that even broken implementations -- broken because the standard is broken -- must not be broken. That seems very short-sighted indeed. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Major Defect in Combining Classes of Tibetan Vowels
At 14:20 -0700 2003-06-25, John Hudson wrote: John, Write it up with glyphs and minimal pairs and people will see the problem, if any. Or propose some solution. (That isn't "add duplicate characters".) In Biblical Hebrew, it is possible for more than one vowel to be attached to a single consonant. This means that it is very important to maintain the ordering of vowels applied to a single consonant. The Unicode Standard assigns an individual combining class to every vowel, meaning that NFC normalisation may re-order vowels on a consonant. This is not simply 'non-traditional' but results in incorrect rendering and a different vocalisation of the text. The point is that hiriq before patah is *not* canonically equivalent to patah before hiriq, except in the erroneous assumption of the Unicode Standard: the order of vowels makes words sound different and mean different things. In order to correctly encode and render the Biblical Hebrew text, it is necessary to either a) never use normalisation routines that re-order marks (which is beyond the control of document authors), or b) re-classify the existing Hebrew marks so that all vowels are in a single class and will not be re-ordered during normalisation, or c) encode new marks for Biblical Hebrew with all vowels in a single class. There are a few other desirable changes to the combining class assignments for some Hebrew accents, which make rendering easier and are more linguistically logical, but the vowels are the most problematic. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] If you browse in the shelves that, in American bookstores, are labeled New Age, you can find there even Saint Augustine, who, as far as I know, was not a fascist. But combining Saint Augustine and Stonehenge -- that is a symptom of Ur-Fascism. - Umberto Eco -- Michael Everson * * Everson Typography * * http://www.evertype.com
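The re-ordering described above can be reproduced with a short sketch. It assumes Python's standard unicodedata module and uses lamed as an arbitrary base consonant; the variable names are illustrative only and come from nobody in the thread. Patah has combining class 17 and hiriq has 14, so a patah typed before a hiriq is swapped by canonical ordering.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (base consonant, chosen for illustration)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH


def describe(text):
    """Return (character name, canonical combining class) for each code point."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]


# The author writes patah first, then hiriq, on the same consonant.
written = LAMED + PATAH + HIRIQ

# NFC applies canonical ordering, which sorts adjacent marks by combining class.
normalized = unicodedata.normalize("NFC", written)

print(describe(written))
print(describe(normalized))
print("order preserved:", normalized == written)
```

Running it shows the two vowel points changing places after normalisation, which is the loss of information the message complains about.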
Re: Revised N2586R
At 13:03 +0100 2003-06-26, William Overington wrote: Well, certainly authority would be needed, yet I am suggesting that where a few characters added into an established block are accepted, which is what is claimed for these characters, there should be a faster route than having to wait for bulk release in Unicode 4.1. No, there shouldn't. The process will not be changed. Unicode and ISO/IEC 10646 are synchronized, and JTC1 balloting processes are what they are. No further discussion is necessary, as it is pointless. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Revised N2586R
At 12:09 -0500 2003-06-26, [EMAIL PROTECTED] wrote: The only meaning that the Standard implies is that the character encoded at codepoint x represents the symbol of a wheelchair. It does not imply *anything* about how its usage in juxtaposition with the name of a person should be interpreted. Indeed William's argument that HANDICAPPED is somehow inappropriate just doesn't wash. In Europe at least, many handicapped people consider it far more polite to be called handicapped or behindert or what have you than to be subject to such politically correct monstrosities as differently abled. Which is not to say that the Name Police won't prefer WHEELCHAIR SYMBOL. Time will tell. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Nightmares
At 14:32 -0400 2003-06-26, John Cowan wrote: If you are going to discriminate (invidiously) using a computerized database, using H for Handicapped (or G for Gimp) will do just as well. Are you going to complain about the various symbols of religion already encoded on the same grounds? I am preparing additional religious symbols to help fill the gaps. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 15:36 -0700 2003-06-26, Kenneth Whistler wrote: I now like better the suggestions of RLM or WJ for this. ZZZT. Thank you for playing. RLM is for forcing the right behaviour for stops and parentheses and question marks and so on. Introducing it between two combining characters in Hebrew text would break all kinds of things, and would be horrible, horrible, horrible. Invent a new control character for this weird property-killer, if you must, but don't use an ordering mark for it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
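The objection to RLM can be made concrete by listing the properties of the candidate characters mentioned in this part of the thread. This is a minimal sketch assuming Python's standard unicodedata module; the CANDIDATES table and the properties helper are illustrative names, not part of any proposal.

```python
import unicodedata

# Candidate "reordering blockers" named in the thread (labels are shorthand).
CANDIDATES = {
    "RLM": "\u200F",  # RIGHT-TO-LEFT MARK
    "WJ": "\u2060",   # WORD JOINER
    "CGJ": "\u034F",  # COMBINING GRAPHEME JOINER
}


def properties(ch):
    """Collect the properties relevant to the argument for one character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),   # general category
        "bidi": unicodedata.bidirectional(ch),  # bidirectional class
        "ccc": unicodedata.combining(ch),       # canonical combining class
    }


for label, ch in CANDIDATES.items():
    print(label, properties(ch))
```

All three have combining class 0, so any of them would stop marks from being re-ordered across it. RLM alone carries the strong right-to-left class R, which is consistent with the complaint that it is an ordering mark with a job of its own.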
Re: Biblical Hebrew
At 23:59 -0700 2003-06-26, John Hudson wrote: I think there is a reasonable case to be made for treating modern Hebrew and Biblical Hebrew as separate languages for pretty much all purposes. The existing codepoints with the fixed position combining classes work fine for Modern Hebrew, and there's no reason that they should not continue to be used for that language. I would seriously entertain the idea of re-encoding *all* the Hebrew marks, along with non-Tiberian vocalisation marks and anything else specifically needed for Biblical Hebrew, in a separate block, and deprecate the cantillation marks in the Hebrew block. Speaking as a member of WG2, I do not think that we should encode such duplicate characters. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Yerushala(y)im - Biblical Hebrew
At 10:09 +0200 2003-06-27, Jony Rosenne wrote: Whatever you do, any new characters designed for solving these problems should not be in the Hebrew block. Add a new Biblical Hebrew block, clearly labeled as not intended for regular Hebrew use. And I suggest that whenever a proposal comes up to the UTC, it would be advantageous to involve Israeli Biblical scientists in the review. We've wanted *that* for a long time. Indeed it is a long-standing request that Israeli experts help to map the TC46 8-bit standard with cantillation marks to Unicode. Can you help facilitate this? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote: In discussing these issues among Biblical Hebrew implementers, content providers and users, I have had to explain repeatedly why UTC doesn't want to consider this. It is completely obvious to them that this is the right solution. Even on explaining the impact on normalization, the response is that there is no impact since implementations and content using Unicode do not yet exist. Indeed, but the UTC doesn't want to change the normalization stuff even where there are obvious errors, for philosophic reasons, I suppose. I mean who are all the implementors who depend on these tables? Often Unicoders have claimed existing implementations even where none can be shown to exist. Now Ken tempts us with: This is just one more in the accumulating pile of little problems in the decompositions locked down by normalization that will eventually result in the committee going Spaaannggg! and agreeing to publish and maintain a separate, corrected list of equivalences As She Oughta Been which are not constrained by the formal stability guarantees of UAX #15 normalization forms. I'd like to understand how deprecating a character and adding a duplicate one with the right properties differs from deprecating a version of UAX #15 in favour of an Oughta-Been table. :-) I think it would be better to create a new character for this purpose than to use ZWJ in yet another way. I suppose CGJ is tempting. -- Michael Everson * * Everson Typography * * http://www.evertype.com
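Why CGJ is tempting can be shown with a small sketch, again assuming Python's standard unicodedata module. It illustrates only the mechanism and is not a recommendation made in this message: CGJ has combining class 0, so canonical ordering will not move a mark across it. The names plain and guarded are hypothetical.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (arbitrary base consonant)
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def nfc(text):
    """Normalize to NFC."""
    return unicodedata.normalize("NFC", text)


plain = LAMED + PATAH + HIRIQ          # vowels adjacent: subject to re-ordering
guarded = LAMED + PATAH + CGJ + HIRIQ  # CGJ placed between the two vowels

print("plain survives NFC:  ", nfc(plain) == plain)
print("guarded survives NFC:", nfc(guarded) == guarded)
```

The cost is an extra invisible character in the text, which is part of what the thread is arguing about.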
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote: I just have a hard time believing that 50 years from now our grandchildren won't look back, What were they thinking? So it took them a couple of years to figure out canonical ordering and normalization; why on earth didn't they work that out first before setting things in stone, rather than saddling us with this hodgepodge of ad hoc workarounds? How short sighted. As Rick said, I know this will get shot down; don't bother telling me so. I agree with you, Peter. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote: Are we saying that ISO doesn't give a rip for implementation issues? Duplication of characters is not the way to fix (forgive me, UTC) *Unicode's* error in combining characters. Or that their notion of ordering distinctions is different from Unicode's such that *any* differently ordering permutation of some given set of characters is considered a distinct representation? Are we saying that the voting members of WG2 are not already aware of the issue that has been discussed and incapable of understanding an explanation of these issues addressed to them? You might submit your paper to WG2. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 04:53 -0500 2003-06-27, [EMAIL PROTECTED] wrote: If they're so unaware of combining classes, might it not seem reasonable to think that the dialog might continue as follows? - [gives explanation of combining classes and the related problem for Hebrew] ISO: So, you're saying you're coming to us asking for duplicates of existing characters because of an error the Unicode Consortium made with some of those character properties they define? - Well, yes, that's basically it. ISO: Then, obviously they need to correct their errors. I mean, it's not like the wrong characters got encoded or something. Tell them to just fix the errors; that can't be difficult to do, and is obviously the right thing to do. This is exactly my view. Who is it who will kill the Unicode Consortium if UAX #15 were to be revised? Did it occur to anyone to *ask* about the possible revision of classes for the dozen or so instances that would be affected? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]
At 14:34 +0200 2003-06-27, Philippe Verdy wrote: On Friday, June 27, 2003 1:29 PM, John Cowan [EMAIL PROTECTED] wrote: Michael Everson scripsit: Change the character classes in Unicode 4.1, and they *might* decide to freeze support at, say, Unicode 3.0. Or they may simply opt to define their *OWN* normalization standard, distinct from Unicode NF* form, and designated in a separate reference document, removing *all* references to UAX#15 from XML and IDNA references, only to guarantee this stability that Unicode would be unable to offer. Let's not let this happen! Oh, come on. Let's not put words in people's mouths. Ifs and mights are not facts. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 07:28 -0400 2003-06-27, John Cowan wrote: Michael Everson scripsit: Who is it who will kill the Unicode Consortium if UAX #15 were to be revised? Did it occur to anyone to *ask* about the possible revision of classes for the dozen or so instances that would be affected? The IETF, for one. IETF is already very wary of Unicode, even though they recognize the practical necessity of using it, but with the existing stability guarantees about normalization, they have managed to swallow it. Stability *even if wrong* is really, really important to protocol people -- just think of all the nonfunctional stubs in the world of *diplomatic* protocol, maintained in the name of not changing anything. So, you're saying, no one has asked IETF whether or not they would be able to countenance a dozen or so changes for unimplemented things like biblical accents. The W3C would also hit the roof if Unicode normalization changed radically. I don't think anyone is proposing a *radical* change. Neither party is at all happy with even the four (I think) characters that have already changed, and are already beginning to turn into optimistic pessimists (people who smile brightly, nod their heads, and say happily, See, things are every bit as bad as I predicted!). Well, y'all are gonna have to do something, and adding duplicate characters to ISO/IEC 10646 is not going to be well-received, because there isn't anything broken in ISO/IEC 10646. Since the use of non-ASCII characters in things like XML and the DNS depends on the good will of these folks, it is very very dangerous to alienate them, and *they do not care* whether the case is a corner case or not -- _stare decisis_ is everything to them, the actual details little or nothing. You could explain the problem with these Hebrew accents, and ask them to help by accepting a change. Shivering in a cave for fear of the monsters outside isn't going to get anyone anywhere. People of good will can often come to enlightened consensus. 
Change the character classes in Unicode 4.1, and they *might* decide to freeze support at, say, Unicode 3.0. Or they might understand the problem. People aren't all *that* stupid, methinks. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]
At 09:16 -0400 2003-06-27, John Cowan wrote: Michael Everson scripsit: Oh, come on. Let's not put words in people's mouths. Ifs and mights are not facts. Expressed attitudes are facts, and it's reasonable to extrapolate people's future behaviors, at least the general trend thereof, from their expressed attitudes. When someone draws a line in the sand, it's not unreasonable to expect that crossing it will be taken as a declaration of war. But you might trot on over with a white flag to parley about a problem. They're only human beings over there, just as we are over here. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)
At 10:40 -0400 2003-06-27, John Cowan wrote: Karljürgen Feuerherm scripsit: 1. Everyone is more or less agreed that the present combining class rules as they apply to BH contain mistakes. The clearly preferential way to deal with mistakes in any technological/computing software environment is to FIX them. Not so. Sometimes stability is more important than correctness. And sometimes not, then. What four characters have been corrected so far? Were they important characters to some company? Are there no Christians or Jews in the IETF who might care about a problem like this, where a simple solution might be effected? Particularly if it involves only a handful of characters, and the precedent for making such corrections has been set? Or is our standard, which as I have said many times, will be used for CENTURIES, going to be hobbled by silliness like this forever? Hm? The use of the backslash character in DOS/Windows systems as a path separator is arguably a mistake (paths were borrowed from Unix into DOS 2.0, but the slash was already in use for command-line options, something inherited from CP/M and the ancestral CLI running back through DEC operating systems), but fixing it is out of the question. This is not analogous to the present situation, it seems to me. In the first place, what else is the \ for? :-) No one who wants to use the \ is prevented from doing so except maybe in filenames, in systems which don't allow it. (The colon is disallowed in Apple filenames.) All concerns involving human beings -- ho bios politikos -- are political in some sense. And some have more sense than others, it seems. (Sorry, couldn't resist.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Accented ij ligatures (was: Unicode Public Review Issues update)
I think the answer is, regarding the soft dot property, please leave the ij ligature alone. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 13:29 -0400 2003-07-07, Frank da Cruz wrote: Nobody is springing to the defense of this so I'll only say that it's a time-honored practice and we shouldn't be so quick to disparage it, lest we be disparaged several years hence for the things we do :-) It's rotten, and when I typeset books (http://www.evertype.com/books.html) I always have to clean up the text which is invariably littered with these artifacts of old technology. In the world of plain text, two spaces after a sentence-ending period, exclamation mark, question mark, or other mark is actually rather handy to distinguish sentence enders from the same marks used in other ways, esp. periods in abbreviations. Fie! Fie! Unclean! Unclean! -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 14:27 -0400 2003-07-07, Frank da Cruz wrote: EMACS aside, it's still an interesting question why -- in English at least -- it was customary throughout the 20th century to put two spaces after a period when typing. I expect it must have been an aesthetic decision. What else could it have been? The typing habit was designed to assist typesetters in reading the manuscript as they were setting type. Traditionally, the typesetters never set the extra space. Sigh. This discussion reminds me of way back to 1984 or 1985, when The Mac is not a typewriter was published. Same story. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: When is a character a currency sign?
At 15:03 -0400 2003-07-07, Tex Texin wrote: When is a character properly called a currency sign? Hunh? When you use it to represent currency. DM was two characters used as a currency sign in Germany. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 15:12 -0400 2003-07-07, John Cowan wrote: Michael Everson scripsit: The typing habit was designed to assist typesetters in reading the manuscript as they were setting type. Either this says that double-spacing after a sentence improves the readability of monospaced documents, or I misunderstand you entirely. It assists the printer. In such a context it has a specific utility. After all, typists are (or were) taught to do so in all sorts of documents, including those like business letters that were not to be typeset. Typists were taught to do it generally, but the origin of the practice is to assist the typesetters. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
From Robert Bringhurst's Elements of Typographic Style, pp. 28-20: Use a single word space between sentences. In the nineteenth century, which was a dark and inflationary age in typography and type design, many compositors were encouraged to stuff extra space between sentences. Generations of twentieth-century typists were then taught to do the same, by hitting the spacebar twice after every period. Your typing as well as your typesetting will benefit from unlearning this quaint Victorian habit. As a general rule, no more than a single space is required after a period, or any other mark of punctuation. Larger spaces (e.g., en spaces) are *themselves* punctuation. The rule is usually altered, however, when setting classical Latin and Greek, romanized Sanskrit, phonetics, or other kinds of texts in which sentences begin with lowercase letters. In the absence of a capital, a full *en space* (M/2) between sentences will generally be welcome. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 18:08 -0400 2003-07-07, Frank da Cruz wrote: It is worth noting that what is described here is the default running mode of Emacs for the English locale. There are a lot more modes on Emacs to handle various languages (including programming languages). Of course. But without two spaces you have greater ambiguity, at least in English: In Mr. Roberts, what is the function of the period? Don't call me Mr. Roberts is my name. Don't call me Mr. Roberts is my name. In European English Mr is generally not followed by a full stop, because the abbreviation contains the first and last letter of the word. (In Finland that would be M:r.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
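The plain-text convention under discussion, where two spaces mark a sentence end and one space after a period does not, can be sketched as follows. This is a minimal illustration using Python's standard re module, not Emacs's actual implementation. It assumes the two quoted examples originally differed only in the spacing after "Mr.", a difference this archive has collapsed; split_sentences is a hypothetical helper name.

```python
import re


def split_sentences(text):
    """Split only where . ! or ? is followed by two or more spaces."""
    return re.split(r"(?<=[.!?]) {2,}", text)


# Two spaces after "Mr.": the period ends a sentence.
two_spaces = "Don't call me Mr.  Roberts is my name."

# One space after "Mr.": the period belongs to the abbreviation.
one_space = "Don't call me Mr. Roberts is my name."

print(split_sentences(two_spaces))
print(split_sentences(one_space))
```

With the European habit of writing Mr without a full stop, the ambiguity does not arise in this particular example.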
Re: French group separators
At 16:22 -0600 2003-07-07, John H. Jenkins wrote: IIRC the English prefer to say Mr Roberts. The, ahem, Irish too. ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 01:10 +0200 2003-07-08, Philippe Verdy wrote: I forgot to ask something: is there a Unicode codepoint assigned to the abbreviation dot (a narrower dot with less margins on left and right than the standard dot), as it seems to be used in some typeset texts to differentiate it from the punctuation mark for end of sentence? I am sure there is not. Sometimes a dot is just a dot. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French group separators
At 17:00 -0600 2003-07-07, John H. Jenkins wrote: IIRC the English prefer to say Mr Roberts. The, ahem, Irish too. ;-) Well, to be frank, I'm sure that the Welsh, Scots, and Manx probably do, too. (Did I leave anybody out *this* time?) The Cornish, of course. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures
At 03:25 -0700 2003-07-12, Peter Kirk wrote: Does anyone know of a good resource on the web, or elsewhere, listing the alphabets used for different languages around the world? I know a project was attempted a few years ago at least for Europe. It would be useful to have this kind of data available somewhere even with no official status. http://www.evertype.com/alphabets -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)
At 08:11 -0400 2003-07-12, Patrick Andries wrote: Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ? Iwrit (iw), being a German transliteration of the name of the Hebrew language, and Jiddisch (ji) were both thought (by someone) to be less suitable than the English-based he and yi which replaced them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)
At 01:21 -0400 2003-07-13, John Cowan wrote: I hand-write by making a tall lower-case epsilon glyph and then drawing a solidus over it. I just use the TIRONIAN SIGN ET. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)
At 14:09 -0400 2003-07-13, John Cowan wrote: Michael Everson scripsit: I hand-write by making a tall lower-case epsilon glyph and then drawing a solidus over it. I just use the TIRONIAN SIGN ET. A good choice if you don't slash your DIGIT SEVENs and can make your DIGIT ONEs sufficiently distinct. Eh? I *do* slash my DIGITs SEVEN and I use a single vertical stroke for my DIGITs ONE. The TIRONIAN SIGN ET as used in Ireland has no horizontal stroke. -- Michael Everson * * Everson Typography * * http://www.evertype.com
No UTF-8 in Eudora
Dear all, Apparently, if you are a Eudora user and would like to encourage Qualcomm to add proper UTF-8 support to Eudora, you can send a request for this option to be included in a future version of Eudora to http://www.eudora.com/developers/feedback/ -- as Eudora 6 is in beta now, perhaps this is a good time to make your opinions known. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)
At 16:21 -0400 2003-07-13, John Cowan wrote: I should have said do slash your DIGIT SEVENs. So the glyph in the Unicode 3.0 book is not typical of Irish practice? It seems to have a horizontal stroke all right. It is utterly typical of Irish practice. I meant that it doesn't have an additional horizontal stroke as a slashed 7 does. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [Private Use Area] Audio Description, Subtitle, Signing
At 10:34 -0700 2003-07-14, Peter Kirk wrote: On 14/07/2003 09:04, Doug Ewell wrote: * Michael Everson's and Roozbeh Pournader's provisional PUA assignments for ARABIC PASHTO ZWARAKAY and AFGHANI SIGN, two legitimate characters that cannot be represented in Unicode by any other means. Why not, may I ask, as a newcomer to this list? Is there some technical reason, or a political one? What do you mean? The ZWARAKAY is a new combining mark; the AFGHANI SIGN is a unique currency symbol. Neither is yet encoded. In the report, Computer Locale Requirements for Afghanistan, it is recommended to use a PUA character until such time as the encoding process has run its course. I would not recommend using COMBINING MACRON for the ZWARAKAY, and I don't know what could be recommended for the AFGHANI SIGN that is already encoded, apart from writing out the word. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 22:16 -0400 2003-07-14, John Cowan wrote: Latn has more letters than Latg does, because it's had to add more; I have made thorns and eths in Latg. ;-) Latg is older than the current use of Latn, though not than Latn's ancestor. You're wrong. Latg is older than Latc (Carolingian) but it is not a separate script. Some Latg characters are hard to identify if all you know is Latn. But we don't encode them separately. Thorn and Wynn and Gha and Ou and Ezh and lots of other Latin letters are hard to identify if all you know is Latn. I think your use of Latn/Latg here isn't convincing. And the Samaritan Pentateuch is often printed in the Samaritan script. A font difference would handle that. Nh. I'd like someone whose native script is Hebrew to comment on mutual intelligibility, which was the main criterion for separating Glagolitic from Cyrillic. I don't think it was. Glagolitic and Cyrillic are obviously two different scripts. My native script isn't Hebrew but I am certain that no one who was could easily read a newspaper article written in Phoenician or Samaritan letters. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Aramaic, Samaritan, Phoenician
At 07:02 -0400 2003-07-15, David J. Perry wrote: What is Latg vs Latn? Latg is the Gaelic variant of the Latin script; Latf is the Fraktur variant of the Latin script; Latn is the generic Roman default. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 08:42 -0400 2003-07-15, Karljürgen Feuerherm wrote: Michael Everson said: My native script isn't Hebrew but I am certain that no one who was could easily read a newspaper article written in Phoenician or Samaritan letters. Surely that is not an argument for encoding a separate script, is it? It is sometimes. :-) Most German people I know can't read the German cursive script used say 50 years ago. But the characters clearly correspond to the Latin characters in use today. The handwriting is difficult to read. One would think that in German schools it would be at least introduced so children would know about it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 09:22 -0400 2003-07-15, John Cowan wrote: Michael Everson scripsit: Latg is older than the current use of Latn, though not than Latn's ancestor. You're wrong. Latg is older than Latc (Carolingian) but it is not a separate script. VVELLIFYOVCOVNTANCIENTROMANSTYLEASORDINARYLATINSCRIPTTHENYES. I do. C'mon, John, look at Trajan's Column. Yes, it's legible and the wax tablet texts are not, but they are contemporaneous and they are certainly the same script. If I don't know Gha, and I see it, I know I don't recognize it: it's a novel letter. (And I may even think it says OI.) (Michael weeps.) If I see a Gaelic-style G and fail to recognize it *as* a G, that's quite different. Normally one recognizes it in context. I fail to see your point, however. And the Samaritan Pentateuch is often printed in the Samaritan script. A font difference would handle that. Nh. Even now that German uses Antiqua almost exclusively, you might find a Lutherbibel printed recently in Fraktur. Even so, I don't think there's an advantage to unifying it with Hebrew; it is very different. See http://www.orindalodge.org/fonts/kadosh_samaritan_manual_1_10.pdf I don't think it was. Glagolitic and Cyrillic are obviously two different scripts. From UTR #3: # In the encoding, Glagolitic is treated as a separate script from # Cyrillic, principally because the letter shapes are in most cases # totally unrelated, with differences not at all arising from mere # font style. That's a draft by Rick McGowan. It indicates that they are obviously different scripts ;-) Anyway, look at Samaritan Yod and compare it with Hebrew Yod. Not mere font style. And from p. 171 (section 7.3) of TUS 3.0: # The Unicode standard regards Glagolitic as a *separate* script from # Cyrillic, not as a font change from Cyrillic. This position is taken # primarily because Glagolitic appears unrecognizably different from # Cyrillic, and secondarily because Glagolitic has not grown to match # the expansion of Cyrillic. 
A good update of Rick's original text. What is this thread for? We're going to encode Phoenician. It is the forerunner of Greek and Etruscan. Hebrew went its separate way. The fact that there is a one-to-one correspondence isn't important. We have that for Coptic and Greek too and we are disunifying them. I'm pretty sure we're going to encode Samaritan too. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 12:05 -0400 2003-07-15, John Cowan wrote: Michael Everson scripsit: We disunify Glagolitic, and rightly so too. But that does not mean that there are not intermediate cases that ought to be unified, and without definite criteria, it's hard to know what to do. Just grok them? :-) Nope, won't work. Works for me. When we get to encoding Samaritan, I guess the proposal will stand by itself or not. Not if there are no criteria to judge it on that are better than See, it's obvious! Well, you are going to have to wait. I do not have time to write a proposal on Samaritan right now. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 07:53 -0700 2003-07-15, Peter Kirk wrote: VVELLIHOPEVVEVVILL... ahem... Well, I hope we will count ancient Roman as Latin script rather than add to Unicode yet another new script which is almost identical to an existing one. But then it would make more sense than proposals to add new scripts or partial scripts for biblical Hebrew and for Aramaic, for at least ancient Roman inscriptions can be distinguished from nearly all modern texts by being in a different language. Nope. The Aramaic ranged far beyond the middle east and itself -- not Hebrew -- was the forerunner of Syriac, Manichaean, Sogdian, Mandaean, Parthian, Avestan, Pahlavi, and other scripts. But the existing Hebrew characters in Unicode are already in use for biblical Hebrew texts, as well as for what are probably the majority of surviving examples of ancient Aramaic which is not Syriac - the Aramaic portions of the Hebrew Bible, and presumably also the Aramaic parts of the Talmud and other ancient Jewish writings. Aramaic is not only attested in Biblical texts. From Daniels & Bright: Aramaic was the lingua franca of Southwest Asia from early in the first millennium BCE until the Arab Conquest in the mid seventh century CE. Otherwise we end up with a new script for a few ancient inscriptions which are only slightly different in glyph shapes and repertoire and in language from an extensive corpus in an existing Unicode block. We need to do further research on the subject, but it seems to me that Late Aramaic is still a candidate for encoding. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 09:39 -0700 2003-07-15, Peter Kirk wrote: But then J was originally a glyph variant of I, and only quite recently in English have they been fully distinguished as letters. It's not all that recent, and it wasn't English that made the innovation. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 20:17 +0100 2003-07-15, Thomas M. Widmann wrote: John Cowan [EMAIL PROTECTED] writes: I'd like someone whose native script is Hebrew to comment on mutual intelligibility, which was the main criterion for separating Glagolitic from Cyrillic. But if that criterion is applied, surely Georgian Xucuri/Khutsuri should be separated from Georgian Mxedruli/Mkhedruli: Although there roughly is a one-to-one correspondence between the two, and although both are generally applied to the same language (though normally to different stages of it), they definitely are not mutually intelligible (and in fact knowledge of Xucuri seems to be quite low in Georgia). The UTC has agreed that we should do this. After 8 years or so of my whining ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 11:14 -0700 2003-07-15, Kenneth Whistler wrote: The main reason for separately encoding Coptic, rather than maintaining what we now recognize to be a mistaken unification with the Greek script, is that it is less useful to people who want to represent Coptic texts to have it be encoded as a variant of Greek than it is to have it be encoded as a distinct script. Particularly as they regularly write text in both Coptic and Greek and this distinction is better expressed in plain text than in the font. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 17:34 -0400 2003-07-15, Patrick Andries wrote: Sütterling ? Sütterlin. Sütterling is the name of a panda in the Berlin zoo. ( Ludwig Sütterlin, 1865-1917) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic, Samaritan, Phoenician
At 21:09 +0100 2003-07-15, Anto'nio Martins-Tuva'lkin wrote: On 2003.07.15, 12:16, Michael Everson [EMAIL PROTECTED] wrote: Latg is the Gaelic variant of the Latin script; Also known as _erse_, I was told. That's incorrect. Erse is a Scots form of the word Irish. It's sometimes (but not politely today) applied to the language; the variant of the Latin script is usually called Gaelic script. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [Private Use Area] Audio Description, Subtitle, Signing
William. If CENELEC wishes to standardize a set of icons, they will do so. If they have a need to interchange data using those icons, they will (if they are wise) come to us and ask to encode them. If they want to use the Private Use Area before they do that, they will. Please don't tell us all about it over and over again, as you have done. If you want to talk to CENELEC, do so. Please stop trying to peddle your PUA schemes for CENELEC to us. I maintain the ConScript Unicode Registry, which contains PUA assignments. I do not promulgate those on this list. (Apart from that fun testing of the Phaistos implementation some time ago.) Roozbeh and I assigned two unencoded characters for Afghanistan to the PUA, and we encourage implementors to use them until such time as the characters are encoded. We do not spend oceans of digital ink evangelizing our brilliant schemes to the Unicode list. It is essentially a matter for end users of the system, just as the two Private Use Area characters being suggested in another thread of this forum in relation to Afghanistan are a matter for end users of the Unicode Standard and do not affect the content of the Unicode Standard itself. Then go talk about it with the users of the system. Code points for the symbols are needed now or in the near future. Are they? By whom? And if they need to use the PUA, they can do so. It's Private. It remains to be seen what will be decided as the built-in font for the European Union implementation of the DVB-MHP specification. It might be the minimum font of the DVB-MHP specification or it might be more comprehensive. For example, should Greek characters be included? Should weather symbols be included? These and many other issues remain to be decided. The minimum font for any specification for Europe should be the MES-2. If you are talking to these people, tell them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [Private Use Area] Audio Description, Subtitle, Signing
At 17:01 +0100 2003-07-17, William Overington wrote: Michael Everson raises some interesting points. William. If CENELEC wishes to standardize a set of icons, they will do so. If they have a need to interchange data using those icons, they will (if they are wise) come to us and ask to encode them. If they want to use the Private Use Area before they do that, they will. Perhaps I may explain the situation? No, thank you. If CENELEC wants to propose characters to the Unicode Standard, they can contact us. I'd be interested in helping, if they had a good case. But I'm not looking for extra work right now. Now, I have never heard of the MES-2, whatever that is. However, I do not have deep knowledge of the various standards which exist. Could you possibly say some more about MES-2, please?

A.4.2 282 MES-2
282 MES-2 is specified by the following ranges of code positions as indicated for each row.

Rows  Positions (cells)
00    20-7E A0-FF
01    00-7F 8F 92 B7 DE-EF FA-FF
02    18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
03    74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
04    00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
1E    02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
1F    00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
20    13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
21    05 16 22 26 5B-5E 90-95 A8
22    00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
23    02 10 20-21 29-2A
25    00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
26    3A-3C 40 42 60 63 65-66 6A-6B
FB    01-02
FF    FD

-- Michael Everson * * Everson Typography * * http://www.evertype.com
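The row/cell listing above can be turned into a membership test fairly mechanically. What follows is only a sketch, assuming that each row value is the high byte of a BMP code position and each cell (or cell range) is the low byte; the names `MES2_SPEC`, `parse_mes2` and `in_mes2` are invented for this illustration and come from no standard or library.

```python
# MES-2 ranges copied from the listing in the message above.
# Assumption: "row" is the high byte and "cell" the low byte of a code position.
MES2_SPEC = """
00 20-7E A0-FF
01 00-7F 8F 92 B7 DE-EF FA-FF
02 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
03 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
04 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
1E 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
21 05 16 22 26 5B-5E 90-95 A8
22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
23 02 10 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
26 3A-3C 40 42 60 63 65-66 6A-6B
FB 01-02
FF FD
"""


def parse_mes2(spec):
    """Expand the 'row cells...' lines into a set of integer code positions."""
    members = set()
    for line in spec.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8  # the row supplies the high byte
        for cell in cells:
            low, _, high = cell.partition("-")
            first = int(low, 16)
            last = int(high, 16) if high else first  # a single cell is a range of one
            members.update(range(base + first, base + last + 1))
    return members


MES2 = parse_mes2(MES2_SPEC)


def in_mes2(ch):
    """True if the single character ch is in the MES-2 repertoire as listed."""
    return ord(ch) in MES2


if __name__ == "__main__":
    for ch in ("A", "\u20AC", "\u0301", "\u1EB0"):
        print(f"U+{ord(ch):04X}", in_mes2(ch))
```

Under these assumptions the combining marks of row 03 (for example U+0301) fall outside the set, since the listed ranges for that row start at 74, and U+1EB0 is outside it as well, since row 1E lists nothing between 9B and F2.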
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 00:57 +0200 2003-07-18, Philippe Verdy wrote: Why is row 03 so restricted? Shouldn't it include those accents and diacritics that are used by other characters once canonically decomposed? Or does it imply that MES-2 is only supposed to use strings in NFC form? Also, is this list under full closure with existing character properties, like NFKD decompositions, and case mappings? The MES-2 is what it is, and was developed at the time when it was. It is thought to be a minimum requirement for European requirements, and is certainly a lot better than that old Adobe glyph list that was supported earlier on. It doesn't depend on very smart fonts. Personally I prefer the Multilingual European Subset. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 12:16 +0200 2003-07-18, Philippe Verdy wrote: Is there some work at CEN to align its MES-2 subset into a revised version (MES-2.1 ???) which not only takes into consideration the ISO10646 reference but also its Unicode properties to make this set self-closed, and actually implementable, at least with NFC closure and case-mappings closure? No. The relevant CEN committee is now dormant. I still note that modern Hebrew and Arabic are excluded from MES-2, as they are not used in any official language in the European Union or EFTA, or future EU candidates. But they are certainly of great interest for countries with which the EU is a major partner, and which are using these scripts. In the future, it would be necessary to include support for modern Georgian (a subset of U+10A0..U+10FF), and modern Armenian (a subset of U+0530..U+058F), as well as some characters from Cyrillic Supplementary (in U+0500..U+052F). The European Multilingual Subset supports all of Latin, Greek, Cyrillic, and Armenian. Unicode supports Hebrew and Arabic. Conversely, I don't understand why MES-2 included characters in row U+25xx (Box Drawing, Block Elements, Geometric Shapes) Legacy compatibility with IBM and others. which are not strictly needed for text purpose (notably legal publications of the E.U., which should better use markup systems), and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and fl ligatures) which are really unneeded, even for legal purposes, or they should have been coherent and included ff, ffi, ffl ligatures... Legacy compatibility with Apple. I suppose that this may come from widely used legacy encodings in some EU+EFTA+European Council countries, but CEN should have avoided them (they could still be selected by font renderers, if available in fonts). You are entitled to your opinion. This work was begun and finished long ago. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 13:35 +0200 2003-07-18, Philippe Verdy wrote: I note that you prefer the European Multilingual Subset to MES-2. Is it an extended set that includes MES-2, and fills the holes by using all characters defined in blocks of some version of the Unicode set? It is script-based, not character based. It includes all Latin, Greek, Cyrillic, Georgian, and Armenian characters. And is a superset of MES-2. I *prefer* Unicode to any subset thereof. -- Michael Everson * * Everson Typography * * http://www.evertype.com
I am not in India
Colleagues, Apparently some of you have got copies of mail I wrote in December 2002 entitled Coptic II? which has some virus attachment to it. This has been sent by [EMAIL PROTECTED] which is not me, and I didn't send it, and I use Mac OS X and Eudora so I don't have a virus. Thanks. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Thank you. (was Re: [Private Use Area] Audio Description,Subtitle, Signing)
At 12:11 +0100 2003-07-18, William Overington wrote: Thank you for the list of code points for MES-2. I have already found that the DVB-MHP minimum set does not have some of them and that the DVB-MHP minimum set does have some which are not in MES-2, such as U+1EB0 to U+1EB5. If this is of interest to CENELEC, feel free to tell them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
I am not in India II
Your message has encountered delivery problems to the following recipient(s): [EMAIL PROTECTED] Delivery failed 554 delivery error: dd This user doesn't have a yahoo.co.in account ([EMAIL PROTECTED]) [-5] - mta104.mail.in.yahoo.com See? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 11:28 -0400 2003-07-18, John Cowan wrote: However, a font like Last Resort (the world's smallest giant font, as it were) does that just about as well. While I hate seeing the Last Resort font show up, I love seeing it when it does. :-) So much better than ?. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 13:07 +0200 2003-07-18, Kent Karlsson wrote: This is not to say that the MESes are unproblematic. To mention just two points not already mentioned: none of the new math characters are included even in MES-3 (a, b), despite that all math characters were supposed to be included That isn't true. and not even MES-3 covers all official minority languages. What's missing? (But as Philippe states, there are some rather useless characters that have been included for compatibility reasons.) Same goes for Unicode though. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: I am not in India II
At 00:44 +0200 2003-07-19, Adam Twardoch wrote: From: Michael Everson [EMAIL PROTECTED] [EMAIL PROTECTED] This merely means that somebody has a virus who had both Michael and Roozbeh in his/her address book. People who believe that e-mails with a particular name in the From field must come from that very person can be called, ehem, naiive. That's an interesting way of writing the diaeresis on naïve, Adam. :-) This particular virus sends itself around, identifying the sender as one of various addresses from the infected person's address book. In addition, the virus swaps the usernames and domains around, so addresses such as [EMAIL PROTECTED] are created. So, basically, it means that the virus probably comes from a person who: 1. Is in Singapore. 2. Has following entries in the address book: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] and [EMAIL PROTECTED] 3. Uses Microsoft Windows. Anybody ring a bell? James Seng? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote: A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? No, it wouldn't. A "not a character" glyph is displayed in the Last Resort font. On Windows, the "cannot find a font for it" situation is the NULL glyph. Not much better than ? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: I am not in India
At 15:25 +0430 2003-07-19, Roozbeh Pournader wrote: On Sat, 2003-07-19 at 02:46, Doug Ewell wrote: I got something titled Re: Coptic II? (note leading space) from [EMAIL PROTECTED], which I am pretty sure is not Roozbeh Pournader. I definitely know *nothing* about Coptic except that it's related to Greek to some degree. The Coptic script derives from the Greek script, but the language is Late Egyptian. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 15:23 +0200 2003-07-19, Philippe Verdy wrote: Unicode does not define the charset (which are defined by ISO10646), That isn't true. They both define the same character set. (I will not use the term charset.) but character properties and related algorithms, and (in cooperation with ISO10646) their codepoint assignments. The code position assignments are (formally) assigned by WG2, but there is consensus between UTC and WG2 on this matter. For me, Unicode is NOT a character set, but an encoded character set, with a small but important nuance: You need to specify a version after Unicode to indicate the character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but Unicode alone is not. To me, Unicode refers to the most recent version. :-) If you just look at this definition, you cannot prefer Unicode to any subset, Yes, I can. because Unicode is just a name of a collection of standards and a collection of character sets and algorithms That isn't true. If you think this is true, you really have a lot to learn about Unicode. and already is a subset of the next version... If you cannot support the idea of subsets, then don't use Unicode, or wait until the Unicode standard is definitely closed, or permanently consider that its repertoire is now closed and no more characters will be added... Of course you would be wrong. I think you mistook me. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 16:41 -0700 2003-07-18, Michael (michka) Kaplan wrote: I am pretty sure you have to be wrong here, Michael. Attend me: 1) API converts from Unicode to the wrong code page 2) API does some sort of work with the string 3) API tries to display the string How on earth could it from the Last Resort font, unless it is a generic glyph that contains no script info (which would be no better than a question mark or a NULL glyph) ? Hm. See http://developer.apple.com/fonts/LastResortFont/ where it shows glyphs for illegal characters (FFFE/ etc.) as well as undefined characters (valid code positions which have not been assigned). I thought somehow that there was a glyph for broken characters (characters that were just plain wrong) as well. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 20:24 +0200 2003-07-19, Philippe Verdy wrote: Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? Good heavens, no. It's one thing for me to update this font regularly for Apple when new blocks get added to the standard. It's quite another thing to suggest that we should have to add, formally, a new block symbol to some block in Plane 14 every time we add a new block to the standard. Isn't it? Surely the correct thing to do is to implement Last Resort support for different platforms as Apple indicates using those character names. So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Karen Language Representation in Unicode
I've discussed the matter with Christian and you can write to me about it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 23:34 +0200 2003-07-19, Philippe Verdy wrote: I'm still convinced that these glyphs are much more informative than a default glyph showing a ?, a white rectangle, or a black lozenge with a mirrored white ?... Of course they are. And Unicode also uses these glyphs in the index page for its charmaps, You mean for its charts. Please. but they are shown as poor bitmaps (maybe the PDF or book version use your glyphs in a document-embedded font) That page is in HTML. How were your glyphs contributed? I, uh, drew them. With SVG graphics containing character objects and drawing primitives I have no idea what this means. I used Fontographer. (it seems the simplest way to derive them, using the table shown in Apple's web page, with some exceptions for unassigned, reserved, forbidden or surrogates symbols which require a distinct design)? You can't derive these. You have to draw them individually. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 08:20 -0500 2003-07-20, [EMAIL PROTECTED] wrote: What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. I am certain more people want to interchange the LITTER DUDE than would want to interchange script block indicators. (Ken suggested offline that this name might be better-received than the DO NOT LITTER SIGN) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 09:56 -0600 2003-07-20, John H. Jenkins wrote: No, it uses the actual Unicode characters, and just has a huge cmap that maps everything in Unicode to the glyph for its block. That is just so cool. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
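The idea of a cmap that sends every code point to one glyph per block can be sketched as a range lookup. The table below is a small illustrative subset of block ranges, not the real font's cmap, and `last_resort_glyph` with its "undefined" fallback is a hypothetical name chosen for this example.

```python
from bisect import bisect_right

# Illustrative subset of Unicode blocks: (first, last, label of the block glyph).
# A real Last Resort style font would cover every block; this table is an assumption.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x10A0, 0x10FF, "Georgian"),
]
STARTS = [first for first, _, _ in BLOCKS]


def last_resort_glyph(ch):
    """Return the label of the block glyph that would stand in for ch."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "undefined"  # not covered by this sample table


if __name__ == "__main__":
    for ch in ("A", "\u05D0", "\u10D0", "\u0100"):
        print(f"U+{ord(ch):04X} -> {last_resort_glyph(ch)}")
```

Because the characters themselves stay in the text and only the glyph is generic, nothing about the encoding changes, which is the point made in the message above.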
[OT] French Government Bans the Term 'E-Mail'
Off-topic, but interesting. This just crossed my desk http://news.yahoo.com/news?tmpl=story2&cid=518&u=/ap/20030718/ap_on_re_eu/france_out_with__e_mail__3&printer=1 -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 20:50 + 2003-07-20, [EMAIL PROTECTED] wrote: At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. It's unlikely that your Apple colleagues can do anything for the J in Code2000. I wasn't talking about that, but if you'd like my opinion, I hate that J too. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [OT] French Government Bans the Term 'E-Mail'
At 10:59 -0400 2003-07-21, Patrick Andries wrote: - Original Message - From: Michael Everson [EMAIL PROTECTED] At 19:56 -0400 2003-07-20, Patrick Andries wrote: Obviously, the AP has found someone to say it is artificial. Of course, all language is artificial. Well, at least all new words that can be traced to someone can be so « described ». *All* words must be traced to someone. They do not grow on trees. I also wonder if anybody in the US said to the inventor of email or any new word : this is artificial. It seems somewhat nonsensical or at least tautological for any newly coined word. eBook, e-mail, eBay, e-money, and all that gunk. I suppose we could do without them. Even Apple's gone weird about it. I don't know what the i in the iLifestyle suite (iChat, iPhoto, iBook, iThis, iThat) means. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [OT] French Government Bans the Term 'E-Mail'
At 11:41 +0100 2003-07-22, Marion Gunn wrote: I read that 'i' (in the Apple context) as meaning 'i(nternet ready)'. It is possible I could be wrong about that. Am I? Yes, you are. -- ME
Re: Useful identifier for Scripts
At 15:00 -0400 2003-07-24, John Cowan wrote: Markus Scherer scripsit: Note that even for single-language text you may need multiple script identifiers. For example, for Japanese text you will need 3 identifiers for Han+Hiragana+Katakana. Obviously, if you have multilingual text, you will need more. Politely, ISO 15924 supplies a special code for this case. You're welcome. -- Michael Everson * * Everson Typography * * http://www.evertype.com
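As a rough illustration of the point about Japanese, the sketch below collapses the three scripts into the single ISO 15924 code Jpan. It guesses each character's script from the prefix of its Unicode character name, which is a simplification rather than the Script property; the function names and the fallback of joining codes with "+" are assumptions made for this example only.

```python
import unicodedata

# Name prefixes mapped to ISO 15924 codes. Guessing the script from the character
# name is a simplifying assumption; a real implementation would use the Script property.
PREFIX_TO_CODE = {
    "CJK UNIFIED IDEOGRAPH": "Hani",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "LATIN": "Latn",
}

# ISO 15924 defines Jpan as the combination Han + Hiragana + Katakana.
JAPANESE = {"Hani", "Hira", "Kana"}


def scripts_in(text):
    """Collect the script codes of all characters whose name has a known prefix."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        for prefix, code in PREFIX_TO_CODE.items():
            if name.startswith(prefix):
                found.add(code)
                break
    return found


def script_tag(text):
    """Return one identifier for text: Jpan, a single script code, or a '+' list."""
    found = scripts_in(text)
    if found == JAPANESE:
        return "Jpan"
    if len(found) == 1:
        return next(iter(found))
    return "+".join(sorted(found))


if __name__ == "__main__":
    print(script_tag("\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8"))
    print(script_tag("Unicode"))
```

A string that mixes kanji, hiragana and katakana gets the single tag Jpan here, while a purely Latin string is tagged Latn.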
Re: Damn'd fools
At 15:46 -0400 2003-07-25, John Cowan wrote: When the United Kingdom hands back Northern Ireland to Ireland in 2052, then obviously the numeric codes of both countries will have to change, but not the codes for the names. Presumably the name of the U.K. would change, however. Why? It would be the United Kingdom of Great Britain, which comprises England, Scotland, Wales, and the Duchy of Cornwall. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Damn'd fools
At 04:58 -0700 2003-07-28, Peter Kirk wrote: On 28/07/2003 04:31, Michael Everson wrote: The Normans of course were frankified Norsemen. (My word. Apparently francized would be used in Québec; frenchify occurs but is apparently often derog.) Thanks, Michael. Of course I could have suggested to Jarkko to ask an English-speaking Irish person if he or she is English. Perhaps we are Hiberno-Saxons (Ducks.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: OT: Damn'd fools
At 11:47 -0700 2003-07-28, Peter Kirk wrote: So if Finland was part of Russia, Canada is part of England. How do you like that one, Karljürgen? Should I expect an imminent French (Canadian) invasion? I thought Québec wanted to join the EU (Ducks again.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Yerushala(y)im - or Biblical Hebrew
At 13:22 -0700 2003-07-28, Kenneth Whistler wrote: Because changing the canonical ordering classes (in ways not allowed by the stability policies) breaks the normalization *algorithm* and the expected test results it is tested against. Do you really think that algorithm with all its warts is going to be used 50 years from now? I really would like to know. -- Michael Everson * * Everson Typography * * http://www.evertype.com
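For readers who want to see what the canonical ordering classes do to Biblical Hebrew in practice, here is a small sketch using Python's standard `unicodedata` module. The choice of lamed with patah followed by hiriq as the sample is an assumption made for illustration; the reordering itself follows from the combining classes in whatever Unicode data the Python build ships.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

# Sample sequence (an illustrative assumption): patah typed before hiriq.
written = LAMED + PATAH + HIRIQ
normalized = unicodedata.normalize("NFC", written)


def show(label, text):
    """Print each code point with its name and canonical combining class."""
    print(label)
    for ch in text:
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")


if __name__ == "__main__":
    show("as written:", written)
    show("after NFC:", normalized)
    print("order preserved:", written == normalized)
```

Because hiriq has a lower combining class than patah, normalization moves it in front, so the order the author typed is not preserved. Putting all vowels in one class would prevent that, but it would also change the output of the algorithm for existing data, which is the stability concern raised in the quoted text.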
Re: Yerushala(y)im - or Biblical Hebrew
At 15:47 -0700 2003-07-28, Peter Kirk wrote: Well, except two countries, or more than two if you have been following the damn'd fools thread. We British resisted Napoleon and we continue to resist his innovations like the metric system, though we are being forced to make a gradual change. Thank heavens. :-) Unless you miss non-decimal currency. There are still many things which Napoleon managed to impose and are still uniform all the way from Calais to Vladivostok (because even the Russians accepted his system for a while), even traffic rules (drive on the right, give way to the right), but are different in the UK. That doesn't mean it's a good idea that these things aren't standardized. Though I like the fused UK and Irish electric socket plugs, which are extremely safe -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Back to Hebrew, was OT:darn'd fools
At 07:31 -0700 2003-07-29, Peter Kirk wrote: I don't think you French Canadians would be very happy if accented upper case vowels were removed from Unicode because they are not used in France. This isn't true. They *are* used in France. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: French accents on uppercase, was Back to Hebrew, wasOT:darn'd fools
At 11:47 -0400 2003-07-29, Karljürgen Feuerherm wrote: I believe they're optional though, at least, aren't they? Not in good typography. You must unlearn what you have learned -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Back to Hebrew, was OT:darn'd fools
At 11:52 -0400 2003-07-29, Jim Allan wrote: One the other hand, dropping diacritics from names or text written in all uppercase is considered acceptable in Quebec French (and I suspect also in France) dating from old addressograph technology and billing typewriter technology where capital letters alone were available and diacritics were not normally included as part of the character set. Then you have the old problem: what does « LE PRESIDENT ASSASSINE » mean if such a practice is employed? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Back to Hebrew, was OT:darn'd fools
At 08:47 -0700 2003-07-29, Peter Kirk wrote: Another example might be German ß (U+00DF). Many people don't use it, indeed I think it has been officially abolished, but many others do use it. Peter, there isn't a shred of truth in what you are saying. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Back to Hebrew, was OT:darn'd fools
At 10:36 -0700 2003-07-29, Peter Kirk wrote: The only shred of untruth is that what I said I think is true is in fact an exaggeration, the abolition is only partial. Hence it was not officially abolished. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Back to Hebrew - Vav Holam
At 22:21 +0200 2003-07-29, Jony Rosenne wrote: With Hebrew, it is not accepted that it is a different Vav - letters used as matres lectionis are not distinct from the same letters used otherwise. Neither is it accepted that this is a different Holam. The only thing established is that this artifact has been used in several manuscripts, one of many similar artifacts, to aid the understanding of the text. And the correct vehicle to convey such artifacts is markup. Ink dots used to aid the understanding of the text are always encoded as characters. Markup is the wrong way to handle them. Otherwise we would write Karlj<fronted>u</fronted>rgen or the like. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Back to Hebrew - Vav Holam
At 15:41 -0500 2003-07-29, [EMAIL PROTECTED] wrote: Jony Rosenne wrote on 07/29/2003 03:21:08 PM: The only thing established is that this artifact has been used in several manuscripts, one of many similar artifacts, to aid the understanding of the text. And the correct vehicle to convey such artifacts is markup. You say this as if it's objective truth. Now, if I see Latin-script text with a diacritic comma above in some places but also a comma above and a little to the right, the correct vehicle to convey these artifacts is the pair of distinct characters, U+0313 COMBINING COMMA ABOVE and U+0315 COMBINING COMMA ABOVE RIGHT. Apparently, in the case of Latin, it was not considered an objective truth that the correct vehicle is markup. If it comes to having an above-Hebrew-thingy and a next-to-Hebrew-thingy or having it be done by markup, I certainly would prefer the character-based solution. -- Michael Everson * * Everson Typography * * http://www.evertype.com
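The two Latin comma marks mentioned above can be inspected with Python's standard `unicodedata` module. This sketch only shows that they are separate characters that normalization does not merge; the base letter "t" and the helper names are arbitrary choices for the example.

```python
import unicodedata

COMMA_ABOVE = "\u0313"
COMMA_ABOVE_RIGHT = "\u0315"


def describe(mark):
    """Return the character name and canonical combining class of a mark."""
    return unicodedata.name(mark), unicodedata.combining(mark)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(describe(COMMA_ABOVE))
    print(describe(COMMA_ABOVE_RIGHT))
    # The base letter "t" is an arbitrary choice for the illustration.
    print(canonically_equivalent("t" + COMMA_ABOVE, "t" + COMMA_ABOVE_RIGHT))
```

The positional difference between the two marks is carried by the characters themselves, in plain text, with no markup involved.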
Re: Hebrew Vav Holam
At 21:29 +0200 2003-07-30, Jony Rosenne wrote: Problem: We have here one character sequence with two alternate renditions: the common rendition, in which they are the same, and a distinguished rendition which uses two separate glyphs for the separate meanings. On paper, which is two-dimensional, it is a Vav with a Holam point somewhere above it. Unicode decided that in the encoding, which is one-dimensional, the marks follow the base character. Any solution should accommodate both kinds of users and both renditions. Solution: Suggestions, please. Please put this in a document with an actual illustration of the problem. I don't follow it from the verbal description. In Tengwar, tinco with a three-dot diacritic over it can be read [ta] or [at] depending on the language. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Hebrew Vav Holam
At 16:50 -0400 2003-07-30, John Cowan wrote: Michael Everson scripsit: See the reference glyph for U+FB4B. One form looks like this with the dot above further to the left, the other like it with the dot a little further to the right. This glyph with the centred dot is a compromise between the two. A picture speaks a thousand words. These particular words combined with the picture in the U3.0 chart tell all. I see. This disunification tempts. I'd go to the bother of writing up the proposal for adding this combining character if on further discussion it appears the right thing to do. -- Michael Everson * * Everson Typography * * http://www.evertype.com
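The character U+FB4B referred to above can be examined with Python's standard `unicodedata` module. The sketch below shows that it is canonically equivalent to the plain sequence vav plus holam, so normalized text carries only that one sequence for both readings; how a given font positions the dot is outside what this code can show.

```python
import unicodedata

VAV_WITH_HOLAM = "\uFB4B"  # the precomposed presentation form
VAV = "\u05D5"
HOLAM = "\u05B9"

name = unicodedata.name(VAV_WITH_HOLAM)
decomposition = unicodedata.decomposition(VAV_WITH_HOLAM)
nfc = unicodedata.normalize("NFC", VAV_WITH_HOLAM)

if __name__ == "__main__":
    print(name)
    print("decomposition:", decomposition)
    # NFC does not recompose this character, so only the two-character
    # sequence survives normalization.
    print("NFC:", " ".join(f"U+{ord(ch):04X}" for ch in nfc))
```

Since the precomposed form does not survive normalization, it cannot serve to keep the two dot positions apart, which is why the thread turns to a separately encoded mark or letter.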
Re: Hebrew Vav Holam
At 13:12 -0400 2003-07-31, Ted Hopp wrote: For reasons I posted earlier, I don't think encoding the dot is the right approach. I despair of following this thread. I'd propose something that would look like this in the UCD (with 'nn' to be determined, but it should be in the Hebrew block): 05nn;HEBREW VOWEL HOLAM MALE;Lo;0;R;compat 05D5 05B9N; We do not encode any HEBREW VOWELs. We encode LETTERs and combining marks. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Hebrew Vav Holam
At 21:57 +0200 2003-07-31, Jony Rosenne wrote: I was under the impression that old English manuscripts did use different glyphs for the two sounds of th. Thorn and eth. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Hebrew Vav Holam
At 16:18 -0400 2003-07-31, Ted Hopp wrote: On Thursday, July 31, 2003 3:03 PM, Michael Everson wrote: We do not encode any HEBREW VOWELs. We encode LETTERs and combining marks. I agree with the do not if it's descriptive of current practice. If it's prescriptive, I'd have to ask why. (And please don't say stability policy! :)) The Name Police like consistency. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Conflicting principles
At 16:16 -0400 2003-08-06, John Cowan wrote: I would like to ask the old farts^W^Wrespected elders of the UTC which principle they consider more important, abstractly speaking: the principle that combining marks always follow their base characters (a typographical principle), or that text is stored, with a few minor exceptions, in phonetic order (a lexicographical principle). Are you thinking of the Tengwar? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Conflicting principles
At 15:18 -0700 2003-08-06, Kenneth Whistler wrote: As someone or other said, I believe that hitherto -- *hitherto,* mark you -- [we have] entirely overlooked the existence of, well, scripts that might cause a conflict between these esteemed principles. The reason why the UTC should tackle the encoding of Tengwar is not so much because it would help in the publication of Elvish poetry, but because confronting the architectural issues it poses for encoding would make an excellent tutorial case for how the two principles of combining mark order and logical order impact the task of coming up with an appropriate encoding for a complex script. And it would starkly illustrate the fact that an appropriate character encoding does not necessarily directly reflect the phonological structure of a language as represented by that script. Some rather old discussion papers on this topic may be found at http://www.evertype.com/standards/iso10646/pdf/tengwar-vowels.pdf and http://www.evertype.com/standards/iso10646/pdf/tengwar.pdf It *is* a problem. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Conflicting principles
At 23:07 +0200 2003-08-07, Kent Karlsson wrote: Kent Karlsson scripsit: 4) Encode the vowel signs as combining characters, after the base characters they logically follow. Consider them as double [width] combining characters, that happen to have no ink above/below the character they apply to, but (like double width combining characters) have ink over/under the glyph for the base character that follows. Kent. Read my papers. A similar approach is proposed. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Questions on ZWNBS - for line initial holam plus alef
At 01:11 +0200 2003-08-09, Philippe Verdy wrote: I just picked SYMBOL to just match the required property that would match other spacing variants of diacritics. The ZERO WIDTH is probably confusing, but it just marks the fact that it has no associated glyph and a null *minimum* width (which expands to the largest diacritic(s) with which it is combined). The Name Police reject this utterly. ZERO WIDTH cannot have an expanding dynamic width. This pseudo-character will not be encoded. Time to drop the thread. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Newbie Question - what are all those duplicated characters FOR?
At 17:46 +0100 2003-08-08, [EMAIL PROTECTED] wrote: I'm reasonably sure that this question reflects my own ignorance, rather than some problem with the standard, but nonetheless, I am confused. Read the text. Don't just read the code charts. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic scripts
At 09:00 +0100 2003-08-09, Raymond Mercier wrote: There are omissions in Michael Everson's chart in http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf The chart was based on Semitic languages, although purporting to be about scripts. No, it wasn't. There are less obvious omissions: 1. Kharoshthi, a RtoL script much used in North West India, and regarded by everyone as a derivative from a form of the Aramaic script used in that region. It is found on coins, Ashokan edicts, various inscriptions and manuscripts. It was used to write mainly Prakrits, although some Sanskrit text is known. See, for example, A.H. Dani, Indian Palaeography, Oxford 1963. We are well aware of Kharoshthi, which was roadmapped without any difficulty. 2. Pahlavi, widely used to write Middle Persian. This involved a troublesome mixture of Persian reading of Aramaic words, a subject requiring more elaboration than is needed here. We are well aware of Avestan and Pahlavi, which were roadmapped without any difficulty. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Questions on ZWNBS - for line initial holam plus alef
At 10:58 -0700 2003-08-11, Peter Kirk wrote: On 11/08/2003 06:59, Jon Hanna wrote: There are only two theoretical problems that I can see here, the first is that a whitespace character other than space gets converted to space by attribute value normalisation, and that this changes the meaning of the text in some way. This could only occur if the combining character were the first character in a line of text, which is quite a nonsensical construct to begin with. Not at all! Imagine a tutorial on a language, which might well list the accents used, in a format like this: ` (grave accent) is used with a, e and o, and indicates more open pronunciation ^ (circumflex accent) is used with any vowel, and indicates lengthening So far so good, but when I get to an accent with no predefined spacing variant, I have a problem! The mechanism for doing this has been explained, and it has been explained that if it is not implemented correctly you should yell at the implementors. In Mac OS X, for instance, the horizontal spacing seems to work all right for many accents, but they seem to prefer to rest just above the baseline. I'll report this as a rendering bug. -- Michael Everson * * Everson Typography * * http://www.evertype.com
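The mechanism referred to in this message (a space character followed by the combining mark) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `isolated` and the `base` parameter are made up for the example, SPACE is used as the default base because that is what the thread discusses, and NO-BREAK SPACE can be passed instead. How well the result is drawn still depends on the renderer and font.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00A0"  # NO-BREAK SPACE, an alternative base

def isolated(mark, base=SPACE):
    """Return base + mark so a combining mark can be shown on its own.

    Hypothetical helper for illustration; it accepts exactly one character
    whose general category is a Mark (Mn, Mc or Me).
    """
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected exactly one combining mark")
    return base + mark

# A tutorial-style listing: grave, circumflex, and Hebrew holam.
for mark in ("\u0300", "\u0302", "\u05B9"):
    print(f"{isolated(mark)} ({unicodedata.name(mark).lower()})")
```

Passing `base=NBSP` keeps the mark from being separated from its base at a line break, which may matter for the line-initial case discussed in this thread.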
Re: Colourful scripts and Aramaic
At 18:03 -0400 2003-08-07, Karljürgen Feuerherm wrote: My knowledge of Aramaic script is a little scanty, but my understanding is more or less the same as Peter's. Which leads me to suggest that encoding Aramaic separately would be a bit like encoding Old Akkadian (Cuneiform) separately from NeoAssyrian (Cuneiform). Which would be a bit silly (and not what we are planning in that arena) Note that some people are even willing to argue that the substrate languages might be considered distinct, too--in case that is the argument which would be applied to Aramaic. We do not encode languages. Would somebody please read http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf before deciding what it is that is meant by Aramaic in the Roadmap? Note that Hebrew descends FROM it, as do a number of other scripts which clearly do NOT descend from Hebrew. Unicode encodes Square Hebrew. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Handwritten EURO sign
At 23:35 +0200 2003-08-05, Pim Blokland wrote: I have absolutely no idea what you are talking about. You are lucky not having to put up with bad English like five euro and six cent, living in the Netherlands and speaking Dutch as you do. See http://www.evertype.com/standards/euro if you wish to learn more about a disaster in language planning. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Colourful scripts and Aramaic
At 13:12 -0700 2003-08-07, Peter Kirk wrote: Well, it seems to me that in the case of the Aramaic proposal we don't even have that. We have an archaic version of the script which is now used mainly for Hebrew, and which many scholars still call Aramaic (in distinction from paleo-Hebrew) although Unicode calls it Hebrew. The Aramaic glyphs are almost all recognisably the same as or slight variants on the Hebrew ones. And Hebrew script is already used, uncontroversially, for large corpora of Aramaic e.g. in the Talmud. Why a new script for the few surviving examples of ancient Aramaic in this script? People. It's the widespread offshoot used throughout the Middle East that spawned Brahmic and Uighur and other scripts. It isn't necessarily the thing you think is confined to three scraps of papyrus or whatever. We aren't working actively on this now. We don't have an active proposal. We have something roadmapped, and I for one don't want to spend time right now defending its roadmapping to you apart from what is in my earlier paper on Semitic scripts. Could you turn off the fire alarms? -- Michael Everson * * Everson Typography * * http://www.evertype.com
[hebrew] Re: Roadmap---Mandaic, Early Aramaic, Samaritan
Elaine, I really, really, really don't have time to debug your dissatisfaction with the use of the word Aramaic in the Roadmaps. This is NOT something anyone is working actively on right now. When a proposal comes forth, there will be evidence in it that can be picked at. In actuality, one could make a very good case that all extant Semitic / extended Aramaic-Moabite-Amorite-Yaudic-Hebrew etc. type alphabetic scripts between the earliest Sinaitic / Wadi El-Hol---and middle Parthian are font variants. We are not going to encode Phoenician and Samaritan and Palmyrene as font variants of Hebrew. If you want to write those languages in Hebrew script, do so. Any border(s) you draw will be either completely artificial or mostly artificial. That's the problem. The borders we draw are based on the analyses of script experts. I gather that you are a font person, fascinated by the aesthetic pleasure of wondrous shapes. I am a lot more than that. I am a database person, concerned with minimizing unnecessary font variation, which may interfere with future overworked Semitic retrieval engines. You will never be at a greater disadvantage than a Sanskritist is, considering that the Rg Veda can be written in a dozen or so scripts. The Mandaic and Samaritan scripts apparently enjoy at least some modern liturgical use. Yes, they do! But the Samaritan is also heavily used within Jewish studies / Biblical studies communities. The Samaritans also use their shapes in private correspondence. Then we shall encode them. of Aramaic script to encode has not been looked at carefully. Indeed we have no current proposals which are well-advanced at this time. I'm responding now because this may be the only time period where Hebraists interact with Unicode. Carpe diem. Hebraists are discussing concerns about METEG and things. You're responding about things which don't even have formal proposals to respond to.
If you want me to start working on encoding other early Semitic scripts, please give generously to the Script Encoding Initiative and ask for prioritization. Failing that, I will be working on things which have higher priority (and more complete proposals) at present, like Coptic, Saurashtra, Nuskhuri, Buginese, N'Ko, Ol Chiki, Avestan and Pahlavi, and so on. I am responding at great length to the Roadmap proposals for the Semitic dialects Mandaic, Early Aramaic, and Samaritan. We are proposing to encode scripts, not languages. Yes, that is your take on it. But scripts are frozen language, not the liquid language of speech or the gaseous language of poetry. You encode scripts so we can manipulate languages. We encode scripts so that we can represent texts. And we will do it, as we have, to the best of our ability, but not by lumping everything together just because it makes things easy for database programmers. Best regards, -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Conflicting principles
At 01:18 +0200 2003-08-09, Philippe Verdy wrote: Such break in a middle of a multiple width diacritic exist in some notations, and are not considered horrible typography. Just look at musical notations where a upper horizontal parenthesis is used to group some elements [...] Music setting is not typesetting, and that kind of music representation is outside of the scope of the Unicode Standard. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)
At 01:30 +0200 2003-08-10, Philippe Verdy wrote: Whatever you think, the SPACE+diacritic is still a hack, and certainly not a canonical equivalent (including for its properties), of the existing spacing diacritics, which also do not fit all usages because they are symbols. It is the formally specified way to represent what you say you want to represent. If an implementation doesn't do that nicely enough, complain to the implementors. (This has already been suggested to you.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
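The relationship between an existing spacing diacritic and the SPACE+diacritic sequence can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only; the two helper names are made up for the example. It uses U+00B4 ACUTE ACCENT, whose UnicodeData entry carries a compatibility (not canonical) decomposition to SPACE followed by U+0301.

```python
import unicodedata

def canonically_equivalent(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a, b):
    """True if a and b have the same compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

spacing_acute = "\u00B4"      # ACUTE ACCENT (spacing)
space_plus_mark = " \u0301"   # SPACE + COMBINING ACUTE ACCENT

print(unicodedata.decomposition(spacing_acute))                   # <compat> 0020 0301
print(canonically_equivalent(spacing_acute, space_plus_mark))     # False
print(compatibility_equivalent(spacing_acute, space_plus_mark))   # True
```

So the two forms are compatibility equivalents rather than canonical equivalents, which agrees with the quoted remark about canonical equivalence while leaving the reply's point intact: the sequence is the specified way to show the mark in isolation.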