from:"Michael Everson"

Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Michael Everson

At 12:15 -0700 2003-06-25, John Hudson wrote:

In this case, any existing normalisation for Hebrew is already 
broken -- in the sense of destroying Biblical Hebrew text -- but 
still the argument from the UTC seems to be that even broken 
implementations -- broken because the standard is broken -- must not 
be broken.
That seems very short-sighted indeed.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Michael Everson

At 14:20 -0700 2003-06-25, John Hudson wrote:

John,

Write it up with glyphs and minimal pairs and people will see the 
problem, if any. Or propose some solution. (One that doesn't add duplicate 
characters.)

In Biblical Hebrew, it is possible for more than one vowel to be 
attached to a single consonant. This means that it is very important 
to maintain the ordering of vowels applied to a single consonant. 
The Unicode Standard assigns an individual combining class to every 
vowel, meaning that NFC normalisation may re-order vowels on a 
consonant. This is not simply 'non-traditional' but results in 
incorrect rendering and a different vocalisation of the text. The 
point is that hiriq before patah is *not* canonically equivalent to 
patah before hiriq, except in the erroneous assumption of the 
Unicode Standard: the order of vowels makes words sound different 
and mean different things.
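
The reordering described above can be observed with Python's standard
unicodedata module. This is a minimal sketch, not part of the original
message: the lamed base letter and the variable names are chosen purely
for illustration, and the combining-class values are whatever the Unicode
Character Database bundled with Python reports.

```python
import unicodedata

LAMED = "\u05DC"   # HEBREW LETTER LAMED (illustrative base consonant)
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ
PATAH = "\u05B7"   # HEBREW POINT PATAH

# Each vowel point carries its own non-zero canonical combining class.
for mark in (HIRIQ, PATAH):
    print(unicodedata.name(mark), unicodedata.combining(mark))

patah_first = LAMED + PATAH + HIRIQ
hiriq_first = LAMED + HIRIQ + PATAH

# Canonical ordering sorts the marks by class, so both spellings
# collapse to the same normalised string and the distinction is lost.
nfc_a = unicodedata.normalize("NFC", patah_first)
nfc_b = unicodedata.normalize("NFC", hiriq_first)
print(nfc_a == nfc_b)
print([unicodedata.name(c) for c in nfc_a])
```

The two input strings differ, yet after normalisation they compare equal,
which is the loss of information the paragraph complains about.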

In order to correctly encode and render the Biblical Hebrew text, it 
is necessary to either a) never use normalisation routines that 
re-order marks (which is beyond the control of document authors), or 
b) re-classify the existing Hebrew marks so that all vowels are in a 
single class and will not be re-ordered during normalisation, or c) 
encode new marks for Biblical Hebrew with all vowels in a single 
class.
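
Option (b) can be illustrated with a small Python sketch of canonical
reordering in which the class lookup is swappable. Everything here is an
assumption made for illustration: the helper names, the choice of class 10
as the shared value, and the range U+05B0..U+05BB taken as "the vowels".
It is not how any shipping normaliser is configured.

```python
import unicodedata

# Assumed set of Hebrew vowel points for this sketch (U+05B0..U+05BB).
HEBREW_VOWELS = range(0x05B0, 0x05BC)

def reorder(text, ccc=unicodedata.combining):
    """Stable-sort each run of non-zero-class marks by combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # flush marks before a starter
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))          # flush any trailing marks
    return "".join(out)

def single_class(ch):
    """Hypothetical table: every Hebrew vowel shares one class (10)."""
    if ord(ch) in HEBREW_VOWELS:
        return 10
    return unicodedata.combining(ch)

text = "\u05DC\u05B7\u05B4"   # lamed, patah, hiriq (illustrative)

print(reorder(text) == text)                 # current classes: reordered
print(reorder(text, single_class) == text)   # shared class: order kept
```

Because the sort is stable, marks that share a class keep their typed
order, which is why a single class for all vowels would prevent the swap.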

There are a few other desirable changes to the combining class 
assignments for some Hebrew accents, which make rendering easier and 
are more linguistically logical, but the vowels are the most 
problematic.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
If you browse in the shelves that, in American bookstores,
are labeled New Age, you can find there even Saint Augustine,
who, as far as I know, was not a fascist. But combining Saint
Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
- Umberto Eco


--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Revised N2586R

2003-06-26 Thread Michael Everson

At 13:03 +0100 2003-06-26, William Overington wrote:

Well, certainly authority would be needed, yet I am suggesting that where a
few characters added into an established block are accepted, which is what
is claimed for these characters, there should be a faster route than having
to wait for bulk release in Unicode 4.1.
No, there shouldn't. The process will not be changed. Unicode and 
ISO/IEC 10646 are synchronized, and JTC1 balloting processes are 
what they are. No further discussion is necessary, as it is pointless.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Revised N2586R

2003-06-26 Thread Michael Everson

At 12:09 -0500 2003-06-26, [EMAIL PROTECTED] wrote:

The only meaning that the Standard implies is that the character encoded
at codepoint x represents the symbol of a wheelchair. It does not imply
*anything* about how its usage in juxtaposition with the name of a person
should be interpreted.
Indeed William's argument that HANDICAPPED is somehow inappropriate 
just doesn't wash. In Europe at least, many handicapped people 
consider it far more polite to be called handicapped or behindert or 
what have you than to be subject to such politically correct 
monstrosities as differently abled.

Which is not to say that the Name Police won't prefer WHEELCHAIR 
SYMBOL. Time will tell.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Nightmares

2003-06-26 Thread Michael Everson

At 14:32 -0400 2003-06-26, John Cowan wrote:

If you are going to discriminate (invidiously) using a computerized
database, using H for Handicapped (or G for Gimp) will do just as well.
Are you going to complain about the various symbols of religion already
encoded on the same grounds?
I am preparing additional religious symbols to help fill the gaps.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-26 Thread Michael Everson

At 15:36 -0700 2003-06-26, Kenneth Whistler wrote:

I now like better the suggestions of RLM or WJ for this.
ZZZT. Thank you for playing.

RLM is for forcing the right behaviour for stops and parentheses and 
question marks and so on. Introducing it between two combining 
characters in Hebrew text would break all kinds of things, and would 
be horrible, horrible, horrible. Invent a new control character for 
this weird property-killer, if you must, but don't use an ordering 
mark for it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew

2003-06-27 Thread Michael Everson

At 23:59 -0700 2003-06-26, John Hudson wrote:

I think there is a reasonable case to be made for treating modern 
Hebrew and Biblical Hebrew as separate languages for pretty much all 
purposes. The existing codepoints with the fixed position combining 
classes work fine for Modern Hebrew, and there's no reason that they 
should not continue to be used for that language. I would seriously 
entertain the idea of re-encoding *all* the Hebrew marks, along with 
non-Tiberian vocalisation marks and anything else specifically 
needed for Biblical Hebrew, in a separate block, and deprecate the 
cantillation marks in the Hebrew block.
Speaking as a member of WG2, I do not think that we should encode 
such duplicate characters.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Yerushala(y)im - Biblical Hebrew

2003-06-27 Thread Michael Everson

At 10:09 +0200 2003-06-27, Jony Rosenne wrote:
Whatever you do, any new characters designed for solving these problems
should not be in the Hebrew block. Add a new Biblical Hebrew block, clearly
labeled as not intended for regular Hebrew use.
And I suggest that whenever a proposal comes up to the UTC, it would be
advantageous to involve Israeli Biblical scientists in the review.
We've wanted *that* for a long time. Indeed it is a long-standing 
request that Israeli experts help to map the TC46 8-bit standard with 
cantillation marks to Unicode. Can you help facilitate this?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew

2003-06-27 Thread Michael Everson

At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:

In discussing these issues among Biblical Hebrew implementers, 
content  providers and users, I have had to explain repeatedly why 
UTC doesn't want  to consider this. It is completely obvious to them 
that this is the right  solution. Even on explaining the impact on 
normalization, the response is  that there is no impact since 
implementations and content using Unicode do  not yet exist.
Indeed, but the UTC doesn't want to change the normalization stuff 
even where there are obvious errors, for philosophic reasons, I 
suppose. I mean who are all the implementors who depend on these 
tables? Often Unicoders have claimed existing implementations even 
where none can be shown to exist. Now Ken tempts us with:

This is just one more in the accumulating pile of little problems in 
the decompositions locked down by normalization that will eventually 
result in the committee going Spaaannggg! and agreeing to publish 
and maintain a separate, corrected list of equivalences As She 
Oughta Been which are not constrained by the formal stability 
guarantees of UAX #15 normalization forms.

I'd like to understand how deprecating a character and adding a 
duplicate one with the right properties differs from deprecating a 
version of UAX #15 in favour of an Oughta-Been table.

:-)

I think it would be better to create a new character for this purpose than
to use ZWJ in yet another way.
I suppose CGJ is tempting.
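
For what the CGJ idea would look like in practice, here is a minimal
Python sketch. It relies only on the fact that U+034F COMBINING GRAPHEME
JOINER has combining class 0, so canonical reordering does not move marks
across it. The lamed, patah, hiriq sequence is an illustrative assumption
and is not taken from the thread.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0
LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

plain = LAMED + PATAH + HIRIQ
guarded = LAMED + PATAH + CGJ + HIRIQ

print(unicodedata.combining(CGJ))
# Without CGJ the two marks are swapped by normalisation;
# with CGJ between them the typed order survives.
print(unicodedata.normalize("NFC", plain) == plain)
print(unicodedata.normalize("NFC", guarded) == guarded)
```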
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Michael Everson

At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:

I just have a hard time believing that 50 years from now our 
grandchildren  won't look back, What were they thinking? So it took 
them a couple of  years to figure out canonical ordering and 
normalization; why on earth  didn't they work that out first before 
setting things in stone, rather  than saddling us with this 
hodgepodge of ad hoc workarounds? How short  sighted. As Rick said, 
I know this will get shot down; don't bother  telling me so.
I agree with you, Peter.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Michael Everson

At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:

Are we saying that ISO doesn't give a rip for implementation issues?
Duplication of characters is not the way to fix (forgive me, UTC) 
*Unicode's* error in combining characters.

Or that their notion of ordering distinctions is different from 
Unicode's  such that *any* differently ordering permutation of some 
given set of  characters is considered a distinct representation? 
Are we saying that the  voting members of WG2 are not already aware 
of the issue that has been  discussed and incapable of understanding 
an explanation of these issues  addressed to them?
You might submit your paper to WG2.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Michael Everson

At 04:53 -0500 2003-06-27, [EMAIL PROTECTED] wrote:

If they're so unaware of combining classes, might it not seem 
reasonable to think that the dialog might continue as follows?

- [gives explanation of combining classes and the related problem for Hebrew]
ISO: So, you're saying you're coming to us asking for duplicates of 
existing characters because of an error the Unicode Consortium made 
with some of those character properties they define?
- Well, yes, that's basically it.
ISO: Then, obviously they need to correct their errors. I mean, it's 
not like the wrong characters got encoded or something. Tell them to 
just fix the errors; that can't be difficult to do, and is obviously 
the right thing to do.
This is exactly my view.

Who is it who will kill the Unicode Consortium if UAX #15 were to be 
revised? Did it occur to anyone to *ask* about the possible revision 
of classes for the dozen or so instances that would be affected?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]

2003-06-27 Thread Michael Everson

At 14:34 +0200 2003-06-27, Philippe Verdy wrote:
On Friday, June 27, 2003 1:29 PM, John Cowan [EMAIL PROTECTED] wrote:
 Michael Everson scripsit:
 Change the character classes in Unicode 4.1, and they *might* decide
 to freeze support at, say, Unicode 3.0.
Or they may simply opt to define their *OWN* normalization standard, 
distinct from Unicode NF* form, and designated in a separate 
reference document, removing *all* references to UAX#15 from XML and 
IDNA references, only to guarantee this stability that Unicode would 
be unable to offer.

Let's not let this happen!
Oh, come on. Let's not put words in people's mouths. Ifs and mights 
are not facts.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Michael Everson

At 07:28 -0400 2003-06-27, John Cowan wrote:
Michael Everson scripsit:

 Who is it who will kill the Unicode Consortium if UAX #15 were to be
 revised? Did it occur to anyone to *ask* about the possible revision
 of classes for the dozen or so instances that would be affected?
The IETF, for one.  IETF is already very wary of Unicode, even 
though they recognize the practical necessity of using it, but with 
the existing stability guarantees about normalization, they have 
managed to swallow it. Stability *even if wrong* is really, really 
important to protocol people -- just think of all the nonfunctional 
stubs in the world of *diplomatic* protocol, maintained in the name 
of not changing anything.
So, you're saying, no one has asked IETF whether or not they would be 
able to countenance a dozen or so changes for unimplemented things 
like biblical accents.

The W3C would also hit the roof if Unicode normalization changed radically.
I don't think anyone is proposing a *radical* change.

Neither party is at all happy with even the four (I think) 
characters that have already changed, and are already beginning to 
turn into optimistic pessimists (people who smile brightly, nod 
their heads, and say happily, See, things are every bit as bad as I 
predicted!).
Well, y'all are gonna have to do something, and adding duplicate 
characters to ISO/IEC 10646 is not going to be well-received, because 
there isn't anything broken in ISO/IEC 10646.

Since the use of non-ASCII characters in things like XML and the DNS 
depends on the good will of these folks, it is very very dangerous 
to alienate them, and *they do not care* whether the case is a 
corner case or not -- _stare decisis_ is everything to them, the 
actual details little or nothing.
You could explain the problem with these Hebrew accents, and ask them 
to help by accepting a change. Shivering in a cave for fear of the 
monsters outside isn't going to get anyone anywhere. People of good 
will can often come to enlightened consensus.

Change the character classes in Unicode 4.1, and they *might* decide to
freeze support at, say, Unicode 3.0.
Or they might understand the problem. People aren't all *that* 
stupid, methinks.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]

2003-06-27 Thread Michael Everson

At 09:16 -0400 2003-06-27, John Cowan wrote:
Michael Everson scripsit:

 Oh, come on. Let's not put words in people's mouths. Ifs and mights
 are not facts.
Expressed attitudes are facts, and it's reasonable to extrapolate people's
future behaviors, at least the general trend thereof, from their expressed
attitudes.  When someone draws a line in the sand, it's not unreasonable
to expect that crossing it will be taken as a declaration of war.
But you might trot on over with a white flag to parley about a problem.

They're only human beings over there, just as we are over here.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Michael Everson

At 10:40 -0400 2003-06-27, John Cowan wrote:
Karljürgen Feuerherm scripsit:

 1. Everyone is more or less agreed that the present combining class rules as
 they apply to BH contain mistakes. The clearly preferential way to deal with
 mistakes in any technological/computing software environment is to FIX them.
Not so.  Sometimes stability is more important than correctness.
And sometimes not, then. What four characters have been corrected so 
far? Were they important characters to some company? Are there no 
Christians or Jews in the IETF who might care about a problem like 
this, where a simple solution might be effected? Particularly if it 
involves only a handful of characters, and the precedent for making 
such corrections has been set? Or is our standard, which as I have 
said many times, will be used for CENTURIES, going to be hobbled by 
silliness like this forever? Hm?

The use of the backslash character in DOS/Windows systems as a path 
separator is arguably a mistake (paths were borrowed from Unix into 
DOS 2.0, but the slash was already in use for command-line options, 
something inherited from CP/M and the ancestral CLI running back 
through DEC operating systems), but fixing it is out of the question.
This is not analogous to the present situation, it seems to me. In 
the first place, what else is the \ for? :-) No one who wants to use 
the \ is prevented from doing so except maybe in filenames, in 
systems which don't allow it. (The colon is disallowed in Apple 
filenames.)

All concerns involving human beings -- ho bios politikos -- are political
in some sense.
And some have more sense than others, it seems. (Sorry, couldn't resist.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Michael Everson

I think the answer is, regarding the soft dot property, please leave 
the ij ligature alone.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 13:29 -0400 2003-07-07, Frank da Cruz wrote:

Nobody is springing to the defense of this so I'll only say that 
it's a time-honored practice and we shouldn't be so quick to 
disparage it, lest we be disparaged several years hence for the 
things we do :-)
It's rotten, and when I typeset books 
(http://www.evertype.com/books.html) I always have to clean up the 
text which is invariably littered with these artifacts of old 
technology.

In the world of plain text, two spaces after a sentence-ending 
period, exclamation mark, question mark, or other mark is actually 
rather handy to distinguish sentence enders from the same marks used 
in other ways, esp. periods in abbreviations.
Fie! Fie! Unclean! Unclean!
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 14:27 -0400 2003-07-07, Frank da Cruz wrote:

EMACS aside, it's still an interesting question why -- in English at 
least -- it was customary throughout the 20th century to put two 
spaces after a period when typing.  I expect it must have been an 
aesthetic decision.  What else could it have been?
The typing habit was designed to assist typesetters in reading the 
manuscript as they were setting type. Traditionally, the typesetters 
never set the extra space.

Sigh. This discussion reminds me of way back to 1984 or 1985, when 
The Mac is not a typewriter was published. Same story.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: When is a character a currency sign?

2003-07-07 Thread Michael Everson

At 15:03 -0400 2003-07-07, Tex Texin wrote:

When is a character properly called a currency sign?
Hunh? When you use it to represent currency. DM was two characters 
used as a currency sign in Germany.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 15:12 -0400 2003-07-07, John Cowan wrote:
Michael Everson scripsit:

 The typing habit was designed to assist typesetters in reading the
 manuscript as they were setting type.
Either this says that double-spacing after a sentence improves the readability
of monospaced documents, or I misunderstand you entirely.
It assists the printer. In such a context it has a specific utility.

After all, typists are (or were) taught to do so in all sorts of 
documents, including those like business letters that were not to be 
typeset.
Typists were taught to do it generally, but the origin of the 
practice is to assist the typesetters.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

From Robert Bringhurst's Elements of Typographic Style, pp. 28-20:

Use a single word space between sentences. In the nineteenth 
century, which was a dark and inflationary age in typography and type 
design, many compositors were encouraged to stuff extra space between 
sentences. Generations of twentieth-century typists were then taught 
to do the same, by hitting the spacebar twice after every period. 
Your typing as well as your typesetting will benefit from unlearning 
this quaint Victorian habit. As a general rule, no more than a single 
space is required after a period, or any other mark of punctuation. 
Larger spaces (e.g., en spaces) are *themselves* punctuation.

The rule is usually altered, however, when setting classical Latin 
and Greek, romanized Sanskrit, phonetics, or other kinds of texts in 
which sentences begin with lowercase letters. In the absence of a 
capital, a full *en space* (M/2) between sentences will generally be 
welcome.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 18:08 -0400 2003-07-07, Frank da Cruz wrote:
  It is worth noting that what is described here is the default running
 mode of Emacs for the English locale. There are a lot more modes on Emacs
 to handle various languages (including programming languages).
Of course.  But without two spaces you have greater ambiguity, at least in
English: In Mr. Roberts, what is the function of the period?
  Don't call me Mr. Roberts is my name.

  Don't call me Mr.  Roberts is my name.
In European English Mr is generally not followed by a full stop, 
because the abbreviation contains the first and last letter of the 
word. (In Finland that would be M:r.)
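
The plain-text convention Frank describes, two spaces marking a sentence
boundary, can be sketched in a few lines of Python. This is only an
illustration of the heuristic under discussion; the function name and the
regular expression are assumptions of this sketch, not anything Emacs is
claimed to use internally.

```python
import re

# A sentence ends at ., ! or ? followed by two or more spaces.
SENTENCE_BREAK = re.compile(r"(?<=[.!?]) {2,}")

def split_sentences(text):
    """Split on the two-space convention; single spaces never split."""
    return SENTENCE_BREAK.split(text)

one_space = "Don't call me Mr. Roberts is my name."
two_spaces = "Don't call me Mr.  Roberts is my name."

print(split_sentences(one_space))
print(split_sentences(two_spaces))
```

With one space the period after "Mr" is treated as an abbreviation dot and
the text stays a single sentence; with two spaces it becomes two. Writing
"Mr" without a full stop removes the ambiguity without the extra space.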
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 16:22 -0600 2003-07-07, John H. Jenkins wrote:

IIRC the English prefer to say Mr Roberts.
The, ahem, Irish too. ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 01:10 +0200 2003-07-08, Philippe Verdy wrote:

I forgot to ask something: is there a Unicode codepoint assigned to 
the abbreviation dot (a narrower dot with less margins on left and 
right than the standard dot), as it seems to be used in some 
typeset texts to differentiate it from the punctuation mark for 
end of sentence ?
I am sure there is not. Sometimes a dot is just a dot.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French group separators

2003-07-07 Thread Michael Everson

At 17:00 -0600 2003-07-07, John H. Jenkins wrote:

IIRC the English prefer to say Mr Roberts.
The, ahem, Irish too. ;-)
Well, to be frank, I'm sure that the Welsh, Scots, and Manx probably 
do, too.  (Did I leave anybody out *this* time?)
The Cornish, of course. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Michael Everson

At 03:25 -0700 2003-07-12, Peter Kirk wrote:

Does anyone know of a good resource on the web, or elsewhere, 
listing the alphabets used for different languages around the world? 
I know a project was attempted a few years ago at least for Europe. 
It would be useful to have this kind of data available somewhere 
even with no official status.
http://www.evertype.com/alphabets
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-12 Thread Michael Everson

At 08:11 -0400 2003-07-12, Patrick Andries wrote:

Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?
Iwrit (iw), being a German transliteration of the name of the Hebrew 
language, and Jiddisch (ji) were both thought (by someone) to be less 
suitable than the English-based he and yi which replaced them.
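
Software that still meets the old codes can fold them onto their
replacements with a small lookup. A minimal Python sketch follows; the
table holds only the two pairs named above, and the function and variable
names are hypothetical.

```python
# Deprecated ISO 639 two-letter codes named above, and their replacements.
REPLACED_CODES = {"iw": "he", "ji": "yi"}

def canonical_language_code(code):
    """Return the current code for a deprecated one; pass others through."""
    code = code.lower()
    return REPLACED_CODES.get(code, code)

print(canonical_language_code("iw"))
print(canonical_language_code("JI"))
print(canonical_language_code("fr"))
```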
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 01:21 -0400 2003-07-13, John Cowan wrote:

I hand-write & by making a tall lower-case epsilon glyph and then drawing
a solidus over it.
I just use the TIRONIAN SIGN ET.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 14:09 -0400 2003-07-13, John Cowan wrote:
Michael Everson scripsit:

 I hand-write & by making a tall lower-case epsilon glyph and then drawing
 a solidus over it.
 I just use the TIRONIAN SIGN ET.
A good choice if you don't slash your DIGIT SEVENs and can make your
DIGIT ONEs sufficiently distinct.
Eh? I *do* slash my DIGITs SEVEN and I use a single vertical stroke 
for my DIGITs ONE. The TIRONIAN SIGN ET as used in Ireland has no 
horizontal stroke.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

No UTF-8 in Eudora

2003-07-13 Thread Michael Everson

Dear all,

Apparently, if you are a Eudora user and would like to encourage Qualcomm 
to add proper UTF-8 support to Eudora, you can send a request for this 
option to be included in a future version of Eudora to 
http://www.eudora.com/developers/feedback/ -- as Eudora 6 is in beta 
now, perhaps this is a good time to make your opinions known.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

2003-07-13 Thread Michael Everson

At 16:21 -0400 2003-07-13, John Cowan wrote:

I should have said do slash your DIGIT SEVENs.  So the glyph in the
Unicode 3.0 book is not typical of Irish practice?  It seems to have a
horizontal stroke all right.
It is utterly typical of Irish practice. I meant that it doesn't have 
an additional horizontal stroke as a slashed 7 does.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-14 Thread Michael Everson

At 10:34 -0700 2003-07-14, Peter Kirk wrote:
On 14/07/2003 09:04, Doug Ewell wrote:

* Michael Everson's and Roozbeh Pournader's provisional PUA assignments
for ARABIC PASHTO ZWARAKAY and AFGHANI SIGN, two legitimate characters
that cannot be represented in Unicode by any other means.
Why not, may I ask, as a newcomer to this list? Is there some 
technical reason, or a political one?
What do you mean? The ZWARAKAY is a new combining mark; the AFGHANI 
SIGN is a unique currency symbol. Neither is yet encoded. In the 
report, Computer Locale Requirements for Afghanistan, it is 
recommended to use a PUA character until such time as the encoding 
process has run its course.

I would not recommend using COMBINING MACRON for the ZWARAKAY, and I 
don't know what could be recommended for the AFGHANI SIGN that is 
already encoded, apart from writing out the word.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 22:16 -0400 2003-07-14, John Cowan wrote:

Latn has more letters than Latg does, because it's had to add more;
I have made thorns and eths in Latg. ;-)

Latg is older than the current use of Latn, though not than Latn's
ancestor.
You're wrong. Latg is older than Latc (Carolingian) but it is not a 
separate script.

Some Latg characters are hard to identify if all you know is Latn. 
But we don't encode them separately.
Thorn and Wynn and Gha and Ou and Ezh and lots of other Latin letters 
are hard to identify if all you know is Latn. I think your use of 
Latn/Latg here isn't convincing.

  And the Samaritan Pentateuch is often printed in the Samaritan script.

A font difference would handle that.
Nh.

I'd like someone whose native script is Hebrew to comment on mutual
intelligibility, which was the main criterion for separating Glagolitic from
Cyrillic.
I don't think it was. Glagolitic and Cyrillic are obviously two 
different scripts. My native script isn't Hebrew but I am certain 
that no one who was could easily read a newspaper article written in 
Phoenician or Samaritan letters.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 07:02 -0400 2003-07-15, David J. Perry wrote:
What is Latg vs Latn?
Latg is the Gaelic variant of the Latin script; Latf is the Fraktur 
variant of the Latin script; Latn is the generic Roman default.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 08:42 -0400 2003-07-15, Karljürgen Feuerherm wrote:
 Michael Everson said:
  My native script isn't Hebrew but I am certain that no one who was could
  easily read a newspaper article written in Phoenician or Samaritan letters.
Surely that is not an argument for encoding a separate script, is it?
It is sometimes. :-)

Most German people I know can't read the German 
cursive script used say 50 years ago. But the 
characters clearly correspond to the Latin 
characters in use today.
The handwriting is difficult to read. One would 
think that in German schools it would be at least 
introduced so children would know about it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 09:22 -0400 2003-07-15, John Cowan wrote:
Michael Everson scripsit:

 Latg is older than the current use of Latn, though not than Latn's
 ancestor.
 You're wrong. Latg is older than Latc (Carolingian) but it is not a
 separate script.
VVELLIFYOVCOVNTANCIENTROMANSTYLEASORDINARYLATINSCRIPTTHENYES.
I do. C'mon, John, look at Trajan's Column. Yes, it's legible and the 
wax tablet texts are not, but they are contemporaneous and they are 
certainly the same script.

If I don't know Gha, and I see it, I know I don't recognize it: it's a
novel letter.  (And I may even think it says OI.)
(Michael weeps.)

If I see a Gaelic-style G and fail to recognize it *as* a G, that's 
quite different.
Normally one recognizes it in context. I fail to see your point, however.

And the Samaritan Pentateuch is often printed in the Samaritan script.
  
 A font difference would handle that.

 Nh.
Even now that German uses Antiqua almost exclusively, you might find a
Lutherbibel printed recently in Fraktur.
Even so, I don't think there's an advantage to unifying it with 
Hebrew; it is very different. See 
http://www.orindalodge.org/fonts/kadosh_samaritan_manual_1_10.pdf


  I don't think it was. Glagolitic and Cyrillic are obviously two
  different scripts.
From UTR #3:

# In the encoding, Glagolitic is treated as a separate script from
# Cyrillic, principally because the letter shapes are in most cases
# totally unrelated, with differences not at all arising from mere
# font style.
That's a draft by Rick McGowan. It indicates that they are obviously 
different scripts ;-) Anyway, look at Samaritan Yod and compare 
it with Hebrew Yod. Not mere font style.

And from p. 171 (section 7.3) of TUS 3.0:

# The Unicode standard regards Glagolitic as a *separate* script from
# Cyrillic, not as a font change from Cyrillic.  This position is taken
# primarily because Glagolitic appears unrecognizably different from
# Cyrillic, and secondarily because Glagolitic has not grown to match
# the expansion of Cyrillic.
A good update of Rick's original text.

What is this thread for? We're going to encode Phoenician. It is the 
forerunner of Greek and Etruscan. Hebrew went its separate way. The 
fact that there is a one-to-one correspondence isn't important. We 
have that for Coptic and Greek too and we are disunifying them. I'm 
pretty sure we're going to encode Samaritan too.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 12:05 -0400 2003-07-15, John Cowan wrote:
Michael Everson scripsit:

  We disunify Glagolitic, and rightly so too. But that does not mean
 that there are not intermediate cases that ought to be unified, and
 without definite criteria, it's hard to know what to do.
 Just grok them? :-)
Nope, won't work.
Works for me.

  When we get to encoding Samaritan, I guess the proposal will stand by
 itself or not.
Not if there are no criteria to judge it on that are better than "See, it's
obvious!"
Well, you are going to have to wait. I do not have time to write a 
proposal on Samaritan right now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 07:53 -0700 2003-07-15, Peter Kirk wrote:

VVELLIHOPEVVEVVILL... ahem... Well, I hope we will count ancient 
Roman as Latin script rather than add to Unicode yet another new 
script which is almost identical to an existing one. But then it 
would make more sense than proposals to add new scripts or partial 
scripts for biblical Hebrew and for Aramaic, for at least ancient 
Roman inscriptions can be distinguished from nearly all modern texts 
by being in a different language.
Nope. The Aramaic ranged far beyond the middle east and itself -- not 
Hebrew -- was the forerunner of Syriac, Manichaean, Sogdian, 
Mandaean, Parthian, Avestan, Pahlavi, and other scripts.

But the existing Hebrew characters in Unicode are already in use for 
biblical Hebrew texts, as well as for what are probably the majority 
of surviving examples of ancient Aramaic which is not Syriac - the 
Aramaic portions of the Hebrew Bible, and presumably also the 
Aramaic parts of the Talmud and other ancient Jewish writings.
Aramaic is not only attested in Biblical texts. From Daniels & 
Bright: "Aramaic was the lingua franca of Southwest Asia from early 
in the first millennium BCE until the Arab Conquest in the mid 
seventh century CE."

Otherwise we end up with a new script for a few ancient inscriptions 
which are only slightly different in glyph shapes and repertoire and 
in language from an extensive corpus in an existing Unicode block.
We need to do further research on the subject, but it seems to me 
that Late Aramaic is still a candidate for encoding.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 09:39 -0700 2003-07-15, Peter Kirk wrote:

But then J was originally a glyph variant of I, and only quite 
recently in English have they been fully distinguished as letters.
It's not all that recent, and it wasn't English that made the innovation.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 20:17 +0100 2003-07-15, Thomas M. Widmann wrote:
John Cowan [EMAIL PROTECTED] writes:

 I'd like someone whose native script is Hebrew to comment on mutual
 intelligibility, which was the main criterion for separating
 Glagolitic from Cyrillic.
But if that criterion is applied, surely Georgian Xucuri/Khutsuri 
should be separated from Georgian Mxedruli/Mkhedruli: Although there 
roughly is a one-to-one correspondence between the two, and although 
both are generally applied to the same language (though normally to 
different stages of it), they definitely are not mutually 
intelligible (and in fact knowledge of Xucuri seems to be quite low 
in Georgia).
The UTC has agreed that we should do this. After 8 years or so of my 
whining ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 11:14 -0700 2003-07-15, Kenneth Whistler wrote:

The main reason for separately encoding Coptic, rather than
maintaining what we now recognize to be a mistaken unification
with the Greek script, is that it is less useful to people
who want to represent Coptic texts to have it be encoded
as a variant of Greek than it is to have it be encoded as a
distinct script.
Particularly as they regularly write text in both Coptic and Greek 
and this distinction is better expressed in plain text than in the 
font.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 17:34 -0400 2003-07-15, Patrick Andries wrote:

Sütterling ?
Sütterlin. Sütterling is the name of a panda in the Berlin zoo.

( Ludwig Sütterlin, 1865-1917)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic, Samaritan, Phoenician

2003-07-15 Thread Michael Everson

At 21:09 +0100 2003-07-15, Anto'nio Martins-Tuva'lkin wrote:
On 2003.07.15, 12:16, Michael Everson [EMAIL PROTECTED] wrote:

 Latg is the Gaelic variant of the Latin script;
Also known as _erse_, I was told.
That's incorrect. Erse is a Scots form of the word Irish. It's 
sometimes (but not politely today) applied to the language; the 
variant of the Latin script is usually called Gaelic script.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-16 Thread Michael Everson

William.

If CENELEC wishes to standardize a set of icons, they will do so. If 
they have a need to interchange data using those icons, they will (if 
they are wise) come to us and ask to encode them. If they want to use 
the Private Use Area before they do that, they will.

Please don't tell us all about it over and over again, as you have 
done. If you want to talk to CENELEC, do so. Please stop trying to 
peddle your PUA schemes for CENELEC to us.

I maintain the ConScript Unicode Registry, which contains PUA 
assignments. I do not promulgate those on this list. (Apart from that 
fun testing of the Phaistos implementation some time ago.)

Roozbeh and I assigned two unencoded characters for Afghanistan to 
the PUA, and we encourage implementors to use them until such time as 
the characters are encoded.

We do not spend oceans of digital ink evangelizing our brilliant 
schemes to the Unicode list.

It is essentially a matter for end users of the system, just as the 
two Private Use Area characters being suggested in another thread of 
this forum in relation to Afghanistan are a matter for end users of 
the Unicode Standard and does not affect the content of the Unicode 
Standard itself.
Then go talk about it with the users of the system.

Code points for the symbols are needed now or in the near future.
Are they? By whom? And if they need to use the PUA, they can do so. 
It's Private.

It remains to be seen what will be decided as the built-in font for the
European Union implementation of the DVB-MHP specification.  It might be the
minimum font of the DVB-MHP specification or it might be more comprehensive.
For example, should Greek characters be included?  Should weather symbols be
included?  These and many other issues remain to be decided.
The minimum font for any specification for Europe should be the 
MES-2. If you are talking to these people, tell them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-17 Thread Michael Everson

At 17:01 +0100 2003-07-17, William Overington wrote:
Michael Everson raises some interesting points.

William.

If CENELEC wishes to standardize a set of icons, they will do so. If
they have a need to interchange data using those icons, they will (if
they are wise) come to us and ask to encode them. If they want to use
the Private Use Area before they do that, they will.
Perhaps I may explain the situation?
No, thank you. If CENELEC wants to propose characters to the Unicode 
Standard, they can contact us. I'd be interested in helping, if they 
had a good case. But I'm not looking for extra work right now.

Now, I have never heard of the MES-2 whatever that is.  However, I do not
have deep knowledge of the various standards which exist.  Could you
possibly say some more about MES-2 please.
A.4.2 282 MES-2

282 MES-2 is specified by the following ranges of code positions as 
indicated for each row.

Rows  Positions (cells)
00    20-7E A0-FF
01    00-7F 8F 92 B7 DE-EF FA-FF
02    18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
03    74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
04    00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
1E    02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
1F    00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3
      D6-DB DD-EF F2-F4 F6-FE
20    13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
21    05 16 22 26 5B-5E 90-95 A8
22    00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
23    02 10 20-21 29-2A
25    00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2
      BA BC C4 CA-CB D8-D9
26    3A-3C 40 42 60 63 65-66 6A-6B
FB    01-02
FF    FD
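
For anyone who wants to test a code point against this list, here is a minimal Python sketch. It assumes the row/cell notation means "row" is the high byte and "cells" are low-byte values or ranges, both in hex. The names MES2_ROWS, expand and in_mes2 are made up for illustration and come from no standard.

```python
# MES-2 rows as listed above: row (high byte, hex) -> cells (low byte, hex).
MES2_ROWS = {
    "00": "20-7E A0-FF",
    "01": "00-7F 8F 92 B7 DE-EF FA-FF",
    "02": "18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE",
    "03": "74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1",
    "04": "00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9",
    "1E": "02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3",
    "1F": "00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 "
          "D6-DB DD-EF F2-F4 F6-FE",
    "20": "13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF",
    "21": "05 16 22 26 5B-5E 90-95 A8",
    "22": "00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 "
          "82-83 95 97",
    "23": "02 10 20-21 29-2A",
    "25": "00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 "
          "BA BC C4 CA-CB D8-D9",
    "26": "3A-3C 40 42 60 63 65-66 6A-6B",
    "FB": "01-02",
    "FF": "FD",
}


def expand(rows):
    """Turn the row/cell notation into a plain set of code points."""
    code_points = set()
    for row, cells in rows.items():
        base = int(row, 16) << 8
        for item in cells.split():
            low, _, high = item.partition("-")
            first = int(low, 16)
            last = int(high, 16) if high else first
            code_points.update(range(base + first, base + last + 1))
    return code_points


MES2 = expand(MES2_ROWS)


def in_mes2(ch):
    """True if the single character ch is in the MES-2 list above."""
    return ord(ch) in MES2
```

With this, in_mes2("\u20ac") (the euro sign, row 20 cell AC) is true, while in_mes2("\u0301") and in_mes2("\u1eb0") are false.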
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson

At 00:57 +0200 2003-07-18, Philippe Verdy wrote:

Why is row 03 so restricted? Shouldn't it include those accents and 
diacritics that are used by other characters once canonically 
decomposed? Or does it imply that MES-2 is only supposed to use 
strings in NFC form?

Also, is this list under full closure with existing character properties, like
NFKD decompositions, and case mappings?
The MES-2 is what it is, and was developed at the time when it was. 
It is thought to be a minimum requirement for European requirements, 
and is certainly a lot better than that old Adobe glyph list that was 
supported earlier on. It doesn't depend on very smart fonts.

Personally I prefer the Multilingual European Subset.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson

At 12:16 +0200 2003-07-18, Philippe Verdy wrote:

Is there some work at CEN to align its MES-2 subset into a revised 
(MES-2.1 ???) which not only takes into consideration the ISO10646 
reference but also its Unicode properties to make this set 
self-closed, and actually implementable, at least with NFC closure 
and case-mappings closure?
No. The relevant CEN committee is now dormant.
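
The closure question can at least be made concrete. The following is a rough Python sketch, not anything CEN or the UTC has specified: it reports which members of a repertoire decompose (NFD) or uppercase to code points outside that repertoire. It uses only row 00 of the MES-2 list (20-7E, A0-FF) as an illustrative repertoire, the function names are invented here, and the results depend on the Unicode data bundled with the Python interpreter.

```python
import unicodedata


def nfd_gaps(repertoire):
    """Map each member to the code points its NFD form needs but the set lacks."""
    gaps = {}
    for cp in sorted(repertoire):
        decomposed = unicodedata.normalize("NFD", chr(cp))
        missing = [ord(c) for c in decomposed if ord(c) not in repertoire]
        if missing:
            gaps[cp] = missing
    return gaps


def upper_gaps(repertoire):
    """Same idea for uppercase mappings (Python applies full case mapping)."""
    gaps = {}
    for cp in sorted(repertoire):
        missing = [ord(c) for c in chr(cp).upper() if ord(c) not in repertoire]
        if missing:
            gaps[cp] = missing
    return gaps


# Illustrative repertoire only: row 00 of MES-2, not the whole subset.
ROW_00 = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

decomposition_gaps = nfd_gaps(ROW_00)
case_gaps = upper_gaps(ROW_00)
```

Here U+00C9 shows up in decomposition_gaps because it needs U+0301, and U+00FF shows up in case_gaps because its uppercase is U+0178.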

I still note that modern Hebrew and Arabic are excluded from MES-2, 
as they are not used in any official language in the European Union 
or EFTA, or future EU candidates. But they are certainly of great 
interest for countries with which the EU is a major partner, and 
which are using these scripts. In some future, it would be needed to 
include support for modern Georgian (a subset of U+10A0..U+10FF), 
and modern Armenian (a subset of U+0530..U+058F), as well as some 
characters from Cyrillic Supplementary (in U+0500..U+052F).
The European Multilingual Subset supports all of Latin, Greek, 
Cyrillic, and Armenian. Unicode supports Hebrew and Arabic.

On the opposite, I don't understand why MES-2 included characters
in row U+25xx (Box Drawing, Block Elements, Geometric Shapes)
Legacy compatibility with IBM and others.

which are not strictly needed for text purpose (notably legal 
publications of the E.U., which should better use markup systems), 
and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and 
fl ligatures) which are really unneeded, even for legal purposes, 
or they should have been coherent and included ff, ffi, ffl 
ligatures...
Legacy compatibility with Apple.

I suppose that this may come from widely used legacy encodings in 
some EU+EFTA+European Council countries, but CEN should have avoided 
them (they could still be selected by font renderers, if available 
in fonts).
You are entitled to your opinion. This work was begun and finished long ago.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson

At 13:35 +0200 2003-07-18, Philippe Verdy wrote:

I note that you prefer the European Multilingual Subset to MES-2. 
Is it an extended set that includes MES-2, and fills the holes by 
using all characters defined in blocks of some version of the 
Unicode set?
It is script-based, not character based. It includes all Latin, 
Greek, Cyrillic, Georgian, and Armenian characters. And is a superset 
of MES-2.

I *prefer* Unicode to any subset thereof.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

I am not in India

2003-07-18 Thread Michael Everson

Colleagues,

Apparently some of you have got copies of mail I wrote in December 
2002 entitled "Coptic II?" which has some virus attachment to it. 
This has been sent by [EMAIL PROTECTED] which is not me, and I 
didn't send it, and I use Mac OS X and Eudora so I don't have a virus.

Thanks.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Thank you. (was Re: [Private Use Area] Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson

At 12:11 +0100 2003-07-18, William Overington wrote:
Thank you for the list of code points for MES-2.

I have already found that the DVB-MHP minimum set does not have some of them
and that the DVB-MHP minimum set does have some which are not in MES-2, such
as U+1EB0 to U+1EB5.
If this is of interest to CENELEC, feel free to tell them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

I am not in India II

2003-07-18 Thread Michael Everson

Your message has encountered delivery problems
to the following recipient(s):
[EMAIL PROTECTED]
Delivery failed
554 delivery error: dd This user doesn't have a yahoo.co.in account 
([EMAIL PROTECTED]) [-5] - mta104.mail.in.yahoo.com
See?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson

At 11:28 -0400 2003-07-18, John Cowan wrote:

However, a font like Last Resort (the world's smallest giant font, as it were)
does that just about as well.
While I hate seeing the Last Resort font show up, I love seeing it 
when it does. :-) So much better than ?.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson

At 13:07 +0200 2003-07-18, Kent Karlsson wrote:

This is not to say that the MESes are unproblematic.  To mention just
two points not already mentioned: none of the new math characters
are included even in MES-3 (a, b), despite that all math characters
were supposed to be included
That isn't true.

and not even MES-3 covers all official minority languages.
What's missing?

(But as Philippe states, there are some rather useless characters 
that have been included for compatibility reasons.)
Same goes for Unicode though. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: I am not in India II

2003-07-18 Thread Michael Everson

At 00:44 +0200 2003-07-19, Adam Twardoch wrote:
From: Michael Everson [EMAIL PROTECTED]
 [EMAIL PROTECTED]
This merely means that somebody has a virus who had both Michael and Roozbeh
in his/her address book.
People who believe that e-mails with a particular name in the From field
must come from that very person can be called, ehem, naiive.
That's an interesting way of writing the diaeresis on naïve, Adam. :-)

This particular virus sends itself around, identifying the sender as one of
various addresses from the infected person's address book. In addition, the
virus swaps the usernames and domains around, so addresses such as
[EMAIL PROTECTED] are created.
So, basically, it means that the virus probably comes from a person who:
1. Is in Singapore.
2. Has following entries in the address book: [EMAIL PROTECTED], [EMAIL PROTECTED],
[EMAIL PROTECTED] and [EMAIL PROTECTED]
3. Uses Microsoft Windows.
Anybody ring a bell?
James Seng?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson

At 15:45 -0700 2003-07-18, Michael \(michka\) Kaplan wrote:
A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?
No, it wouldn't. A "not a character" glyph is displayed in the Last 
Resort font.

On Windows, the "cannot find a font for it" situation is the NULL glyph.
Not much better than ?.
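
The "bad conversion" case is easy to reproduce. This is a small Python sketch using the standard codec error handler; cp1252 is chosen only as an example code page and to_code_page is a made-up helper name. It shows why a question mark carries no information about what was lost.

```python
def to_code_page(text, code_page="cp1252"):
    """Convert to a legacy code page; unencodable characters become '?'."""
    return text.encode(code_page, errors="replace")


# The i-diaeresis exists in cp1252, HEBREW LETTER ALEF (U+05D0) does not.
converted = to_code_page("na\u00efve \u05d0")
round_trip = converted.decode("cp1252")
```

After the round trip the alef has become "?", and nothing in the data says which script it came from.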
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: I am not in India

2003-07-19 Thread Michael Everson

At 15:25 +0430 2003-07-19, Roozbeh Pournader wrote:
On Sat, 2003-07-19 at 02:46, Doug Ewell wrote:
 I got something titled " Re: Coptic II?" (note leading space) from
 [EMAIL PROTECTED], which I am pretty sure is not Roozbeh
 Pournader.
I definitely know *nothing* about Coptic but that it's related to Greek
to some degree.
The Coptic script derives from the Greek script, but the language is 
Late Egyptian.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-19 Thread Michael Everson

At 15:23 +0200 2003-07-19, Philippe Verdy wrote:
Unicode does not define the charset (which are defined by ISO10646),
That isn't true. They both define the same character set. (I will not 
use the term charset.)

but character properties and related algorithms, and (in cooperation 
with ISO10646) their codepoint assignments.
The code position assignments are (formally) assigned by WG2, but 
there is consensus between UTC and WG2 on this matter.

For me, Unicode is NOT a character set, but an encoded character 
set, with a small but important nuance: You need to specify a 
version after Unicode to indicate the character set. So Unicode 4.0 
is a character set, and a superset of Unicode 3.2, but Unicode alone 
is not.
To me, Unicode refers to the most recent version. :-)

If you just look at this definition, you cannot prefer Unicode to 
any subset,
Yes, I can.

because Unicode is just a name of a collection of standards and a 
collection of character sets and algorithms
That isn't true. If you think this is true, you really have a lot to 
learn about Unicode.

and already is a subset of the next version... If you cannot support 
the idea of subsets, then don't use Unicode, or wait that the 
Unicode standard is definitely closed, or permanently consider that 
its repertoire is now closed and no more characters will be added... 
Of course you would be wrong.
I think you mistook me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset

2003-07-19 Thread Michael Everson

At 16:41 -0700 2003-07-18, Michael \(michka\) Kaplan wrote:
I am pretty sure you have to be wrong here, Michael. Attend me:

1) API converts from Unicode to the wrong code page
2) API does some sort of work with the string
3) API tries to display the string
How on earth could it come from the Last Resort font, unless it is a generic
glyph that contains no script info (which would be no better than a question
mark or a NULL glyph) ?
Hm. See http://developer.apple.com/fonts/LastResortFont/ where it 
shows glyphs for illegal characters (FFFE/ etc.) as well as 
undefined characters (valid code positions which have not been 
assigned). I thought somehow that there was a glyph for broken 
characters (characters that were just plain wrong) as well.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-19 Thread Michael Everson

At 20:24 +0200 2003-07-19, Philippe Verdy wrote:

Isn't this page creating the idea for a specific block of 
script-representative glyphs, that could be mapped in plane 14 as 
special supplementary characters ?
Good heavens, no.

It's one thing for me to update this font regularly for Apple when 
new blocks get added to the standard.

It's quite another thing to suggest that we should have to add, 
formally, a new block symbol to some block in Plane 14 every time we 
add a new block to the standard.

Isn't it?

Surely the correct thing to do is to implement Last Resort support 
for different platforms as Apple indicates using those character 
names.

So fonts containing these glyphs could be designed to display these 
glyphs, in a way similar to the current assignment of control 
pictures.
Um, that's what the Last Resort font does, outside of Unicode 
encoding space. (I don't think PUA characters are used, actually, but 
I could be wrong.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Karen Language Representation in Unicode

2003-07-20 Thread Michael Everson

I've discussed the matter with Christian and you can write to me about it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson

At 23:34 +0200 2003-07-19, Philippe Verdy wrote:

I'm still convinced that these glyphs are much more informative than 
a default glyph showing a ?, a white rectangle, or a black lozenge 
with a mirrored white ?...
Of course they are.

And Unicode also uses these glyphs in the index page for its charmaps,
You mean for its charts. Please.

but they are shown as poor bitmaps (may be the PDF or book version 
use your glyphs in a document-embedded font)
That page is in HTML.

How were your glyphs contributed?
I, uh, drew them.

With SVG graphics containing character objects and drawing primitives
I have no idea what this means. I used Fontographer.

(it seems the simplest way to derive them, using the table shown in 
Apple's web page, with some exceptions for unassigned, reserved, 
forbidden or
surrogates symbols which require a distinct design)?
You can't derive these. You have to draw them individually.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson

At 08:20 -0500 2003-07-20, [EMAIL PROTECTED] wrote:

What would be the purpose of encoding these? I can't think of any. 
They certainly don't need to be encoded as distinct characters to 
use in a Last Resort font.
I am certain more people want to interchange the LITTER DUDE than 
would want to interchange script block indicators.

(Ken suggested offline that this name might be better-received than 
the DO NOT LITTER SIGN)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson

At 09:56 -0600 2003-07-20, John H. Jenkins wrote:

No, it uses the actual Unicode characters, and just has a huge cmap 
that maps everything in Unicode to the glyph for its block.
That is just so cool. :-)
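
The lookup idea John describes can be sketched in a few lines of Python. This is not the font's real cmap: BLOCKS below is a tiny hand-picked table for illustration, and block_glyph is a hypothetical name. The point is only that every code point in a block resolves to one per-block glyph.

```python
from bisect import bisect_right

# A handful of blocks as (first, last, name); a real table covers every block.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0530, 0x058F, "Armenian"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x10A0, 0x10FF, "Georgian"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_glyph(ch):
    """Name of the per-block glyph for ch, or 'undefined' if no block matches."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "undefined"
```

So block_glyph("\u05d0") gives "Hebrew", and anything outside the sample table falls through to "undefined".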
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

[OT] French Government Bans the Term 'E-Mail'

2003-07-20 Thread Michael Everson

Off-topic, but interesting. This just crossed my desk 
http://news.yahoo.com/news?tmpl=story2&cid=518&u=/ap/20030718/ap_on_re_eu/france_out_with__e_mail__3&printer=1
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson

At 12:38 -0700 2003-07-20, Peter Kirk wrote:

Indeed. Where can I get the Last Resort font for Windows (2000)? If 
the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
the horrible-looking (the J always grates!) Code2000.
I'll go have a chat with some of my Apple colleagues about this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson

At 20:50 + 2003-07-20, [EMAIL PROTECTED] wrote:
  At 12:38 -0700 2003-07-20, Peter Kirk wrote:
 Indeed. Where can I get the Last Resort font for Windows (2000)? If
 the answer is nowhere, I guess I am stuck with Arial Unicode MS or
 the horrible-looking (the J always grates!) Code2000.
 I'll go have a chat with some of my Apple colleagues about this.
It's unlikely that your Apple colleagues can do anything for
the J in Code2000.
I wasn't talking about that, but if you'd like my opinion, I hate that J too.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [OT] French Government Bans the Term 'E-Mail'

2003-07-21 Thread Michael Everson

At 10:59 -0400 2003-07-21, Patrick Andries wrote:
- Message d'origine -
De: Michael Everson [EMAIL PROTECTED]

 At 19:56 -0400 2003-07-20, Patrick Andries wrote:

 Obviously, the AP has found someone to say it is artificial.

 Of course, all language is artificial.
Well, at least all new words that can be traced to someone can be so «
described ».
*All* words must be traced to someone. They do not grow on trees.

I also wonder if anybody in the US said to the 
inventor of email or any new word : this is 
artificial. It seems somewhat nonsensical or at 
least tautological for any newly coined word.
eBook, e-mail, eBay, e-money, and all that gunk. 
I suppose we could do without them. Even Apple's 
gone weird about it. I don't know what the i in 
the iLifestyle suite (iChat, iPhoto, iBook, 
iThis, iThat) means.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: [OT] French Government Bans the Term 'E-Mail'

2003-07-22 Thread Michael Everson

At 11:41 +0100 2003-07-22, Marion Gunn wrote:

I read that 'i' (in the Apple context) as 
meaning 'i(nternet ready)'. It is possible I 
could be wrong about that. Am I?
Yes, you are.
--
ME

Re: Useful identifier for Scripts

2003-07-24 Thread Michael Everson

At 15:00 -0400 2003-07-24, John Cowan wrote:
Markus Scherer scripsit:

 Note that even for single-language text you may need multiple script
 identifiers. For example, for Japanese text you will need 3 identifiers for
 Han+Hiragana+Katakana. Obviously, if you have multilingual text, you will
 need more.
Politely, ISO 15924 supplies a special code for this case.
You're welcome.
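
To illustrate Markus's and John's point, here is a crude Python sketch that labels a string with one ISO 15924 code. It assumes the codes Hani, Hira, Kana, Hrkt (Hiragana + Katakana) and Jpan (Han + Hiragana + Katakana). Python's standard library has no script property, so the sketch guesses the script from the character name prefix, which is a heuristic and not how a real implementation should do it. All function names are invented.

```python
import unicodedata

# Heuristic: infer the script from the start of the character name.
PREFIX_TO_CODE = {
    "CJK UNIFIED IDEOGRAPH": "Hani",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
}


def script_codes(text):
    """Set of per-script codes found in text (characters not recognised are skipped)."""
    codes = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        for prefix, code in PREFIX_TO_CODE.items():
            if name.startswith(prefix):
                codes.add(code)
    return codes


def single_label(text):
    """One code for the whole string, using the combined codes where they apply."""
    codes = script_codes(text)
    if len(codes) == 1:
        return next(iter(codes))
    if codes == {"Hira", "Kana"}:
        return "Hrkt"
    if codes and codes <= {"Hani", "Hira", "Kana"}:
        return "Jpan"
    return None
```

Ordinary Japanese text mixing kanji and both kana then gets the single label "Jpan" instead of three identifiers.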
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Damn'd fools

2003-07-25 Thread Michael Everson

At 15:46 -0400 2003-07-25, John Cowan wrote:

  When the United Kingdom hands back Northern Ireland to Ireland
 in 2052, then obviously the numeric codes of both countries will have to
 change, but not the codes for the names.
Presumably the name of the U.K. would change, however.
Why? It would be the United Kingdom of Great Britain, which comprises 
England, Scotland, Wales, and the Duchy of Cornwall.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Damn'd fools

2003-07-28 Thread Michael Everson

At 04:58 -0700 2003-07-28, Peter Kirk wrote:
On 28/07/2003 04:31, Michael Everson wrote:

The Normans of course were frankified Norsemen.

(My word. Apparently "francized" would be used in 
Québec; "frenchify" occurs but is apparently 
often derog.)
Thanks, Michael. Of course I could have 
suggested to Jarkko to ask an English speaking 
Irish person if he or she is English.
Perhaps we are Hiberno-Saxons

(Ducks.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: OT: Damn'd fools

2003-07-28 Thread Michael Everson

At 11:47 -0700 2003-07-28, Peter Kirk wrote:

So if Finland was part of Russia, Canada is part 
of England. How do you like that one, 
Karljürgen? Should I expect an imminent French 
(Canadian) invasion?
I thought Québec wanted to join the EU

(Ducks again.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-28 Thread Michael Everson

At 13:22 -0700 2003-07-28, Kenneth Whistler wrote:

Because changing the canonical ordering classes (in ways not
allowed by the stability policies) breaks the normalization
*algorithm* and the expected test results it is tested against.
Do you really think that algorithm with all its warts is going to be 
used 50 years from now? I really would like to know.
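
For readers who have not seen the effect being argued about, here is a short Python sketch using the standard unicodedata module. It assumes a consonant carrying patah followed by hiriq, the kind of sequence at issue in this thread. The variable names are only for illustration.

```python
import unicodedata

LAMED, HIRIQ, PATAH = "\u05dc", "\u05b4", "\u05b7"

# Vowel order as the text intends it: patah first, then hiriq.
intended = LAMED + PATAH + HIRIQ

# Canonical reordering sorts marks by combining class, lower class first.
normalized = unicodedata.normalize("NFC", intended)

classes = {unicodedata.name(c): unicodedata.combining(c) for c in (HIRIQ, PATAH)}
```

Hiriq has class 14 and patah class 17, so normalization puts hiriq first and the intended order is lost.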
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-28 Thread Michael Everson

At 15:47 -0700 2003-07-28, Peter Kirk wrote:

Well, except two countries, or more than two if you have been 
following the damn'd fools thread. We British resisted Napoleon 
and we continue to resist his innovations like the metric system, 
though we are being forced to make a gradual change.
Thank heavens. :-) Unless you miss non-decimal currency.

There are still many things which Napoleon managed to impose and are 
still uniform all the way from Calais to Vladivostok (because even 
the Russians accepted his system for a while), even traffic rules 
(drive on the right, give way to the right), but are different in 
the UK.
That doesn't mean it's a good idea that these things aren't standardized.

Though I like the fused UK and Irish electric socket plugs, which are 
extremely safe
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Back to Hebrew, was OT:darn'd fools

2003-07-29 Thread Michael Everson

At 07:31 -0700 2003-07-29, Peter Kirk wrote:

I don't think you French Canadians would be very happy if accented 
upper case vowels were removed from Unicode because they are not 
used in France.
This isn't true. They *are* used in France.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: French accents on uppercase, was Back to Hebrew, wasOT:darn'd fools

2003-07-29 Thread Michael Everson

At 11:47 -0400 2003-07-29, Karljürgen Feuerherm wrote:
I believe they're optional though, at least, aren't they?
Not in good typography. You must unlearn what you have learned
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Back to Hebrew, was OT:darn'd fools

2003-07-29 Thread Michael Everson

At 11:52 -0400 2003-07-29, Jim Allan wrote:

On the other hand, dropping diacritics from 
names or text written in all uppercase is 
considered acceptable in Quebec French (and I 
suspect also in France) dating from old 
addressograph technology and billing typewriter 
technology where capital letters alone were 
available and diacritics were not normally 
included as part of the character set.
Then you have the old problem: what does « LE 
PRESIDENT ASSASSINE » mean if such a practice is 
employed?
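
The ambiguity is mechanical, as a small Python sketch shows. It strips accents by decomposing to NFD and dropping combining marks; strip_marks is a made-up helper, and the two readings are the usual ones for this headline.

```python
import unicodedata


def strip_marks(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


murders = "LE PR\u00c9SIDENT ASSASSINE"        # the president murders
murdered = "LE PR\u00c9SIDENT ASSASSIN\u00c9"  # the president murdered

same_after_stripping = strip_marks(murders) == strip_marks(murdered)
```

Both readings collapse to the same unaccented string, so the reader has to guess.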
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Back to Hebrew, was OT:darn'd fools

2003-07-29 Thread Michael Everson

At 08:47 -0700 2003-07-29, Peter Kirk wrote:

Another example might be German ß (U+00DF). Many 
people don't use it, indeed I think it has been 
officially abolished, but many others do use it.
Peter, there isn't a shred of truth in what you are saying.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Back to Hebrew, was OT:darn'd fools

2003-07-29 Thread Michael Everson

At 10:36 -0700 2003-07-29, Peter Kirk wrote:

The only shred of untruth is that what I said I think is true is in 
fact an exaggeration, the abolition is only partial.
Hence it was not officially abolished.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Back to Hebrew - Vav Holam

2003-07-29 Thread Michael Everson

At 22:21 +0200 2003-07-29, Jony Rosenne wrote:

With Hebrew, it is not accepted that it is a different Vav - letters 
used as matres lectionis are not distinct from the same letters used 
otherwise. Neither is it accepted that this is a different Holam. 
The only thing established is that this artifact has been used in 
several manuscripts, one of many similar artifacts, to aid the 
understanding of the text. And the correct vehicle to convey such 
artifacts is markup.
Ink dots used to aid the understanding of the text are always encoded 
as characters. Markup is the wrong way to handle them. Otherwise we 
would write Karlj<fronted>u</fronted>rgen or the like.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Back to Hebrew - Vav Holam

2003-07-29 Thread Michael Everson

At 15:41 -0500 2003-07-29, [EMAIL PROTECTED] wrote:
Jony Rosenne wrote on 07/29/2003 03:21:08 PM:

 The only thing established is that this artifact has been used in
 several manuscripts, one of many similar artifacts, to aid the
 understanding of the text. And the correct vehicle to convey such
 artifacts is markup.
You say this as if it's objective truth. Now, if I see Latin-script text
with a diacritic comma above in some places but also a comma above and a
little to the right, the correct vehicle to convey these artifacts is the
pair of distinct characters, U+0313 COMBINING COMMA ABOVE and U+0315
COMBINING COMMA ABOVE RIGHT. Apparently, in the case of Latin, it was not
considered an objective truth that the correct vehicle is markup.
If it comes to having an above-Hebrew-thingy and a 
next-to-Hebrew-thingy or having it be done by markup, I certainly 
would prefer the character-based solution.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
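
The Latin precedent cited in the message above can be checked against the character database that ships with Python. This is a minimal sketch rather than anything taken from the thread: it prints the two combining commas and confirms that normalisation does not fold one into the other. The variable names are illustrative.

```python
import unicodedata

# The two marks named in the quoted message.
for cp in (0x0313, 0x0315):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

# The same base letter with each mark stays distinct after normalisation,
# so the positional difference is carried by the characters, not by markup.
above = "a\u0313"
above_right = "a\u0315"
same_after_nfc = (
    unicodedata.normalize("NFC", above)
    == unicodedata.normalize("NFC", above_right)
)
print(same_after_nfc)
```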

Re: Hebrew Vav Holam

2003-07-30 Thread Michael Everson

At 21:29 +0200 2003-07-30, Jony Rosenne wrote:
Problem:

We have here one character sequence with two alternate renditions: the
common rendition, in which they are the same, and a distinguished rendition
which uses two separate glyphs for the separate meanings.
On paper, which is two-dimensional, it is a Vav with a Holam point somewhere
above it. Unicode decided that in the encoding, which is one-dimensional,
the marks follow the base character.
Any solution should accommodate both kinds of users and both renditions.

Solution: Suggestions, please.
Please put this in a document with an actual illustration of the 
problem. I don't follow it from the verbal description.

In Tengwar, tinco with a three-dot diacritic over it can be read [ta] 
or [at] depending on the language.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Hebrew Vav Holam

2003-07-30 Thread Michael Everson

At 16:50 -0400 2003-07-30, John Cowan wrote:
Michael Everson scripsit:

 See the reference glyph for U+FB4B. One form looks like this with
 the dot above further to the left, the other like it with the dot a
 little further to the right. This glyph with the centred dot is a
 compromise between the two.
 A picture speaks a thousand words.
These particular words combined with the picture in the U3.0 chart tell all.
I see. This disunification tempts. I'd go to the bother of writing up 
the proposal for adding this combining character if on further 
discussion it appears the right thing to do.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
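
For readers without the chart to hand, the encoding behind the U+FB4B reference glyph can be inspected with Python's unicodedata module. This is a sketch under the assumption that the module's bundled data matches the published UCD; it shows only that the precomposed character is canonically the sequence vav followed by holam. Since there is one sequence for both renditions, the encoding alone cannot say where the dot sits.

```python
import unicodedata

VAV_WITH_HOLAM = "\uFB4B"

print(unicodedata.name(VAV_WITH_HOLAM))

# Canonical decomposition mapping as recorded in the character database.
decomposition = unicodedata.decomposition(VAV_WITH_HOLAM)
print(decomposition)

# The decomposed sequence: a base letter followed by its combining mark.
sequence = unicodedata.normalize("NFD", VAV_WITH_HOLAM)
names = [unicodedata.name(ch) for ch in sequence]
print(names)

# The Hebrew presentation forms are composition exclusions, so NFC leaves
# the text as the two-character sequence rather than restoring U+FB4B.
recomposed = unicodedata.normalize("NFC", VAV_WITH_HOLAM)
print([f"U+{ord(ch):04X}" for ch in recomposed])
```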

Re: Hebrew Vav Holam

2003-07-31 Thread Michael Everson

At 13:12 -0400 2003-07-31, Ted Hopp wrote:

For reasons I posted earlier, I don't think encoding the dot is the right
approach.
I despair of following this thread.

I'd propose something that would look like this in the UCD (with 'nn' to be
determined, but it should be in the Hebrew block):
05nn;HEBREW VOWEL HOLAM MALE;Lo;0;R;<compat> 05D5 05B9;;;;N;;;;;
We do not encode any HEBREW VOWELs. We encode LETTERs and combining marks.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
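
The record quoted in the message above follows the semicolon-separated layout of UnicodeData.txt. As a rough sketch of how such a line is read, the snippet below splits a record into its fifteen fields. The field labels and the parse_record helper are this example's own naming, and the sample line is the existing entry for U+FB4B, not the proposed character.

```python
FIELD_NAMES = (
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "comment", "uppercase", "lowercase", "titlecase",
)


def parse_record(line: str) -> dict:
    """Split one UnicodeData.txt-style line into labelled fields."""
    fields = line.strip().split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(fields)}"
        )
    return dict(zip(FIELD_NAMES, fields))


# Existing entry for the precomposed vav with holam.
record = parse_record(
    "FB4B;HEBREW LETTER VAV WITH HOLAM;Lo;0;R;05D5 05B9;;;;N;;;;;"
)
for key in ("name", "general_category", "combining_class", "decomposition"):
    print(f"{key}: {record[key]}")
```

The second field is the character name, which is where the LETTER versus VOWEL objection in the reply applies.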

RE: Hebrew Vav Holam

2003-07-31 Thread Michael Everson

At 21:57 +0200 2003-07-31, Jony Rosenne wrote:
I was under the impression that old English manuscripts did use different
glyphs for the two sounds of th.
Thorn and eth.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Hebrew Vav Holam

2003-07-31 Thread Michael Everson

At 16:18 -0400 2003-07-31, Ted Hopp wrote:
On Thursday, July 31, 2003 3:03 PM, Michael Everson wrote:
 We do not encode any HEBREW VOWELs. We encode LETTERs and combining marks.
I agree with the "do not" if it's descriptive of current practice. If it's
prescriptive, I'd have to ask why. (And please don't say stability policy!
:))
The Name Police like consistency.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Conflicting principles

2003-08-06 Thread Michael Everson

At 16:16 -0400 2003-08-06, John Cowan wrote:
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is stored, with a few minor
exceptions, in phonetic order (a lexicographical principle).
Are you thinking of the Tengwar?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Conflicting principles

2003-08-07 Thread Michael Everson

At 15:18 -0700 2003-08-06, Kenneth Whistler wrote:

  As someone or other said, I believe that hitherto -- *hitherto,* mark
 you -- [we have] entirely overlooked the existence of, well, scripts
 that might cause a conflict between these esteemed principles.
The reason why the UTC should tackle the encoding of Tengwar is not 
so much because it would help in the publication of Elvish poetry, 
but because confronting the architectural issues it poses for 
encoding would make an excellent tutorial case for how the two 
principles of combining mark order and
logical order impact the task of coming up with an appropriate 
encoding for a complex script. And it would starkly illustrate the 
fact that an appropriate character encoding does not necessarily 
directly reflect the phonological structure of a language as 
represented by that script.
Some rather old discussion papers on this topic may be found at 
http://www.evertype.com/standards/iso10646/pdf/tengwar-vowels.pdf and 
http://www.evertype.com/standards/iso10646/pdf/tengwar.pdf

It *is* a problem.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Conflicting principles

2003-08-08 Thread Michael Everson

At 23:07 +0200 2003-08-07, Kent Karlsson wrote:
  Kent Karlsson scripsit:
  4) Encode the vowel signs as combining characters, after
  the base characters they logically follow. Consider them as
  double [width] combining characters, that happen to
  have no ink above/below the character they apply to,
  but (like double width combining characters) have ink
   over/under the glyph for the base character that follows.
Kent. Read my papers. A similar approach is proposed.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Questions on ZWNBS - for line initial holam plus alef

2003-08-08 Thread Michael Everson

At 01:11 +0200 2003-08-09, Philippe Verdy wrote:

I just picked SYMBOL to just match the required property that would match
other spacing variants of diacritics. The ZERO WIDTH is probably 
confusing, but it just marks the fact that it has no associated 
glyph and a null *minimum* width (which expands to the largest 
diacritic(s) with which it is combined).
The Name Police reject this utterly. ZERO WIDTH cannot have an 
expanding dynamic width.
This pseudo-character will not be encoded. Time to drop the thread.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Newbie Question - what are all those duplicated characters FOR?

2003-08-09 Thread Michael Everson

At 17:46 +0100 2003-08-08, [EMAIL PROTECTED] wrote:
I'm reasonably sure that this question reflects my own ignorance, rather
than some problem with the standard, but nonetheless, I am confused.
Read the text. Don't just read the code charts.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Aramaic scripts

2003-08-10 Thread Michael Everson

At 09:00 +0100 2003-08-09, Raymond Mercier wrote:
There are omissions in Michael Everson's chart in

http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf

The chart was based on Semitic languages, although purporting to be 
about scripts.
No, it wasn't.

There are less obvious omissions:

1. Kharoshthi, a RtoL script much used in North West India, and 
regarded by everyone as a derivative from a form of the Aramaic 
script used in that region. It is found on coins, Ashokan edicts, 
various inscriptions and manuscripts. It was used to write mainly 
prakrits, although some sanskrit text is known. See, for example, 
A.H. Dani, Indian Palaeography, Oxford 1963.
We are well aware of Kharoshthi, which was roadmapped without any difficulty.

2. Pahlavi, widely used to write Middle Persian. This involved a 
troublesome mixture of Persian reading of Aramaic words, a subject 
requiring more elaboration than is needed here.
We are well aware of Avestan and Pahlavi, which were roadmapped 
without any difficulty.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Questions on ZWNBS - for line initial holam plus alef

2003-08-12 Thread Michael Everson

At 10:58 -0700 2003-08-11, Peter Kirk wrote:
On 11/08/2003 06:59, Jon Hanna wrote:

There are only two theoretical problems that I can see here, the first is
that a whitespace character other than space gets converted to space by
attribute value normalisation, and that this changes the meaning of the text
in some way. This could only occur if the combining character were the first
character in a line of text, which is quite a nonsensical construct to begin
with.
Not at all! Imagine a tutorial on a language, which might well list 
the accents used, in a format like this:

` (grave accent) is used with a, e and o, and indicates more open 
pronunciation
^ (circumflex accent) is used with any vowel, and indicates lengthening

So far so good, but when I get to an accent with no predefined 
spacing variant, I have a problem!
The mechanism for doing this has been explained, and it has also been 
explained that if it is not implemented correctly you should yell at 
the implementors.

In Mac OS X, for instance, the horizontal spacing seems to work all 
right for many accents, but they seem to prefer to rest just above 
the baseline. I'll report this as a rendering bug.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Colourful scripts and Aramaic

2003-08-12 Thread Michael Everson

At 18:03 -0400 2003-08-07, Karljürgen Feuerherm wrote:
My knowledge of Aramaic script is a little scanty, but my understanding is
more or less the same as Peter's. Which leads me 
to suggest that encoding Aramaic separately 
would be a bit like encoding Old Akkadian 
(Cuneiform) separately from NeoAssyrian 
(Cuneiform). Which would be a bit silly (and not 
what we are planning in that arena) Note 
that some people are even willing to argue that 
the substrate languages might be considered 
distinct, too--in case that is the argument 
which would be applied to Aramaic.
We do not encode languages. Would somebody please 
read 
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf 
before deciding what it is that is meant by 
Aramaic in the Roadmap? Note that Hebrew descends 
FROM it, as do a number of other scripts 
which clearly do NOT descend from Hebrew.

Unicode encodes Square Hebrew.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Handwritten EURO sign

2003-08-14 Thread Michael Everson

At 23:35 +0200 2003-08-05, Pim Blokland wrote:

I have absolutely no idea what you are talking about.
You are lucky not having to put up with bad English like "five euro 
and six cent", living in the Netherlands and speaking Dutch as you 
do. See http://www.evertype.com/standards/euro if you wish to learn 
more about a disaster in language planning.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Colourful scripts and Aramaic

2003-08-14 Thread Michael Everson

At 13:12 -0700 2003-08-07, Peter Kirk wrote:

Well, it seems to me that in the case of the Aramaic proposal we 
don't even have that. We have an archaic version of the script which 
is now used mainly for Hebrew, and which many scholars still call 
Aramaic (in distinction from paleo-Hebrew) although Unicode calls it 
Hebrew. The Aramaic glyphs are almost all recognisably the same as 
or slight variants on the Hebrew ones. And Hebrew script is already 
used, uncontroversially, for large corpora of Aramaic e.g. in the 
Talmud. Why a new script for the few surviving examples of ancient 
Aramaic in this script?
People. It's the widespread offshoot used throughout the Middle East 
that spawned Brahmic and Uighur and other scripts. It isn't 
necessarily the thing you think is confined to three scraps of 
papyrus or whatever. We aren't working actively on this now. We don't 
have an active proposal. We have something roadmapped, and I for one 
don't want to spend time right now defending its roadmapping to you 
apart from what is in my earlier paper on Semitic scripts. Could you 
turn off the fire alarms?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

[hebrew] Re: Roadmap---Mandaic, Early Aramaic, Samaritan

2003-08-14 Thread Michael Everson

Elaine,

I really, really, really don't have time to debug your 
dissatisfaction with the use of the word Aramaic in the Roadmaps. 
This is NOT something anyone is working actively on right now. When a 
proposal comes forth, there will be evidence in it that can be picked 
at.

In actuality, one could make a very good case that all extant Semitic/
extended Aramaic-Moabite-Amorite-Yaudic-Hebrew etc. type alphabetic scripts
between the earliest Sinaitic / Wadi El-Hol---and middle Parthian
are font variants
We are not going to encode Phoenician and Samaritan and Palmyrene as 
font variants of Hebrew. If you want to write those languages in 
Hebrew script, do so.

Any border(s) you draw will be either completely artificial or mostly
artificial.  That's the problem.
The borders we draw are based on the analyses of script experts.

I gather that you are a font person, fascinated by the aesthetic 
pleasure of wondrous shapes.
I am a lot more than that.

I am a database person, concerned with minimizing unnecessary font 
variation, which may interfere with future overworked Semitic 
retrieval engines.
You will never be at a greater disadvantage than a Sanskritist is, 
considering that the Rg Veda can be written in a dozen or so scripts.

 The Mandaic and Samaritan scripts apparently
 enjoy at least some modern liturgical use.
Yes, they do!   But the Samaritan is also heavily used within
Jewish studies  /  Biblical studies communities.  The Samaritans
also use their shapes in private correspondence.
Then we shall encode them.

  of Aramaic script to encode has not been looked at carefully. Indeed
 we have no current proposals which are well-advanced at this time.
I'm responding now because this may be the only time period where
Hebraists interact with Unicode. Carpe diem.
Hebraists are discussing concerns about METEG and things. You're 
responding about things which don't even have formal proposals to 
respond to. If you want me to start working on encoding other early 
Semitic scripts, please give generously to the Script Encoding 
Initiative and ask for prioritization. Failing that, I will be 
working on things which have higher priority (and more complete 
proposals) at present, like Coptic, Saurashtra, Nuskhuri, Buginese, 
N'Ko, Ol Chiki, Avestan and Pahlavi, and so on.

  I am responding at great length to the Roadmap proposals
 for the Semitic dialects Mandaic, Early Aramaic, and
 Samaritan. 

 We are proposing to encode scripts, not languages.
Yes, that is your take on it.  But scripts are frozen language,
not the liquid language of speech or the gaseous language of
poetry..  You encode scripts so we can manipulate languages
We encode scripts so that we can represent texts. And we will do it, 
as we have, to the best of our ability, but not by lumping everything 
together just because it makes things easy for database programmers.

Best regards,
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Conflicting principles

2003-08-14 Thread Michael Everson

At 01:18 +0200 2003-08-09, Philippe Verdy wrote:

Such a break in the middle of a multiple-width diacritic exists in some 
notations, and is not considered horrible typography. Just look 
at musical notations where an upper horizontal parenthesis
is used to group some elements [...]
Music setting is not typesetting, and that kind of music 
representation is outside of the scope of the Unicode Standard.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-14 Thread Michael Everson

At 01:30 +0200 2003-08-10, Philippe Verdy wrote:
Whatever you think, the SPACE+diacritic is still a hack, and 
certainly not a canonical equivalent (including for its properties), 
of the existing spacing diacritics, which also do not fit all usages 
because they are symbols.
It is the formally specified way to represent what you say you want 
to represent. If an implementation doesn't do that nicely enough, 
complain to the implementors. (This has already been suggested to 
you.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
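
The SPACE+diacritic convention discussed in the message above is easy to express in code. The following is a minimal sketch, not an excerpt from the standard: the helper name isolated and its base parameter are hypothetical, and passing U+00A0 as an alternative base is shown purely as an illustration. How well the result displays still depends on the rendering engine, as the reply points out.

```python
import unicodedata


def isolated(mark: str, base: str = " ") -> str:
    """Return a combining mark applied to a space so it can stand alone."""
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    return base + mark


# HEBREW POINT HOLAM shown on its own, e.g. in a tutorial's list of marks.
holam_alone = isolated("\u05B9")
print([f"U+{ord(ch):04X}" for ch in holam_alone])

# The same mark on NO-BREAK SPACE, for contexts where a plain space might
# be collapsed or broken across lines (an assumption of this example).
holam_nbsp = isolated("\u05B9", base="\u00A0")
print([f"U+{ord(ch):04X}" for ch in holam_nbsp])
```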
