Re: ConScript registry?

2001-02-01 Thread David Starner

On Wed, Jan 31, 2001 at 05:06:14AM -0800, Michael Everson wrote:
 Of those in the registry, I would guess only 8 (Tengwar, Cirth,
 Engsvanyali, Shavian, Solresol, Visible Speech, Aiha, and Klingon) have any
 claim to be added to Unicode. 78 columns, less than 624 characters to be
 added.
 
 These would appear to be in use by actual communities of some size. (Some
 of the other ConScripts appear to be in use only by their creators.)

The only reason I included Aiha in my list was that it's already got
a block tentatively assigned on your roadmap to the SMP. I've never heard
of this language before, and wouldn't have included it otherwise. Why was
it included in the roadmap?

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org



Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 13:56 -0800 2001-01-30, scríobh John Jenkins:

 Of those in the registry, I would guess only 8 (Tengwar, Cirth,
 Engsvanyali, Shavian, Solresol, Visible Speech, Aiha, and Klingon) have
 any claim to be added to Unicode. 78 columns, less than 624 characters to be
 added.

Don't forget Deseret, which will, in fact, be part of Unicode 3.1.

Version 2.1 of ConScript removes Deseret and points the user to the SMP.
(John Cowan hasn't updated the mirror site yet.) This is an object lesson
in the volatility of the Private Use Area. I suppose I ought to do up a
mapping table for anyone who used the old Deseret.
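
A minimal sketch of what such a mapping table might look like, in Python. The Deseret block in Unicode 3.1 starts at U+10400 and spans U+10400..U+1044F; the old Private Use base below is a placeholder assumption, not a value taken from the registry, and the sketch further assumes that the old and new layouts share the same character order, which would need to be checked against both charts.

```python
# Hypothetical base of the old ConScript Deseret allocation in the BMP
# Private Use Area. This value is an ASSUMPTION for illustration only;
# substitute the real one from the registry before use.
OLD_PUA_BASE = 0xE830

# Start of the Deseret block in the SMP (Unicode 3.1).
NEW_SMP_BASE = 0x10400

# The Deseret block spans U+10400..U+1044F, i.e. 0x50 code points.
BLOCK_SIZE = 0x50


def build_mapping(old_base=OLD_PUA_BASE, new_base=NEW_SMP_BASE, size=BLOCK_SIZE):
    """Return a dict mapping old PUA code points to SMP code points.

    Assumes a simple offset-preserving layout (an assumption, see above).
    """
    return {old_base + i: new_base + i for i in range(size)}


def remap(text, table):
    """Replace every old PUA code point in `text` using `table`.

    Characters not present in the table are left untouched.
    """
    return text.translate(table)


if __name__ == "__main__":
    table = build_mapping()
    sample = chr(OLD_PUA_BASE) + "abc"
    print([hex(ord(c)) for c in remap(sample, table)])
```

If the two layouts turn out to differ, the dictionary would have to be written out entry by entry rather than generated from an offset.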

Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire





Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 14:54 -0800 2001-01-30, scríobh David Starner:

On a calmer note, how many script submissions do Unicode and the
ISO 10646 working group get now? How about from people outside Unicode
and the working group? What about outside the standards bodies?

The occasional Southeast Asian script we hadn't seen before is brought to
our attention from outside, but in general we've identified a large set of
scripts we need to work on (see the Roadmap) and we sort of focus on that.
It is sometimes difficult to work on them because of resources available,
and for some of the scripts it is difficult to get in touch with users or
experts.

If my guess is right, there are very few submissions from outside Unicode,
and really no evidence that this would pick up significantly after Tengwar
or Klingon got encoded.

I would tend to concur with this assessment.






Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 12:19 -0800 2001-01-30, scríobh David Starner:

The ConScript registry (http://www.egt.ie/standards/csur/index.html) is a
place where constructed/artificial scripts can be registered in a way
that they can be publicly transferred (among those who recognize the
encoding, of course.)

"By agreement between sender and receiver" is the usual jargon.

It also is a 'proof' that there won't be a huge surge of constructed
characters in Unicode if you let Klingon or Tengwar in.

Is it?

There are roughly
2,000 characters encoded in the BMP Private-Use area, with another 6,000
in Plane 16. Even accepting them all, that would fit easily in the space
that hasn't even been tentatively allocated in Plane 1's roadmap.

Oh, I see what you're saying. ConScript handles some 40 scripts -- even if
they were **all** accepted into the SMP it wouldn't make that much
difference. Not that we're thinking of that.

Cowan and Everson have not been very picky about which scripts they
included in the ConScript registry.

Well, we tried to make sure the proposals were of quality. We preferred it
a lot if fonts were available for the user.

Of those in the registry, I would guess only 8 (Tengwar, Cirth,
Engsvanyali, Shavian, Solresol, Visible Speech, Aiha, and Klingon) have any
claim to be added to Unicode. 78 columns, less than 624 characters to be
added.

These would appear to be in use by actual communities of some size. (Some
of the other ConScripts appear to be in use only by their creators.)






Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 13:23 -0800 2001-01-30, scríobh Thomas Chan:

I don't think that CSUR is conclusive proof that there wouldn't be a
deluge of demands for encoding fictional or constructed scripts if the
likes of Tengwar or Klingon were encoded.

Well, I think what David was saying is that there don't seem to be all that
many of them.

CSUR is just a pair of websites
with nowhere near the high profile or authority of Unicode.

I thought one of the Unicode web pages linked it. I could be wrong. And the
CSUR states explicitly that it is just for fun. Having said that, I do know
of some folks who have done implementations of one sort or another based on
its specifications.

If say, a
fictional script were included and published by Unicode and ISO, then
people all over would suddenly be aware of the fact that a fictional
script got included, and perhaps they might conclude that they should
submit their own pet scripts as well.

Thomas, if a script like Tengwar, which has thousands of users who are
actually interested in writing texts in it, sorting, searching, and all
that, gets into the UCS it is because there is a credible requirement to
encode it. Plenty of "nonfictional" historical scripts have fewer users
than Tengwar. For some of them we have a handful of texts. Tengwar on the
other hand is studied by linguists, used by enthusiasts, and at any rate is
an integral part of the work of one of the 20th century's finest and most
influential writers.

Many people with very real scripts
that they use in their daily lives were not aware of Unicode or that it
would benefit them to have them encoded; I suspect the same is true for
creators of fictional and constructed scripts.

Yes, of course.

For example, it is easy to
find a variety of fonts for fantasy runes or other alphabets that people
have created, some based off a description in published fiction, but they
have not gotten in touch with CSUR.

Actually there aren't all that many.

Or take the case of the Hotsuma
Tsutae syllabary, created in modern times to provide a fictional
pre-Chinese writing system (http://www.jtc.co.jp/hotsuma/index-e.htm) for
what is supposedly Old Japanese, which has books and articles published
about it, and fonts in existence, but it has no contact with CSUR.

In fact, I *have* seen this. As I recall Ken Whistler and I looked at it
when we were at the WG2 meeting in Fukuoka.






Re: ConScript registry?

2001-01-31 Thread P. T. Rourke

I'm curious: what are the historical scripts that have been proposed to
Unicode that only exist in a handful of documents (note that I define
handful as 20 or less)?  Other than the Phaistos Disk "script," which may
not be a script at all (it seems odd that there would be a script in as
heavily studied a location as the Aegean with only one example; it probably
is a script, but I would say that the jury is still out).

Patrick Rourke
[EMAIL PROTECTED]




Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 05:46 -0800 2001-01-31, scríobh P. T. Rourke:
I'm curious: what are the historical scripts that have been proposed to
Unicode that only exist in a handful of documents (note that I define
handful as 20 or less)?

Proto-Sinaitic, for instance. Possibly some of the badly-known South
American scripts like Paucartambo. There are some scripts whose names keep
getting repeated in the literature but for which it's almost impossible to
get any samples at all.






Re: ConScript registry?

2001-01-31 Thread Thomas Chan

On Wed, 31 Jan 2001, Michael Everson wrote:

 Ar 13:23 -0800 2001-01-30, scríobh Thomas Chan:
 I don't think that CSUR is conclusive proof that there wouldn't be a
 deluge of demands for encoding fictional or constructed scripts if the
 likes of Tengwar or Klingon were encoded.
 
 Well, I think what David was saying is that there don't seem to be all that
 many of them.

My primary objection was that we don't have conclusive evidence for either
scenario.

 
 CSUR is just a pair of websites
 with nowhere near the high profile or authority of Unicode.
 
 I thought one of the Unicode web pages linked it. I could be wrong. And the
 CSUR states explicitly that it is just for fun. Having said that, I do know
 of some folks who have done implementations of one sort or another based on
 its specifications.

I don't see any wording along the lines of "just for fun" on either CSUR
website itself, except for a link on your http://www.egt.ie/sc2wg2.html
page.

The only thing that suggests their unofficialness and volatility is
mention of the Private Use Area, but perhaps that is not clear to
people who see the words "Unicode" and "Registry", and think it is the
real thing, or there are problems comprehending the concept of a Private
Use Area.  Or perhaps they have heard about it secondhand.  For example,
look through the Usenet newsgroup archives at deja.com or any discussion
board online and see how often people believe Klingon is in Unicode, or
"going to be in the next version" of Unicode, when there has only been a
proposal.  (And I doubt they are looking at the WG2 proposal itself, but
the CSUR registration or derivative information.)

 
 If say, a
 fictional script were included and published by Unicode and ISO, then
 people all over would suddenly be aware of the fact that a fictional
 script got included, and perhaps they might conclude that they should
 submit their own pet scripts as well.
 
 Thomas, if a script like Tengwar, which has thousands of users who are
 actually interested in writing texts in it, sorting, searching, and all
 that, gets into the UCS it is because there is a credible requirement to
 encode it. Plenty of "nonfictional" historical scripts have fewer users
 than Tengwar. For some of them we have a handful of texts. Tengwar on the
 other hand is studied by linguists, used by enthusiasts, and at any rate is
 an integral part of the work of one of the 20th century's finest and most
 influential writers.

Please note that I did not single Tengwar out for criticism.  I believe it
has a valid argument to be encoded because of the size of the user
community.  It is the fictional scripts with small user communities that
are the problem, and how that relates to treatment of real-world
historical scripts with small user communities.

 
 For example, it is easy to
 find a variety of fonts for fantasy runes or other alphabets that people
 have created, some based off a description in published fiction, but they
 have not gotten in touch with CSUR.
 
 Actually there aren't all that many.

Are we sure about this?  It remains to be examined how they would be
treated, but there are Chinese fictional scripts that have the potential
capability of gobbling up codepoints like "ideographs" have done.  e.g.,
  http://deall.ohio-state.edu/grads/chan.200/misc/100fu.jpg
  http://deall.ohio-state.edu/grads/chan.200/misc/100shou.jpg
each show a single character in what are supposedly a hundred different
scripts.  Most of these "scripts" could probably be conflated and treated
as font variants, but a few are distinct.  Multiply that by 4000-8000
each, and you might have an explosion.

Or take the case of the bunch of obsolete reformist alphabets and
syllabaries of the late 19th and early 20th centuries, such as the Guanhua
Zimu ("Mandarin letters") alphabet, which is to my knowledge only
partially described in one Western source.  If I understand correctly,
these would be in the same category as Deseret or Visible Speech.


 Or take the case of the Hotsuma
 Tsutae syllabary, created in modern times to provide a fictional
 pre-Chinese writing system (http://www.jtc.co.jp/hotsuma/index-e.htm) for
 what is supposedly Old Japanese, which has books and articles published
 about it, and fonts in existence, but it has no contact with CSUR.
 
 In fact, I *have* seen this. As I recall Ken Whistler and I looked at it
 when we were at the WG2 meeting in Fukuoka.

How did that discussion turn out?


Thomas Chan
[EMAIL PROTECTED]




Re: ConScript registry?

2001-01-31 Thread John Jenkins


On Wednesday, January 31, 2001, at 06:14 AM, Michael Everson wrote:

 Ar 05:46 -0800 2001-01-31, scríobh P. T. Rourke:
 I'm curious: what are the historical scripts that have been proposed to
 Unicode that only exist in a handful of documents (note that I define
 handful as 20 or less)?

 Proto-Sinaitic, for instance. Possibly some of the badly-known South
 American scripts like Paucartambo. There are some scripts whose names keep
 getting repeated in the literature but for which it's almost impossible to
 get any samples at all.


Well, the best example of this sort of thing is the Phaistos disk 
script, which Michael and I have independently proposed. The entire 
corpus of known writings in this script was included in the proposal, 
and half of the corpus is found on your Unicode CD.  Literally "on".




Re: ConScript registry?

2001-01-31 Thread P. T. Rourke

Thanks, but if you go back and read my original message, you'll find the
following sentences that continue from the point quoted by Mr. Everson:

 Other than the Phaistos Disk "script," which may not
 be a script at all (it seems odd that there would be a
 script in as heavily studied a location as the Aegean
 with only one example; it probably is a script, but I
 would say that the jury is still out).







Re: ConScript registry?

2001-01-31 Thread John Jenkins


On Wednesday, January 31, 2001, at 08:21 AM, P. T. Rourke wrote:

 Thanks, but if you go back and read my original message, you'll find the
 following sentences that continue from the point quoted by Mr. Everson:

 Other than the Phaistos Disk "script," which may not
 be a script at all (it seems odd that there would be a
 script in as heavily studied a location as the Aegean
 with only one example; it probably is a script, but I
 would say that the jury is still out).


You are of course correct.  In my eagerness to point out that the entire 
Phaistos repertoire is included in the encoding proposal, I read too 
hastily.  My apologies.




Re: ConScript registry?

2001-01-31 Thread Michael Everson

The Phaistos disk is either a sample of writing or it is a board game. But
as a board game it doesn't look very interesting.






Re: ConScript registry?

2001-01-31 Thread Michael Everson

Ar 08:21 -0800 2001-01-31, scríobh P. T. Rourke:
Thanks, but if you go back and read my original message, you'll find the
following sentences that continue from the point quoted by Mr. Everson:

 Other than the Phaistos Disk "script," which may not
 be a script at all (it seems odd that there would be a
 script in as heavily studied a location as the Aegean
 with only one example; it probably is a script, but I
 would say that the jury is still out).

The sample we have of Phaistos is, at least, well-designed, clear, and
easily analyzable. Meaning, at least it's not a rumour.






Phaistos Disk (was Re: ConScript registry?)

2001-01-31 Thread P. T. Rourke

Sure enough.  And I'm certainly never going to criticize someone for
treating it as a script until it is proven otherwise - including for the
purposes of Unicode.  But one has to admit that one excellent piece of
evidence that a script is a script is the existence of multiple texts, and
that in this case that excellent piece of evidence happens to be missing.
This is not to say that there isn't other evidence that it is an example of
a lost script (the fact that the characters seem to have been imprinted with
some sort of stamp, for instance, which is suggestive that there are in fact
multiple texts in the script, and the rest are lost).  Another possible
explanation of the disk is that it is a "dancing-men" cipher of some sort
(though why a cipher would be imprinted on such a permanent medium is beyond
me).  I do not think there is anything controversial in expressing this
modicum of doubt.

Anyway, I'm glad that there are folks like Mr. Everson and Mr. Jenkins
willing to put in the time to keep up activity on encoding historical
scripts.

Patrick Rourke
[EMAIL PROTECTED]







Re: ConScript registry?

2001-01-30 Thread David Starner

On Tue, Jan 30, 2001 at 11:02:29AM -0800, Elaine Keown wrote:
 Hello, 
 
 What's the ConScript registry?  

The ConScript registry (http://www.egt.ie/standards/csur/index.html) is a 
place where constructed/artificial scripts can be registered in a way
that they can be publicly transferred (among those who recognize the
encoding, of course.)

 Does it have a formal relationship with Unicode?  Sounds like something
 designed to be used with the Private Use Area?

It doesn't have a formal relationship with Unicode, although it is being
done by John Cowan and Michael Everson. It consists of allocations of
characters in the Private Use Area.
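
A minimal sketch, in Python, of how a receiver might spot text that depends on such a private agreement. The three ranges are the Private Use areas defined by Unicode (one in the BMP, plus Planes 15 and 16); the function names are made up for illustration. Nothing here can say which registry, if any, a given Private Use character was allocated by, since that is exactly what sender and receiver have to agree on out of band.

```python
# Private Use ranges defined by Unicode / ISO 10646.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Plane 15
    (0x100000, 0x10FFFD),  # Plane 16
]


def is_private_use(code_point):
    """Return True if `code_point` lies in one of the Private Use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


def private_use_positions(text):
    """Return (index, code point) pairs for each Private Use character."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_private_use(ord(ch))]


if __name__ == "__main__":
    sample = "abc\ue000def"
    for index, cp in private_use_positions(sample):
        print(f"index {index}: U+{cp:04X} is a Private Use character")
```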

It also is a 'proof' that there won't be a huge surge of constructed 
characters in Unicode if you let Klingon or Tengwar in. There are roughly
2,000 characters encoded in the BMP Private-Use area, with another 6,000
in Plane 16. Even accepting them all, that would fit easily in the space
that hasn't even been tentatively allocated in Plane 1's roadmap. Cowan and Everson
have not been very picky about which scripts they included in the ConScript
registry. Of those in the registry, I would guess only 8 (Tengwar, Cirth,
Engsvanyali, Shavian, Solresol, Visible Speech, Aiha, and Klingon) have any
claim to be added to Unicode. 78 columns, less than 624 characters to be
added.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org



Re: ConScript registry?

2001-01-30 Thread John Jenkins


On Tuesday, January 30, 2001, at 12:19 PM, David Starner wrote:

 Of those in the registry, I would guess only 8 (Tengwar, Cirth,
 Engsvanyali, Shavian, Solresol, Visible Speech, Aiha, and Klingon) have any
 claim to be added to Unicode. 78 columns, less than 624 characters to be
 added.

Don't forget Deseret, which will, in fact, be part of Unicode 3.1.

(Shavian has also been accepted by UTC for encoding; it's just that 
nobody has really pushed on it so it's languished.)




Re: ConScript registry?

2001-01-30 Thread David Starner

On Tue, Jan 30, 2001 at 01:23:02PM -0800, Thomas Chan wrote:
 I don't think that CSUR is conclusive proof that there wouldn't be a
 deluge of demands for encoding fictional or constructed scripts if the
 likes of Tengwar or Klingon were encoded. 

This is real life; we don't get much conclusive proof around here. 

 If say, a
 fictional script were included and published by Unicode and ISO, then
 people all over would suddenly be aware of the fact that a fictional
 script got included, and perhaps they might conclude that they should
 submit their own pet scripts as well.

"Their own pet scripts"? Since when do the works of the greatest
fantasy author of the 20th century, used by thousands, become "pet scripts"?

On a calmer note, how many script submissions do Unicode and the
ISO 10646 working group get now? How about from people outside Unicode
and the working group? What about outside the standards bodies? 

If my guess is right, there are very few submissions from outside Unicode,
and really no evidence that this would pick up significantly after Tengwar
or Klingon got encoded.

 Many people with very real scripts
 that they use in their daily lives were not aware of Unicode or that it
 would benefit them to have them encoded; I suspect the same is true for
 creators of fictional and constructed scripts.  

("very real" and "fictional and constructed" not being disjunctive, of 
course.) True. And?

 For example, it is easy to
 find a variety of fonts for fantasy runes or other alphabets that people
 have created, some based off a description in published fiction, but they
 have not gotten in touch with CSUR.  

But those are the marginal cases that Unicode doesn't need to worry
about. They won't mess with Unicode, either. They aren't going to
be interested or patient enough to fill out the forms. 

 Or take the case of the Hotsuma
 Tsutae syllabary, created in modern times to provide a fictional
 pre-Chinese writing system (http://www.jtc.co.jp/hotsuma/index-e.htm) for
 what is supposedly Old Japanese, which has books and articles published
 about it, and fonts in existence, but it has no contact with CSUR.

Unsurprisingly, the CSUR covers Western scripts better than Eastern
ones. You probably know better than I do how many Eastern fictional
scripts there are. Even with that, is it right not to encode one language
that deserves it, because there may be more that deserve to be encoded,
or for fear (on what evidence) of spurious submissions?

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org



Re: ConScript registry?

2001-01-30 Thread DougEwell2

In a message dated 2001-01-30 15:29:04 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

  For example, it is easy to
   find a variety of fonts for fantasy runes or other alphabets that people
   have created, some based off a description in published fiction, but they
   have not gotten in touch with CSUR.  
  
  But those are the marginal cases that Unicode doesn't need to worry
  about. They won't mess with Unicode, either. They aren't going to
  be interested or patient enough to fill out the forms. 

Right at the moment I am trying to get my own constructed script encoded in 
CSUR.  Although I would be perfectly willing to fill out the paperwork 
required by Unicode and ISO/IEC JTC1/SC2/WG2 (it's not that much really), I 
would never actually do so because the script simply doesn't belong in 
Unicode/10646.  They know it and I know it.

There are plenty of differences between Unicode and CSUR, as others have 
mentioned.  Unicode is well known and getting better known every day; CSUR is 
relatively obscure except to Unicode insiders, people whose scripts have 
already been encoded, and probably some who stumble across John's or 
Michael's web sites by chance.  CSUR is intentionally focused on recently 
constructed scripts (as I have suggested before, all scripts are "artificial" 
or "constructed" but some gain wider acceptance than others) and so it 
naturally contains some scripts of extremely limited use that would not be 
candidates for encoding in Unicode.  I trust Unicode and WG2 not to accept 
just any old script.

Many of the "proposed" but not yet "registered" CSUR scripts were invented by 
one guy whose hobby is creating fantasy worlds, languages, and scripts.  I 
figure my script is just as worthy even though there is no fantasy world 
created around it.

-Doug Ewell
 Fullerton, California