Re: IUC27 Unicode, Cultural Diversity, and Multilingual Computing / Africa is forgotten once again.
John H. Jenkins a écrit : On Dec 8, 2004, at 3:57 PM, Patrick Andries wrote: Azzedine Ait Khelifa a écrit : Hello All, The subject of this conference is really interesting and very useful. But once again Africa is forgotten. I want to know if we can have the same conference, Africa-oriented, scheduled? If not, what should we do to have this conference scheduled in a city accessible for the African community (like Paris)? If this is possible, I would also add « and with much more content in a language understood in Africa and the host country: French ». Well, and as with everything else associated with Unicode, feel free to volunteer. Well, I do volunteer work... in English and in French (there are other fora where people talk about Unicode in French, whether in Morocco or Lebanon for instance). Thanks for the advice and best regards, P. A.
Re: Pour sauver la patrimoine de l'Imprimerie Nationale de France
Michael Everson a écrit : See http://www.garamonpatrimoine.org/ Note the use of Unicode in http://www.garamonpatrimoine.org/petition.html P. A.
[Fwd: Re: Re: Relationship between Unicode and 10646]]
Original message. Subject: Re: Re: Relationship between Unicode and 10646] Date: Mon, 29 Nov 2004 10:17:34 +0100 From: Philippe Verdy [EMAIL PROTECTED] From: "Patrick Andries" [EMAIL PROTECTED] Finally, I am no longer so sure that American companies still regard Unicode as something strategic; it is mostly a matter of individual efforts by passionate technicians in these companies, enthusiasts who are still allowed to carry on, no doubt because this builds a good stock of multicultural goodwill. [PA] This was extracted from a longer, private message to Philippe. It is out of context here. Unicode is still strategic; the new scripts may be less so to the major software companies, although those companies will most probably not be able to ignore the new versions of Unicode, which will contain more than simply new rare scripts. Anyway, this was a private discussion. Thanks, Philippe. That will teach me. P. A.
Value of U+1E20
Would any one know what is the value of U+1E20 ? Is this (also) used in Semitic transliterations ? For which value ? Could it be a fricative G ? Many thanks, P. A.
Re: Unicode V4 and ISO
Martine Brunet a écrit : Hello, I am new on this list and I have a question about very special characters and the standard Unicode v4. I searched at length for the answer to this question at www.unicode.org but without success. Can somebody tell me if the characters of the 4 following standards, ISO 5426-2:1996, ISO 6861:1996, ISO 8957:1996 and ISO 10754:1996, are integrated in Unicode V4? In detail, they are the following standards : - ISO 5426-2:1996 Information and documentation - Extension of the Latin alphabet coded character set for bibliographic information interchange - Part 2: Latin characters used in minor European languages and obsolete typography - ISO 8957:1996 Information and documentation - Hebrew alphabet coded character sets for bibliographic information interchange. - ISO 10754:1996 Information and documentation - Extension of the Cyrillic alphabet coded character set for non-Slavic languages for bibliographic information interchange. - ISO 6861:1996 Information and documentation - Glagolitic alphabet coded character set for bibliographic information interchange I believe the place to look is here : http://www.unicode.org/versions/Unicode4.0.0/References.pdf At first sight, these all served as source references for ISO 10646, except the Glagolitic one, whose script is part of Amd 1 to ISO 10646:2003 (and thus of an upcoming version of Unicode, after 4.0). I believe (but I have not studied this in depth) that the Amd 1 proposal differs slightly from ISO 6861 in that some glyph variants from ISO 6861 are not proposed in Amd 1. Cordialement, P. Andries - o - O - o - ISO 10646 et Unicode en français http://pages.infinit.net/hapax
Re: Errors in TUS Figure 15.2?
Doug Ewell a écrit : Peter Kirk peterkirk at qaya dot org wrote: The situation is even more confused in that some Unicode characters, e.g. U+0152 LATIN CAPITAL LIGATURE OE, are called LIGATUREs in their character names but are unambiguously single Unicode characters (e.g. they have no decomposition even for compatibility). (These are in addition to the characters named LIGATURE in the Alphabetic Presentation Forms block, which mostly have compatibility decompositions.) The last thing you want to worry about is the correlation between whether a character has the word LIGATURE in its name and whether it is actually a ligature. That way lies madness. [PA] Incidentally, the French version of ISO 10646 does not name these letters LIGATURE, but DIGRAMME SOUDÉ (e.g. U+0152 : DIGRAMME SOUDÉ MAJUSCULE LATIN OE). Also, the Unicode 1.0 name may have been better in this regard : LATIN CAPITAL LETTER O E. P. A.
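The point about names versus decompositions can be checked directly. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper name ligature_report is hypothetical and is not part of any message in this thread. It compares U+0152 with U+FB01 from the Alphabetic Presentation Forms block.

```python
import unicodedata

def ligature_report(ch):
    """Return (character name, decomposition mapping) for one character.

    The decomposition is an empty string when the character has none.
    ligature_report is a hypothetical helper used for illustration only.
    """
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# U+0152 is named LIGATURE but has no decomposition at all;
# U+FB01 is named LIGATURE and has a compatibility decomposition.
for ch in ("\u0152", "\ufb01"):
    name, decomp = ligature_report(ch)
    print(f"U+{ord(ch):04X} {name}: decomposition={decomp!r}")
```

The word LIGATURE appears in both names, yet only the presentation form decomposes, which is the lack of correlation described above.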
Re: Much better Latin-1 keyboard for Windows
Mike Ayers a écrit : RE: Much better Latin-1 keyboard for Windows [Alain] As I said in my previous mail, these definitions are not the best of definitions. The distinction is but intuitive, you have to see the diagrams where labeling makes the difference: [snip] I don't have these diagrams. Are they published somewhere public? The only one I know of that doesn't infringe copyright (because it has never yet been published) is here : http://www.cooptel.qc.ca/~pandries/ISO-CEI%209995-1-1994.pdf I believe Alain was referring to figures 8 and 9 (end of document). P. A. - o - O - o - ISO 10646 et Unicode en français http://pages.infinit.net/hapax
Re: Changing UCA primar[l]y weights (bad idea)
Alain LaBonté a écrit : It would be much better to make sorting, matching and searching consistent with tailored tables of either the UCA or ISO/IEC 14651. Unfortunately that is not what happens in most products, except in some good search engines (Google, Altavista and the like, which are smart enough for this -- but are not tailorable, to my knowledge -- and there are slight differences in behaviour between Google and Altavista, although both are very much better than Mozilla or MS products in all cases). [PA] Sometimes too smart, when one wants to search for a word with an accent and not find the far more numerous forms without it. A small check box (ignore diacritics) would be welcome. (Anybody from Google reading the list ?) P. A.
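The "ignore diacritics" switch wished for above can be illustrated with a very rough sketch, assuming Python's standard unicodedata module. This is not the UCA or ISO/IEC 14651 (no tailoring, no multi-level comparison); it only strips combining marks after canonical decomposition. The function names strip_diacritics and matches are hypothetical.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def matches(query, word, ignore_diacritics=True):
    """Compare two words, optionally ignoring diacritics.

    With ignore_diacritics=False the comparison is still insensitive to
    case and to precomposed versus decomposed spelling, but accents count.
    """
    if ignore_diacritics:
        return strip_diacritics(query).casefold() == strip_diacritics(word).casefold()
    nfc_query = unicodedata.normalize("NFC", query)
    nfc_word = unicodedata.normalize("NFC", word)
    return nfc_query.casefold() == nfc_word.casefold()

print(matches("côté", "cote"))                           # loose match
print(matches("côté", "cote", ignore_diacritics=False))  # strict match
```

A search box would expose ignore_diacritics as the check box; a real implementation would more plausibly compare collation keys at a chosen strength than strip marks.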
Re: Arabic written in Syriac? Arabic written in Tifinagh?
E. Keown a écrit : Elaine Keown Tucson Hi, I'm trying to track down a reference for Arabic written in Syriac (by Syriac Christians). Well, the keyword « Garshuni » may help here. I did a little work on Tifinagh 2-3 years ago. I discovered that it is used to write Arabic by Tuareg women. I hope that the Moroccan Tifinagh proposal includes those characters, if they are 'extras.' Do you have any letters in mind ? Some such letters could very well be missing. P. A.
Re: Arabic written in Syriac? Arabic written in Tifinagh?
E. Keown a écrit : Aha!--thank you. Is there much Garshuni material, some especially notable? A recent (May 2004) communication and references to Garshuni manuscripts : 17h15 Élie Kallas (Trieste) /Le type linguistique garchouni du Mont-Liban (15ème siècle) d'après les mss. Vat ar. 640 et Borg. ar. 136 d'Ibn el-Qila-^c i-./ http://www.fltr.ucl.ac.be/FLTR/GLOR/ORI/ColloqueArabe/programmeF.htm « Danach widmete Naoum Faik seine Zeit der eigenen Zeitschrift »Bethnahrin». Die Besonderheit der Publikationen von Naoum Faik war, dass die Beiträge in türkischer bzw. arabischer Sprache jedoch in Syro-Aramäischen Alphabet. Dieser Stil ist u.a. als Garschuni bekannt und war vor und nach dem I. Weltkrieg vor allem innerhalb des Intellektuellenkreises, die im Osmanischen Reich lebten, weit verbreitet. » http://www.bethil-online.com/magazines/rh_2003/rh-61.pdf So it seems it was quite common in intellectual circles in the Ottoman Empire before and after WWI. I think Google (English, French and German) will reveal a wealth of material or citations of material. Tifinagh is used to write Arabic by Tuareg women. I hope that the Moroccan Tifinagh proposal includes those characters.. Patrick Andries wrote: Do you have any letters in mind ? Some such letters could very well be missing I did have a short list of such Tifinagh characters--6 or fewer--from 3 years ago, but the U.S. Post Office lost two of my boxes this spring, and the Arabic- etc notes were in the box that's still heaven-knows-where. Kamal Mansour had a copy of my Arabic-script bibliography, but I am not sure that the Tifinagh material was on that. I know of at least one such letter from memory (because it is easy to remember) : a rectangle for emphatic s. But it is debatable (only Hanoteau gives it, I think) and thus was not a priority to encode in our first (modern-day) Tifinagh proposal.
But Tifinagh is actually a really important script---it's used to write many major dialects, though maybe more by women---and it's caseless, so the collation string can have the variants inserted in the regular string of letters I'm not sure I understand. P. A.
Re: Looking for transcription or transliteration standards latin- arabic
Peter Kirk a écrit : On 07/07/2004 07:08, Raymond Mercier wrote: This is a possible derivation. If this is Gerd's source, he failed to make the point that istimboli was not a Greek name of the city but a colloquial pronunciation of a phrase. And the source of that may be the following old German text, from http://www.staff.ncl.ac.uk/jon.west/get/hc0144_3.htm: Constantinopel hayssen die Chrichen Istimboli und die Thürcken hayssends Stambol; And according to http://www.fotoist.8m.com/ad.htm (in Turkish) this information comes from the 14th-15th century German traveller Johan Schildtberger. But I have my suspicions about this information. The Greeks had no problem with initial consonant clusters but the Turks did, so it is much more likely that the Turks added the initial I to a Greek word starting with ST, just as Spanish and French add initial E before such clusters. French has not added an initial E in front of ST for the last 5 centuries (see : stop, start, sport (*), stage, stature, station, etc.); historically (in Old French) it did (estable [stable], estamper [to stamp], estat [state, station], esterlin [sterling], estrange [strange, stranger]). Old French predates the fall of Constantinople and the end of the Hundred Years' War (both in 1453, as all French-speaking schoolchildren learn). Spanish still does (or at least did recently), see recent loanwords : esquí (ski) or esprint (sprint). P. A. (*) English word derived from an Old French word desport / deport (entertainment), see deporte in Spanish and desporto/desporte in Portuguese (but esporte in Brazil).
Re: How to find character corresponding to code
Mike Ayers a écrit : From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of [EMAIL PROTECTED] Say, I have given a 2-Byte Unicode character code. How can I quickly find out, how the corresponding character *should* look like according to the standard? From the Unicode standards page (FAQ and Search), it seems that it is easy to find the code point, when one knows the character name. I would like to do the reverse, though. Use the code charts: http://www.unicode.org/charts/ As you hold the mouse over each link, look at the status bar of your browser which shows the link name. You will see the final part of the link name is U followed by hex digits followed by .pdf. The hex digits are the first codepoint in that block. The charts are in ascending order - top to bottom, left to right. Once you find the chart you want, finding the character should be no problem. [PA] Personally, I often use BabelMap with Code 2000 as the default font: it makes it easy to see the character properties and comes with an English or a French UI with the corresponding character names. It is also handy for testing a script, cutting and pasting characters, etc. http://uk.geocities.com/BabelStone1357/Software/BabelMap_fr.html http://uk.geocities.com/BabelStone1357/Software/BabelMap.html P. A.
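For a quick look-up from code to character without opening the charts, a small script can also do. This is a minimal sketch, assuming Python's standard unicodedata module; the function name describe is hypothetical. It reports the name and general category only. The reference glyph still has to be checked in the code charts, and the names available depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(code_point):
    """Return basic identifying data for a code point given as an integer."""
    ch = chr(code_point)
    return {
        "code": f"U+{code_point:04X}",
        "char": ch,
        # Unassigned or unnamed code points get a placeholder instead of an error.
        "name": unicodedata.name(ch, "<no name in this Unicode version>"),
        "category": unicodedata.category(ch),
    }

for cp in (0x1E20, 0x0152):
    info = describe(cp)
    print(info["code"], info["char"], info["name"], info["category"])
```

Whether the character actually displays depends on the fonts installed, which is where a tool with a broad-coverage font remains useful.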
Re: Looking for transcription or transliteration standards latin- arabic
Peter Kirk a écrit : On 03/07/2004 00:07, Patrick Andries wrote: Two very different political realities (before and after 1453). Cities change names without going through transliterations, cf. Berlin (Ontario) becoming Kitchener in 1916. But Constantinople - Istanbul is not in fact this kind of name change, for Istanbul (that is, İstanbul) is probably a corrupted and shortened version of Constantinople, with the initial I added to fit Turkish phonology (cf. the old western version Stamboul, still used in Russian, also Smyrna - Izmir). (I have also heard it said that Istanbul comes from Greek EIS TĒN POLIN "to the city", but that seems less likely to me.) Yes, I have heard this. So the change is more like Beijing - Peking than Berlin - Kitchener. Without a political change Constantinople would not have changed name in a matter of days (at least as far as the officials were concerned). In any case, it is not a transliteration problem (Beijing -- Pékin). P. A.
Re: Looking for transcription or transliteration standards latin- arabic
Patrick Andries a écrit : So the change is more like Beijing - Peking than Berlin - Kitchener. Without a political change Constantinople would not have changed name in a matter of days (at least as far as the officials were concerned). In any case, it is not a transliteration problem (Beijing -- Pékin). [PA] I wrote this a bit too fast this morning (first message !). I believe the origin of Istanbul is a bit too obscure to decide whether it is due to a transcription or a complete name change. Just to confuse things further, Konstantaniye was apparently used by the Turkish administration and a Greek form Istimboli is attested in the XIVth century. P. A.
[OT] Dutch letters was [Fwd: Re: is n with tilde used in French language ?]
Patrick Andries a écrit : http://www.evertype.com/alphabets/french.pdf Several remarks : ü seems not to be listed (see « würmien », « le würm », « argüer », now acceptable according to a recent spelling reform). Population of France is now 61,7 millions (including around 1,7 millions French citizens in French overseas territories), but French is also the native tongue of populations in Belgium, Luxembourg and Switzerland (all in Europe). Haarmann 1993 figure of 58,1 millions was for Metropolitan France + overseas territories (1990 census). [PA] Incidentally, I notice that, contrary to French, the populations for Dutch and German speakers include the speakers of those languages in several countries. Also for Dutch, I'm not convinced the list of letters is complete in http://www.evertype.com/alphabets/dutch.pdf Most vowels can take an acute accent, I believe : attaché, logé (French words), dóórdringen, géén, búíten, drááien (stressed syllables, cf. http://www.geocities.com/tinnestaaltroep/tinnepick.html ; stressing words graphically is common and (much) more frequent than in written English, while stressing words by adding accents is almost completely absent in French). The circumflex is also used : enquête, gêne, fêteren, as well as è in scène (in my Kramers Nederlands-Frans dictionary). http://www.e-klas.net/ns/nlspelling.htm#acc http://www.geocities.com/tinnestaaltroep/tinneaccentframe.htm
Re: Looking for transcription or transliteration standards latin- arabic
Philipp Reichmuth a écrit : Except there is no v sound, only an f sound in the Russian pronunciation of the name, due to regressive assimilation. Chykoffskee is pretty accurate, actually. I'd say Tchaikovsky is just a spelling taken over from French at a time when French was pretty much the international common language, at least in diplomacy and art. [PA] And the prevalence of French among the Russian imperial nobility. In French it is today Tchaïkovsky (with tréma), but the v looks like an attempt to transliterate; Russian names written in French in the XIXth century would usually be transcribed with ff : boeuf Strogonoff, Michel Strogoff (Jules Verne), Princesse Demidoff née Strogonoff, Tchkoff as an émigré name in France [2 born in Paris between 1916 and 1940].
Re: is n with tilde used in French language ?
Cristian Secară a écrit : According to Michael Everson's site, The Alphabets of Europe page, the French .pdf, character ñ and Ñ (Latin small / capital letter N with tilde) is used by the French alphabet. Not any alphabet taught in primary school, I would say. But cañon is in my Petit Larousse illustré (2004), though it then refers the reader to the more common canyon... I looked at different other sources and found no other mention about this character as being used for French language (however, my search was not exhaustive). The standard ISO/IEC 8859-16 claims coverage of the French language, but character ñ and Ñ is not part of ISO/IEC 8859-16. Should I understand that this character was only used in old French ? As a ligature certainly, and it was also proposed and used by Renaissance orthographical reformers to denote nasal sounds unambiguously (I have several books from around 1550 using the tilde in that fashion [facsimiles of such books of course...]). Patrick - o - O - o - ISO 10646 et Unicode en français http://pages.infinit.net/hapax
[Fwd: Re: is n with tilde used in French language ?]
Original message. Subject: Re: is n with tilde used in French language ? Date: Sun, 4 Jul 2004 21:31:28 +0100 From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] [EMAIL PROTECTED] References: [EMAIL PROTECTED] At 21:50 +0300 2004-07-04, Cristian Secară wrote: According to Michael Everson's site, The Alphabets of Europe page, the French .pdf, character ñ and Ñ (Latin small / capital letter N with tilde) is used by the French alphabet. The reason it is in that list is because there are some loanwords in French which retain the letter. Cañon is one of these. http://www.evertype.com/alphabets/french.pdf Several remarks : ü seems not to be listed (see « würmien », « le würm », « argüer », now acceptable according to a recent spelling reform). Population of France is now 61,7 millions (including around 1,7 millions French citizens in French overseas territories), but French is also the native tongue of populations in Belgium, Luxembourg and Switzerland (all in Europe). Haarmann 1993 figure of 58,1 millions was for Metropolitan France + overseas territories (1990 census). [1] http://www.insee.fr/fr/ffc/pop_age4.htm
Re: Mandombe
Anto'nio Martins-Tuva'lkin a écrit : Anyway, no clear indication on which language or languages is supposed to be served by this script -- though it seems to be aimed for Bantu languages, perhaps kiKongo (where ombe means black). It apparently means (in kiKongo) "the Black people's own" or "For the Black People" (ma Ndombe = Celle des noirs / Propre au peuple noir / Pour les noirs). It is basically a script promoted by a Church (a rather important one), a bit like Deseret : the Église kimbanguiste (officially Église de Jésus-Christ sur Son Envoyé spécial Simon Kimbangu -- EJCSK), with around 6 million members, mainly in the DRC (Congo). Samples : http://perso.wanadoo.fr/kimbangu.net/public1.htm
Two books I know of :
* WABELADIO PAYI D., 1996, Mandombe, Ecriture Négro-africaine : manuel d'apprentissage à l'usage des apprenants, Edition du CENA, RDC, 65 pages.
* LOUTHES A., To tanga Mandombe - Manuel de lecture aux apprenants de l'écriture négro-africaine, Edition du CENA, 60 pages.
And two dissertations :
* MALUEKI MATUASILUA S.H., 2000, L'impact de l'Ecriture Négro-africaine « Mandombe » dans le développement - Cas de quelques exemples à Kinshasa, Mémoire de fin de cycle de Technicien en Développement Rural, Institut Supérieur de Développement Rural, Luozi, Bas-Congo, RDC.
* LUSIKILA Kueno Buayi J.P., 1998, L'Ecriture Mandombe. Essai de signification theologico-sapientiale et culturelle, Mémoire de fin d'études de Licence en théologie, Université Simon Kimbangu, Kinshasa, RDC.
P. A.
Re: Mandombe
Patrick Andries a écrit : Anto'nio Martins-Tuva'lkin a écrit : Anyway, no clear indication on which language or languages is supposed to be served by this script -- though it seems to be aimed for Bantu languages, perhaps kiKongo (where ombe means black). It apparently means (in kiKongo) the Black people's own or For the Black People (ma Ndombe = Celle des noirs / Propre au peuple noir/ Pour les noirs). It is basically a script promoted by a Church (rather important one), a bit like Deseret. The Église kimbanguiste (officially Église de Jésus-Christ sur Son Envoyé spécial Simon Kimbangu -- EJCSK) around 6 million members. mainly in RDC Congo. Finger slipped : « Église de Jésus-Christ sur Terre par Son Envoyé spécial Simon Kimbangu ». http://www.quid.fr/2000/Q014770.htm P. A.
Re: Mandombe
Michael Everson a écrit : At 07:00 -0400 2004-07-02, Patrick Andries wrote: It is basically a script promoted by a Church (rather important one), a bit like Deseret. It is a pretty dreadful writing system. I find it hard to believe that anyone could actually read it or that anyone actually learns it. I did photocopy Payi's book (46 pp) but the script's structure is not really explained well enough to do a ConScript registry for it! I have contacted the Church to see if I could get more details. P. A.
[totally OT] Mohawk, Re: Looking for transcription or transliteration standards latin- arabic
Mike Ayers a écrit : From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Chris Harvey Sent: Friday, July 02, 2004 11:17 AM Perhaps one could think of Ha Tinh as the English word for the city, like Rome (English) for Roma (Italian), or Tokyo (English) for Tōkyō (English transliteration of Japanese), Tōkyō is not an English transliteration of Japanese, as it uses diacritics not found in English. The correct English transliteration is in fact Tokyo, which does not round trip. or Kahnawake (English/French) for Kahnawà:ke Errr - didn't the English/French usage predate the Mohawk alphabet? Pretty perverse case there. Yes, the Mohawk alphabet certainly postdates the French transcriptions. Just a few pieces of information about Mohawk (Agnier in its traditional French form) names around Montreal (Kanesatake on the North Shore, Kahnawake on the South Shore) : 1) I heard one of the Mohawk leaders speak on the radio the other day and he pronounced the K of Kanesatake as Kansatgu to my French ear, which seems to be validated by the old French spelling Canessedage (first attested in 1695). The name was apparently first used when the Agniers found refuge at the foot of Mont Royal on Montréal Island, then already occupied by the French for quite some time, before the Sulpicians moved them to another area outside Montreal. The French adopted Oka (an Algonquian name, if I recall properly) to designate the same place the Mohawk named Kanesatake. 2) As far as Kahnawake is concerned, the settlement again occurred when the French had already settled the area (long story, but the small group of Mohawk that had converted to Catholicism and found refuge around Montreal went through several settlements before settling in Kahnawake); at the same time the priests and French settlers that accompanied the Mohawk called the place (now Kahnawake) Saint-François-Xavier-du-Sault or simply Le Sault.
In Mohawk (Agnier) the present-day Kahnawake was successively called Kahnawake (« au rapide », "by the rapids") in 1676, Kahnawakon (« dans le rapide », "in the rapids") in 1690, Kanatakwenke (« d'où on est parti », "whence we left") in 1696 and Caughnawaga in 1716, with many other spellings thereafter until 1980, when Kahnawake was chosen as the official spelling. P. A.
Re: Looking for transcription or transliteration standards latin- arabic
Jony Rosenne a écrit : -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John H. Jenkins Peking for Běijīng. :-) Or Constantinople for Istanbul. :-) Two very different political realities (before and after 1453). Cities change names without going through transliterations, cf. Berlin (Ontario) becoming Kitchener in 1916. In any case, it is Istamboul and Pékin. P. A.
[OT] Re: Still some educational work to do
Ted Hopp a écrit : I was listening to that program, too. When I heard the explanation of Unicode, I turned off the radio. :( [PA] This kind of experiences always makes me wonder how much « misinformation » I'm listening to or viewing on subjects about which I know less... P. A.
Re: Thrilling varia from the Library of Congress
Michael Everson a écrit : Found a book on the Tulu script. Found some of Doke's 1925 phonetic characters cited in a 1975 source. If a few citations of author-specific characters are sufficient for encoding, I have a few more characters to propose. Note : I don't know which I really prefer (encoding this kind of rare character or not).
Re: Still some educational work to do
Michael Everson a écrit : At 11:03 -0500 2004-06-30, Donald Z. Osborn wrote: The flip side of this issue, which came up in the letter from the person who was just in Ouaga, is a question: what sort of African and other non-Western representation is there on the Unicode consortium? People like me take an interest; and the Agence intergouvernmentale de la francophonie has joined the Consortium recently. Canada and France (and Morocco) at the ISO level also take an interest and we have been in contact with the different centers mentioned by Don, sometimes for several years. We have also successfully proposed Tifinagh a major script used in a large part of Africa (Morocco, Algeria, Tunisia, Libya, an oasis in Egypt, Mali, Niger and part of Burkina Faso,...). Patrick Andries - o - O - o ISO 10646 et Unicode en français http://pages.infinit.net/hapax
Re: Still some educational work to do
Donald Z. Osborn a écrit : And a lot more yet... In some parts of the world that could benefit most from actively working Unicode, such as much of Africa, there is still relatively little knowledge of it. Even among techies. In fact, there is still an undercurrent of dissatisfaction among some who know something about Unicode with aspects of how it provides for some African character needs. I was reminded of this by a letter I received not long ago from someone who attended a recent colloquium on ICT in Ouagadougou. Within the last year some of us began discussing possible conferences, workshops, training modules, or a road show on Unicode in Africa and perhaps other regions. Yes, we did and this in a language understood in the given country. I'm not sure a series of workshops in English in French-speaking Africa is for instance a good thing. A series of workshops (in French, Arabic or Berber) is planned for Morocco later this year on this subject (Unicode, multilingual documents and font technology). P. A.
Tifinagh (Projet de norme marocaine 17.1.100) (was Re: lines 05-08, version 4.7 of Roadmap to BMP and 'Hebrew extensions')
Marco Cimarosti a écrit : Rick McGowan wrote: I mistakenly thought Tifinagh was rtl. That's OK. It has been, and sometimes still is, written right to left, hence it was roadmapped in a right-to-left allocation block. However, in modern usage, and in the Moroccan national standard now being drafted, it is specifically left to right. Is the draft of this Moroccan standard on-line somewhere? TIA. _ Marco Here : http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2739-1.pdf P. A. - o - 0 - o - ISO 10646 et Unicode en français http://pages.infinit.net/hapax
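The left-to-right treatment can be seen in the character data of Unicode versions that include Tifinagh (it was not yet encoded when this thread was written). A minimal sketch, assuming Python's standard unicodedata module and a hypothetical helper bidi_class; Hebrew alef is shown only for contrast with a right-to-left script.

```python
import unicodedata

def bidi_class(code_point):
    """Return the Bidi_Class property value of a code point as a short string."""
    return unicodedata.bidirectional(chr(code_point))

# U+2D30 TIFINAGH LETTER YA carries a strong left-to-right class,
# U+05D0 HEBREW LETTER ALEF a strong right-to-left class.
for cp in (0x2D30, 0x05D0):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {bidi_class(cp)}")
```

Text written right to left in Tifinagh would therefore need explicit directional overrides or higher-level markup, since the characters themselves resolve left to right.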
Re: Tifinagh and Roadmap
Marco Cimarosti a écrit : Is the draft of this Moroccan standard on-line somewhere? TIA. _ Marco Speaking of Tifinagh, I notice the block allocated to it has been modified but not the document referenced in it. See http://www.unicode.org/roadmaps/bmp/, row 2D. I believe it should point to http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2739 (as previously requested in Markham). If need be, I could produce a revised version updating N2739 with the recommendations of the Tifinagh ad hoc (code points and names modified) for easy reference. I have already done so for the French version of the proposal : http://cooptel.qc.ca/~pandries/propo_tifinagh.pdf P. A.
Re: Bantu click letters
Michael Everson a écrit : At 10:00 -0400 2004-06-10, John Cowan wrote: And today, if I were reprinting it, I'd commission a digital font (your effort, my expense) and put the characters in the PUA. Not if you wanted, as an Africanist, to be able to represent the text as it was originally written. Could you please explain this : how would using PUA characters prevent the text from being represented as it was originally written ? P. A.
Re: Bantu click letters
Patrick Andries a écrit : Michael Everson a écrit : Practice your tongue-twisting. Proposal to add Bantu phonetic click characters to the UCS http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf :-P Are these letters used in any other book than Doke's book on Kalahari Bushmen ? P. A. [PA] I don't think I got a direct answer on whether these non-Bantu click symbols are used in any other book. If these symbols are indeed used in a single book and by a single author, I would put them in the PUA; I don't see any interchange requirement to do otherwise. If letters unique to an author may now be encoded in Unicode, I have many to propose to the enabling technology that Unicode is, and people will be free to use them or not. P. A.
Re: Bantu click letters
Michael Everson a écrit : Practice your tongue-twisting. Proposal to add Bantu phonetic click characters to the UCS http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf :-P Are these letters used in any other book than Doke's book on Kalahari Bushmen ? P. A.
Re: Phoenician, Fraktur etc
Peter Kirk a écrit : If Fraktur and ordinary Latin are the same script, then it couldn't be said that the Germans abandoned the Fraktur script after WWII. Yet, that is what available references say did happen. Fraktur was actually abandoned during the Nazi era. In an ordinance dated 3/I/1941, the NSDAP Reichsleiter, Martin Bormann, on orders from Adolf Hitler, describes the « so-called Gothic script » as the « Schwabacher Jewish letters »; Antiqua (Latin) letters were to be used from then on and the script was to be called the « normal script ». At the party congress in Nuremberg in 1934, Hitler had already criticized the « Gothic script ». http://www.deutsche-schutzgebiete.de/fraktur.htm (transcript of the said ordinance). P. A.
Re: Multiple Writing Directions in One Script
Dean Snyder a écrit : Archaic Greek could be written right-to-left, left-to-right, or boustrophedon. I'm asking for technical advice as to how such variability in writing direction streams in the same script can be, and should be, handled in Unicode, and how it should be dealt with in a Unicode proposal. I believe this is similar to what exists in Old Italic. Please refer to the Old Italic proposal. P. A.
Re: Multiple Writing Directions in One Script
Michael Everson a écrit : At 14:02 -0700 2004-05-25, Patrick Andries wrote: I believe is similar to what exists in Old Italic. Please refer to the Old Italic proposal. Old Italic is no longer a proposal. It has been encoded. I know, Michael. But there is still a document called the Old Italic Proposal (or whatever it was first called : Etruscan, Osque, ...). No need to be picky; being helpful would do. Since Dean was looking for a way to address multiple writing directions in a proposal, I was suggesting that he read the Old Italic Proposal, which led to the encoding of Old Italic. He should find language there that suits him, since Old Italic shares similar properties. Do you have a pointer to this proposal (yours, I believe) ? This would have been helpful, but I see Ken has answered the question quite well. P. A.
Re: Response to Everson Phoenician and why June 7?
saqqara a écrit : I showed my 5 year old some Fraktur (lower case only) for the first time today. He is only just getting to grips with reading simple English words. And the verdict .. 'funny and silly' but he could still read the words back to me. Anecdotal perhaps but Dean, do you want me test the other 29 of his class at school before we can be rid of this fallacious Fraktur analogy? Try with Sütterlin also unified within Latin ;-) http://www.cooptel.qc.ca/~pandries/suetterlin.jpg (Sorry) P. A.
Re: Response to Everson Phoenician and why June 7?
Doug Ewell a écrit : And when shown the Sütterlin, he couldn't read it but certainly recognized it as handwriting. So would he when presented with Cyrillic handwriting ? P. A.
Inscription in Punic and Neopunic
Apparently the following book, Kanaanäische und aramäische Inschriften, by H. Donner and W. Röllig, Wiesbaden, 1962-64 (3rd edition 1971-1976), on page 161 (if I read the reference properly) contains a sample of an inscription that would be partly written in Punic and partly in Neo-Punic. I have been travelling for a week now and I'm estranged from all decent libraries. The inscription was found in Cherchel (Algeria) and is apparently dedicated to Micipsa. Would anyone have access to the aforementioned book ? Could that person be so kind as to see whether such an inscription is indeed illustrated ? Many thanks, P. A.
Re: Inscription in Punic and Neopunic
James Kass a écrit : Patrick Andries wrote, The inscription was found in Cherchel (Algeria) and is apparently dedicated to Micipsa. Would anyone have access to the aforementioned book ? Could that person be so kind as to see whether such an inscription is indeed illustrated ? Is this it? First of all thank you. I believe this is not the inscription, it could be Cherchel N2. *Cherchel N 2* Berger 1889, pp. 35-46; Lidzbarski 1898, p. 439, 3 Dd2, Taf. xvi, 4; Van den Branden 1974, pp. 143-145; Garbini 1974b. p. 33; Roschinsky 1979, pp. 111-116; NSI 57; KAI 161. KAI = Kanaanäische und aramäische Inschriften and 161 is also the reference I have. Unfortunately no illustration is provided. P. A.
Re: Proposal to encode dominoes and other game symbols
John Hudson a écrit : Michael Everson wrote: Here. Chew on this. :-) N2760 Proposal to encode dominoes and other game symbols Michael Everson 2004-05-18 http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2760.pdf This could get out of hand very quickly. Chinese and Japanese (shogi) chess pieces? To complete U+2616 and U+2617 ? P. A.
Re: Phoenician and software development
saqqara a écrit : Unification of the Phoenician script with Hebrew would certainly eliminate some short term problems - the Hebrew script is fairly well supported nowadays among applications and we'd eliminate the Plane 1 issue. Terribly confusing to users however - the majority do not read Hebrew and we'd be back to hacks to prevent modern Hebrew fonts sneaking in. Unicode is not meant to be purely about fixing short term problems, rather a platform for moving forward. If many Israelis may not be able to read Phoenician or Neo-Punic, it is not obvious to me that Phoenician or Punic scholars -- presumably the intended users of Phoenician/Canaanite -- do not read Square Hebrew. I have some testimony to the opposite : Lionel Galand (Tifinagh expert) saying he has often seen Punic inscriptions represented using Square Hebrew characters, James Février (Punic expert) illustrating the Phoenician character names with Square Hebrew glyphs (and not Phoenician glyphs used in the previous pages), Dictionnaire de la civilisation phénicienne et punique http://www.amazon.fr/exec/obidos/ASIN/2503500331/171-9944786-8511424 unifying Aramaic, Square Hebrew and Phoenician in its initial transliteration table and illustrating the 22 letters with Square Hebrew glyphs, etc. This may have been due to technical reasons (easy availability of Square Hebrew fonts), but it looks like Punic scholars are able to read Square Hebrew fonts. P. A.
Re: [OT] What is Langues'O
From: John Cowan [EMAIL PROTECTED] Philippe Verdy scripsit: Please go to Langues'O for this commentary. As I wrote, you will be probably answered with the historical context. What is Langues'O ? Where is it ? Please check http://www.inalco.fr/ As the splash page shows it is « Langues O' ». Thanks, P. A.
Re: [OT] What is Langues'O
Philippe Verdy a écrit : Please check http://www.inalco.fr/ As the splash page shows it is « Langues O' ». Yes but only on the splash screen. Elsewhere on the site (the top banner, and menu, and the logos in PDFs of its brochures, letters and publications) it uses Langues'O which means Langues Orientales [PA] I know. (so the quote should be after rather than before, [PA] So, a typo from the Webmaster, and the splash screen is indeed correct. and this site is not clear about its own logo)... The name « Publications Langues O' » refers to the publisher name and is distinct from the community name or the newsletter title. [PA] Is it really important ? P. A.
Re: ISO 15924 French name Gotique: a typo...???
Philippe Verdy a écrit : To find proof that gotique is incorrect in French, I looked for some official French resources, notably the list of language names published and used by the BPI: http://www.culture.gouv.fr/culture/dglf/bpi/list-langues.html clicking in the allemand language name gives this: http://www.culture.gouv.fr/culture/dglf/bpi/allemand.html [quote] Depuis 1941, l'allemand a abandonné l'écriture gothique. [/quote] However I wonder if this is related to the Sütterlin script. [PA] Gothique = Fraktur (in fact a type of Fraktur, or « écriture brisée »), gotique = script of the Goths. So maybe a better name would be ancien gothique. [PA] No, I don't think so. If I remember properly, for the Robert dictionary (a very standard work), gotique is the language of the ancient Goths. This is not a typo. I'm travelling and don't have my dictionaries with me so I can't copy the definition, but here is a reference I have : « Le gotique (...) est antérieur de plusieurs siècles aux autres dialectes germaniques » (SAUSSURE, Linguistique générale, 1916, p. 297). In the Académie française dictionary since 1718 : « parfois gotique, en parlant du peuple ou de la langue ». Gotique was chosen over gothique because gothique is the usual term for Fraktur in daily speech and ancien gothique makes you think of an old Fraktur style. No need for corrections (it is used in 10646 in any case); however, éthiopique is completely unknown in French except as the French name of a famous Greek classical book (Les Éthiopiques d'Héliodore). Could we have these discussions somewhere else (in French ?). Thanks. P. A.
Re: ISO 15924 draft fixes
Antoine Leca a écrit : The French name for Hang looks strange. It happened to be hangul (hangul, hangeul) (after quite a bit of discussion.) The name in ISO/CEI 10646 (F) is hangûl, from a Korean dictionary and a Korean grammar published by the Inalco (Langues O'). Another form suggested in some sources, to approximate the pronunciation, is hangueul. P. A.
Re: Response to Everson Phoenician and why June 7?
James Kass a écrit : Ernest Cline wrote, In order for Phoenician to be disunified from Hebrew, it must first have been unified with Hebrew. This is not the case. Well then, nonunification if you wish to be picky about it. Sorry if I offended. Many on this list have referred to the current proposal as a disunification and seem to be arguing that accepting this proposal would change and disrupt current Unicoding practices. In this case, I think it's important to be picky because there are no current Unicoding practices for Phoenician. You may mean that the Unicode book does not document how Phoenician (or Paleo-Hebrew) may be encoded. This is not to say that no one is using Unicode to encode Paleo-Hebrew texts. P. A.
Re: Compatibility equivalents, was: Qamats Qatan
Peter Kirk a écrit : Well, at least façade and facade collate together at the top level, with the default collation weights, and so one will match the other in simple searches. [PA] I was simply trying to say -- not that I always express myself well -- that adding some characters may force additional processing (here in the collation, elsewhere, if a cedilla exists as a combining character, in normalisations and rendering). Adding characters is not as innocent a process as some seem to say : « We just add characters and that's it, you are not forced to do anything about it ». While it is true that one is not forced to use them as a writer in the script, when one does not control the writers or sources and one has to process several sources (collate, render, search them), one is then forced to implement certain additional processes (for excellent reasons if the characters are indeed necessary). This is why I believe one must carefully review the pros and cons before adding new characters; they may well be unified with existing ones, for example. Again, if the separate Punic script were to be compatibility equivalent to Phoenician or Hebrew I would not have strong objections; but otherwise I am sure that there would be strong objections on the grounds that yet further splitting of what is logically the same script used for closely related languages leads to even more confusion. [PA] I would have liked Michael to say that splitting may lead to confusion with little gain, since he suggested this unification. Note that I believe unification of Neo-Punic with Phoenician is the prudent course to take (for the reasons I explained : introducing new characters has a cost and does force people to do something about them). Otherwise, if Unicode has space, tailoring collations is The Proper Thing To Do and « Unicode doesn't force people to do anything. Unicode makes characters available for those who wish to use them. », why not encode Neo-Punic ? 
After all, one could make a case for it : Neo-Punic is a remote descendant of Canaanite (genealogically as much as the Aramaic-Square Hebrew branch, it also retains the 22 primitive Canaanite characters), pretty different as far as glyphs are concerned (some simple strokes may represent a b, a d or an r, a Saint Andrew's cross may represent m or alef), has three subcategories (Carthago, Tripolitaine and Maghrebine), some inscriptions (cf. Cherchell) are mixed Neo-Punic and Punic (how would one represent them in plain text?), it uses matres lectionis (reusing gutturals having nearly completely disappeared in the spoken language), etc. P. A.
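The processing cost argued above can be made concrete with a small sketch. It assumes nothing beyond Python's standard `unicodedata` module; `primary_key` is a hypothetical name and only a crude stand-in for a first-level collation key, not the Unicode Collation Algorithm.

```python
import unicodedata


def primary_key(text):
    """Crude first-level key: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


precomposed = "fa\u00e7ade"   # c with cedilla as a single code point
combining = "fac\u0327ade"    # c followed by COMBINING CEDILLA

# The two spellings are different code point sequences...
print(precomposed == combining)
# ...so a processor has to normalise before comparing them,
print(unicodedata.normalize("NFC", combining) == precomposed)
# and has to ignore the cedilla to match "facade" at the first level.
print(primary_key(precomposed) == primary_key("facade"))
```

None of these steps is needed for a repertoire without ç or a combining cedilla, which is the sense in which adding characters forces work on anyone who collates, searches or renders text they did not write.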
Re: Qamats Qatan (was Majority of community important, inclusion not forcing people to do anything)
Jony Rosenne a écrit : -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Andries Sent: Friday, May 14, 2004 11:16 PM To: Michael Everson Cc: [EMAIL PROTECTED] Subject: Majority of community important, inclusion not forcing people to do anything (Re: [BULK] - Re: Interleaved collation of related scripts) ... Unicode doesn't force people to do anything. (Well, apart from using smart font technology for a lot of scripts, but that's not relevant here.) Unicode makes characters available for those who wish to use them. [PA] Surely Unicode does not make all characters available : it rejects some and unifies some. Why reject or unify if their inclusion would not pose a problem ? I somehow have the impression that the sheer presence of characters (duplicates for instance) does have an effect on users and forces certain processing (normalisation sometimes, decomposition in some cases, changing transcoding filters in other cases (what are the Coptic users having Coptic texts encoded as Greek data going to do?), changing/adding Cmap for some fonts (Coptic ones previously indexed with Greek code points ?)), etc. to achieve the desired effect. P. A. Having Qamats Qatan as a regular Unicode character will have an effect on the majority of users who do not know or care for the distinction. If anything, it should be some kind of glyph variant. [PA2] I suspect you are going to get an answer to the effect that you are no more forced to use Qamats Qatan in Hebrew than you are to use the cedilla in English for « façade ». But, while this is true, if you compare a Unicode script that used not to include ç or a combining cedilla with the new one that now includes it, this has an effect on algorithms (searching, transcoding, normalisation, even fonts for instance) and in this sense Unicode forces people to do something about it (not that it is bad to have this ripple effect). 
If adding new scripts does not force one to use them, « Unicode doesn't force people to do anything » and space is not an issue, why not include new Punic and Neo-Punic scripts alongside the proposed Phoenician ? After all, I may want to show the diachronic evolution of Phoenician (Semitic) words (from 1200 BC to 200 AD for instance) in plain text (XML). Why unify Phoenician with Punic and Neo-Punic ? No one will be forced to use Punic and Neo-Punic after all. Surely there must be a reason why you proposed a unification (and it may make perfect sense). Is it only for genealogical reasons or because the unconsulted community of Punic users (which is probably in any case too conservative in the eyes of some) did request unification ? P. A.
Re: interleaved ordering (was RE: Phoenician)
[EMAIL PROTECTED] a écrit : Dean A. Snyder wrote, The issue is not what we CAN do; the issue is what will we be FORCED to do that already happens right now by default in operating systems, Google, databases, etc. without any end user fiddling? That's the question. Since search engines like Google survive based on their ability to serve users' wants and find what users seek, why wouldn't Google make such a tailoring? Because the Phoenician user community is very very small ? The same goes for Microsoft on some collations already mentioned (French Canadian sorting, Khmer) and those are much larger communities. P. A.
Re: [BULK] - Re: Interleaved collation of related scripts
[EMAIL PROTECTED] a écrit : Peter Kirk scripsit: Well, I accepted somewhat reluctantly that Phoenician should be separately encoded because a small number of users want it to be, although a majority apparently do not want it to be. Neither you nor anyone else knows what the majority wants, because most interested parties have never even heard of this debate. It's natural to suppose that The Majority R Us, but there's no evidence for it. In any case, it's the majority in the UTC (and ultimately the Consortium) that matters, and the UTC works mostly by consensus anyway. There is such a thing as ISO JTC1/SC2/WG2. P. A.
Majority of community important, inclusion not forcing people to do anything (Re: [BULK] - Re: Interleaved collation of related scripts)
Michael Everson a écrit : At 12:08 -0700 2004-05-14, Peter Kirk wrote: Well, I accepted somewhat reluctantly that Phoenician should be separately encoded because a small number of users want it to be, although a majority apparently do not want it to be. I really don't know if those who spoke for the majority were really representative of a real majority. [PA] Is representing the majority of a community of users important ? If so, how do we know what this majority thinks ? Or, as was mentioned, are these users sometimes too conservative, not really knowing what is good for them in terms of script analysis, so that their preferences should be ignored ? This would not be an acceptable position if Unicode intended to force all users of Phoenician to move immediately to the new script - although it would actually make much more sense to do so. Unicode doesn't force people to do anything. (Well, apart from using smart font technology for a lot of scripts, but that's not relevant here.) Unicode makes characters available for those who wish to use them. [PA] Surely Unicode does not make all characters available : it rejects some and unifies some. Why reject or unify if their inclusion would not pose a problem ? I somehow have the impression that the sheer presence of characters (duplicates for instance) does have an effect on users and forces certain processing (normalisation sometimes, decomposition in some cases, changing transcoding filters in other cases (what are the Coptic users having Coptic texts encoded as Greek data going to do?), changing/adding Cmap for some fonts (Coptic ones previously indexed with Greek code points ?)), etc. to achieve the desired effect. P. A.
Re: interleaved ordering (was RE: Phoenician)
Kenneth Whistler a écrit : [on slow implementation of some collations by certain manufacturers and service providers] And the answer is to democratize the approach. I agree on the ideal solution; it has independently been mentioned to a large manufacturer's technical representative, who also seems to agree on this, but he is not the decision maker. One shouldn't be demanding that The Borg centrally define and implement all uses for all users, so that users simply dial Channel 621 and then sit there passively assimilating and get dished up their content. Instead, the users should demand of The Borg that user-definable requirements be supported actively, so that the *people* get to define what they do and how it is done at the point they interact with the software. It has actively been requested (for Canada for a few years and even prospectively for Tifinagh); it is a slow-moving boat and I'm not sure all manufacturers and service providers can be convinced, some of them holding a virtual monopoly in the OS market or the search engine one. Though I must admit I don't quite see what they would relinquish or lose by allowing users to tailor collations. P. A.
Re: Coptic/Greek (Re: Phoenician)
[EMAIL PROTECTED] a écrit : Peter Kirk scripsit: I support Coptic disunification on the grounds that it was requested by the user community. Initially I opposed Phoenician disunification because there was no evidence of demand for it from users. As such evidence has now been produced, I now support Phoenician disunification, according to Michael Everson's proposal. Please note carefully this last sentence. Okay, I have no qualms with that. Note that the same rule applies in both of these cases : « because requested by the user community ». I appreciate your making your reasons explicit. P. A. (Incidentally, how much input does one need before one can say what the user community wishes ?)
Re: Coptic/Greek (Re: Phoenician)
D. Starner a écrit : Doug Ewell [EMAIL PROTECTED] writes: Peter Kirk peterkirk at qaya dot org wrote: Because each such case has to be judged on its individual merits, according to proper justification and user requirements. There can be no hard rules like always split or always join. Nobody, neither Michael nor anyone else, ever advocates such a rule. But that's what Patrick implied when he asked how you support the Hebrew/Phoenician unification and the Coptic/Greek unification, that such a rule exists. Well, yes. But more specifically, why was the unification ill-advised for Peter Kirk in the case of Coptic when it would not be in the case of Phoenician ? Unless, of course, one just follows the trend and says the Coptic unification was ill-advised because it has been disunified. Somehow, I feel I should not have asked since the argument often seems to be, in the case of neighbouring historical scripts, genealogy and user community feeling (as interpreted by the proposers). P. A.
Coptic/Greek (Re: Phoenician)
Peter Kirk a écrit : And these two cases are hardly a good advertisement for the expert's reputation. The Coptic/Greek unification proved to be ill-advised and is being undone. I'm rather surprised by this comment. If the Coptic/Greek unification proved to be ill-advised how could you defend what I see, if I recall properly, as your (original ?) position : Phoenician/Hebrew unification ? P. A.
Script vs Writing System
At 12:12 -0700 2004-05-10, Mike Ayers wrote: But all this leads me to finally ask: what does script mean? It seems clear to me that although the term has been used throughout the Phoenician debate, not everyone is using it the same way. I know that there is a definition of script that is used for encoding purposes, but can I find it written anywhere, or is it more of an ephemeral thing? [PA] The glossary has « A collection of symbols used to represent textual information in one or more writing systems. » Chapter 6 also defines Writing Systems, summarized by Table 6-1 Typology of Scripts (Writing Systems then Scripts). A writing system is then defined as « A set of rules for using one or more scripts to write a particular language. Examples include the American English writing system, the British English writing system, the French writing system, and the Japanese writing system. »
Writing System Type: Unicode Script(s)
Alphabets: Latin, Greek, Cyrillic, Armenian, Thaana, Georgian, Ogham, Runic, Mongolian, Old Italic, Gothic, Ugaritic, Deseret, Shavian, Osmanya
Abjads: Hebrew, Arabic, Syriac
Abugidas: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Tagalog, Hanunóo, Buhid, Tagbanwa, Khmer, Limbu, Tai Le
Logosyllabaries: Han
Simple Syllabaries: Cherokee, Hiragana, Katakana, Bopomofo, Yi, Linear B, Cypriot
Featural Syllabaries: Ethiopic, Canadian Aboriginal Syllabics, Hangul
Note : « Table 6-1 lists all of the scripts currently encoded in the Unicode Standard, showing the writing system type for each. The list is an approximate guide, rather than a definitive classification, because of the mix of features seen in many scripts. The writing systems for some languages may be quite complex, mixing more than one writing system together in a composite system. Japanese is the best example; it mixes a logosyllabary (Han), two syllabaries (Hiragana and Katakana), and one alphabet (Latin, for romaji). »
Re: Phoenician
Peter Jacobi a écrit : Patrick Andries [EMAIL PROTECTED] wrote: [on tailored collations] [PA] I suppose this would be true in principle, but how long before this is implemented in the **actual tools** used by users such as MS Word or MS SQL Server ? [...] (yes, I know with a bit of tailoring ($) other tools from other manufacturers could fit the bill). When you can get it from other sources, why lament the non-availability from Microsoft? [PA] Not only Microsoft, I'm not sure that your average XML editor (XMetal, for instance) will allow this for some time (searching indiscriminately for a string that you don't know to be in Paleo-Hebrew or Square Hebrew). There is enough software which offloads collation to IBM ICU, where adding tailorings is very easy. [PA] In principle yes, but this is still tailoring. A modest contribution to the Firebird Foundation or any decent programmer working on this OSS SQL database, will give you any collation for Firebird and Interbase. [PA] Maybe, but I fear this is not really practical in many cases : users may already have made a technological choice, and that choice often will not allow you to tailor your collation; there is already a solution that would allow you to do your work (unify, tag, use a stylesheet); and the proposed block is commercially marginal, so there is little hope tools will accommodate the new block for some time. The argument 'we can't go this way, because Microsoft doesn't support it' is rather the wrong way around. [PA] Well, it reflects real problems and raises the question : what do you gain with disunification and the introduction of an additional block, this introduction having a practical impact ? And it's even not engraved in stone, that Microsoft won't support it. [PA] This is true but this may take (some very long ?) time if the non-availability of Khmer or French Canadian sorting is anything to go by. 
Again, I'm not opposed to Phoenician in principle (it is intellectually pleasing and cleaner), I just don't know what you gain with this encoding that you would not be able to do today (right now, with no additional cost) using what Dean Snyder proposed (XML tags and a stylesheet for rendering), especially for the large bases where Paleo-Hebrew is mixed with Square Hebrew. Not very clear to me (this may have been explained in other emails, I will read them, apologies if the pragmatic gain has been explained and I'm just appearing a bit dumb here). Kind regards, P. A.
Re: New contribution
Doug Ewell a écrit : It's clear to me that the reason my colleague and I can read this font is not that we have any special knowledge of both scripts, but because it's a stylistic variant of Latin. And thus he cannot read a Vietnamese text in Sütterlin, as you said, because it is not a stylistic variant of Latin ? P. A.
Re: Phoenician
Dean Snyder a écrit : Of course. But that does not make tagged text a minefield - in the absence of your nice Phoenician font Hebrew would show up instead - precisely what is used by and large by Semiticists right now. [PA] I also got this feedback from Lionel Galand (of Tifinagh and Libyan fame) about Punic : « I can tell you that I have often worked on collections of Punic documents that were published in Hebrew characters. » P. A.
Re: Phoenician
Peter Constable a écrit : [PA] I also got this feedback from Lionel Galand (of Tifinagh and Libyan fame) about Punic : « I can tell you that I have often worked on collections of Punic documents that were published in Hebrew characters. » This could be multiplied a hundredfold. The same could be said of Devanagari or Arabic text published in Roman transcription. That does not mean that we do not encode Devanagari or Arabic, or that encoding those scripts prevents the same people from continuing to publish in Roman transcription. [PA] True. Just stating it is a common practice. People will not be unsettled by a plain text unification. Personally, I'm still not very convinced there is anything to be gained by having two ways of encoding large document bases such as the Dead Sea Scrolls. I would have encoded these texts as Dean Snyder suggested (my CSS/XSLT bias, I suppose) : one underlying encoding, different rendering. But I'm no specialist in Semitic (or otherwise Indo-European for that matter) studies. Just an inkling, not a dogmatic conviction. P. A.
Re: Phoenician
[EMAIL PROTECTED] a écrit : Jony Rosenne scripsit: A possible strong negative argument would be if having it would cause problems for those who do not think they need it. For example, if it would make searching more difficult. This argument has been raised, but I am not convinced the possible difficulties are significant. This could be solved by making Phoenician and Hebrew base characters equivalent at the first level of collation. [PA] I suppose this would be true in principle, but how long before this is implemented in the **actual tools** used by users such as MS Word or MS SQL Server ? I think we have already discussed this here regarding the French Canadian official sorting and Khmer sorting, which are still unavailable on Windows. How much money is there for Microsoft in Phoenician sorting ? I'm not aware that the collation tables can be tailored by users in those tools (yes, I know with a bit of tailoring ($) other tools from other manufacturers could fit the bill). In theory, I believe either way (a separate encoding or a unification within the Hebrew block(*)) could be feasible. In practice, the unification point of view is available right now. I suppose it depends on one's outlook and preference. But is Unicode concerned with current limitations ? Okay, another way : But is Unicode concerned with current pragmatic usability ? P. A.
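What « equivalent at the first level » would mean for a search can be sketched as a simple folding table. This is only an illustration, not a collation tailoring for any real tool: it assumes the code points U+10900 to U+10915 that Unicode later assigned to the 22 Phoenician letters (an assignment that postdates this thread), pairs them one-to-one with the 22 non-final Hebrew letters in alphabetical order, and also folds the five Hebrew final forms. The names `FOLD` and `search_key` are hypothetical.

```python
# The 22 non-final Hebrew letters, alef to tav, in alphabetical order.
HEBREW_BASE = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]

# Assumption: Phoenician letters occupy U+10900..U+10915 in the same order.
PHOENICIAN_FIRST = 0x10900

FOLD = {PHOENICIAN_FIRST + i: cp for i, cp in enumerate(HEBREW_BASE)}
# Hebrew final forms fold to their base letters as well.
FOLD.update({0x05DA: 0x05DB, 0x05DD: 0x05DE, 0x05DF: 0x05E0,
             0x05E3: 0x05E4, 0x05E5: 0x05E6})


def search_key(text):
    """Map Phoenician letters and Hebrew finals onto Hebrew base letters."""
    return text.translate(FOLD)


phoenician_mlk = "\U0001090C\U0001090B\U0001090A"  # mem, lamd, kaf
hebrew_mlk = "\u05DE\u05DC\u05DA"                  # mem, lamed, final kaf
print(search_key(phoenician_mlk) == search_key(hebrew_mlk))
```

The practical question raised above remains: a table like this is easy to write, but it only helps where the tool in use lets someone install it.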
Re: New contribution
Doug Ewell a écrit : As I've said before, I don't know enough about the historical relationship between Phoenician and Hebrew to get involved in this bloodbath. But for the life of me, I can't figure out how Fraktur keeps getting dragged into it. For heaven's sake, it's not THAT unrecognizably different from Antiqua. Fraktur is not that different, this is true. One could easily write Greek texts in Coptic and they would be legible (they would obviously not use the original Coptic letters for the original Coptic sounds). Since the gauntlet had been thrown down, I did go ahead and format some Vietnamese text samples in Fraktur or Sütterlin, and showed the samples to a Vietnamese co-worker who moved to the U.S. sometime after high school. He had absolutely no problem reading the Fraktur, and said there are plenty of examples of Fraktur in Vietnam (mostly decorative, or in documents from the 1950s and earlier). Which could maybe only show that he knows both scripts (Latin and Fraktur)... He couldn't understand the Sütterlin at all, but did recognize it as handwriting and not, say, a secret code or child's doodling. Yes, you are right, Sütterlin is that different. Even so, with a little bit of Fraktur training and knowing the language of the text written in it, the text would become legible by guessing the letters that are too different. But I am not sure this (guessing the unknown forms) would not be true with a text written in a different but neighbouring script. But I understand this would not even be possible for modern-day (Square) Hebrew readers when confronted with Paleo-Hebrew. Which seems to settle the script identity question for me. P. A.
v and u positional variants (Re: New contribution)
Jim Allan a écrit : Similarly _v_ and _u_ were for long only used as positional variants. For very long, which explains for example why French has a non-etymological h in « huile » (oil) : the h was added to distinguish « vile » (feminine of vil, base) from « vile » (that is, uile, oil), which were written the same way but pronounced differently. Catach in her Dictionnaire historique de l'orthographe française calls this a diacritic h. It appeared around the XIIIth century. P. A. The same is true for huit (8) / vit (he lives or virile member), huitre (oyster) / vitre (window pane), huis (door) / vis (you (sing.) live, live ! or screw), etc.
Sütterlin (was A New Contribution)
Peter Kirk a écrit : OK, maybe not such a good example. So let's go back to Suetterlin. I would expect a much higher rate of recognition among German users of normal Latin script than among American users of normal Latin script. So a test of recognition in America might seem to indicate that Suetterlin should be disunified from Latin, on the same grounds that you want to disunify Phoenician and Hebrew (plus that Suetterlin has different cursive joining behaviour, just as Syriac does from Hebrew), but a test in Germany might provide evidence against this disunification. [PA] Why is it important to go to Germany or even that one should understand the underlying text ? Germans understanding a text written in Sütterlin may only prove that some have been exposed to other scripts (Fraktur or Sütterlin for instance; if one says those aren't other scripts we are just having a circular argument) or that they are guessing and filling in the gaps (the very different letters) because they are interpreting the text, not that the characters are recognizably Latin characters. P. A.
Re: New contribution
Patrick Andries a écrit : Mark E. Shoulson a écrit : Well, it doesn't need to be a wedding invitation, does it? I'll give it a try; I've downloaded a Sütterlin font, and I'll type up a small document and see if I can get some English-readers to read it or recognize it. Even if they can't read it, I'll bet they can recognize it as Latin letters and possibly English, which was not so for Paleo-Hebrew and Hebrew. Not at all obvious to me : http://www.cooptel.qc.ca/~pandries/suetterlin.jpg (sorry, already mentioned). Could just as well be some Cyrillic or foreign (Tolkien ?) cursive for the average reader. But I agree -- as you mention in another message -- that people will not think this is a set of random symbols and would know which way up to hold the piece of paper on which it is written, mostly because of the cursivity and linking of the letters and the presence of numerals. Still, I believe this will not be perceived as the same script as Latin by readers of the Latin script (I'm not even sure young Germans would be able to recognize it without training). P. A. (who will also stop on this subject since we seem to be rehashing the same arguments) (someone asked for a Phoenician / Hebrew dictionary sample to prove the need for plain-text distinctions; I have not found one, but would it be more convincing than this ? http://www.cooptel.qc.ca/~pandries/dico-fraktur-latin.pdf)
Re: Yoruba Keyboard
John Hudson a écrit : For details, see http://www.bisharat.net/ and, for mailing list subscription, http://lists.kabissa.org/mailman/listinfo/a12n-collaboration If you are more at ease with French (yorouba ?), there is a Unicode-Afrique mailing list. To subscribe, send a message to [EMAIL PROTECTED] Also an initiative of http://www.bisharat.net/A12N/ (Don Osborne) P. A. - o - O - o - ISO 10646 et Unicode en français http://pages.infinit.net/hapax
Re: Pal(a)eo-Hebrew and Square Hebrew
Dean Snyder a écrit : Patrick Andries wrote at 8:55 AM on Monday, May 3, 2004: I got this answer from a forum dedicated to Ancient Hebrew : « Very few people can read let alone recognize the paleo Hebrew font. Most modern Hebrew readers are not even aware that Hebrew was once written in the paleo Hebrew script. The same could be said for archaic Greek versus modern Greek - do you propose to encode archaic Greek separately? [PA] I'm proposing nothing here, I'm just forwarding an answer, When the text was written in the paleo Hebrew four of the Hebrew letters were used as vowels - aleph, hey, vav and yud, but were removed from the text when the masorites added the vowel pointings. This is evident in the Dead Sea Scrolls where the four letters are found in the words but removed in the Masoretic text. This is simply not true. [PA] So there were Dead Sea Scrolls written in Square Hebrew with matres lectionis ? (I don't know, I just would like to know.) P.A.
[Fwd: Re: New contribution]
03/05/2004 05:19, Michael Everson wrote: Suetterlin. Oh shut UP about Sütterlin already. I don't know where you guys come up with this stuff. Sütterlin is a kind of stylized handwriting based on Fraktur letterforms and ductus. It is hard to read. It is not hard to learn, ... Since when is this an argument ? Neither is Phoenician hard to learn (22 letters with no contextual variants, etc.)... Could we please remain courteous ? ... and it is not hard to see the relationship between its forms and Fraktur. ... The relationship is not at all apparent to someone that reads only the Latin Script and does not know the genealogy from the Fraktur Script to the German Script (as Sütterlin was also called). (I like mentioning that people saw them as different scripts.) Quite analogous to a set of historically related Northern Semitic scripts, and obviously if you have learned the genealogy of these scripts it is easy to recognize the relationship... P. A.
Re: Pal(a)eo-Hebrew and Square Hebrew
Peter Kirk a écrit : On 03/05/2004 05:55, Patrick Andries wrote: Quoted... ... When the Biblical text is written in paleo Hebrew there are no vowel pointings. When the text was written in the paleo Hebrew four of the Hebrew letters were used as vowels - aleph, hey, vav and yud, but were removed from the text when the masorites added the vowel pointings. This is evident in the Dead Sea Scrolls where the four letters are found in the words but removed in the Masoretic text. No.
Re: New contribution
Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy ? P. A. (immunity of the ill-informed also requested)
Re: New contribution
Patrick Andries a écrit : Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy ? Let me clarify this : why does it matter whether we encode the father or one of the sons ?
Pal(a)eo-Hebrew and Square Hebrew
I got this answer from a forum dedicated to Ancient Hebrew : Very few people can read let alone recognize the paleo Hebrew font. Most modern Hebrew readers are not even aware that Hebrew was once written in the paleo Hebrew script. There are also many who believe that the square script is the original script and the paleo was a kind of handwritten script used by the commoners and was formed out of the original square script. This of course goes against the archeological record as the square script does not appear until around 500 BCE in Babylonia where it was used to write the Aramaic language and adopted by the Hebrews while in captivity in Babylon. I am not aware of a program that will switch from square to paleo although there is a site that has the Torah in paleo Hebrew script - http://www.crowndiamond.org/cd/torah.html. When the Biblical text is written in paleo Hebrew there are no vowel pointings. When the text was written in the paleo Hebrew four of the Hebrew letters were used as vowels - aleph, hey, vav and yud, but were removed from the text when the masorites added the vowel pointings. This is evident in the Dead Sea Scrolls where the four letters are found in the words but removed in the Masoretic text. I do not know of a paleo Hebrew font used in the unicode though I heard of one who was working on that awhile ago but, I do not know what came about out of that. P. A.
Re: New contribution
Michael Everson a écrit : At 08:56 -0400 2004-05-03, John Cowan wrote: Michael Everson scripsit: You can buy books to teach you how to learn Sütterlin. Germans who don't read Sütterlin recognize it as what it is -- a hard-to-read way that everyone used to write German not so long ago. Sure. At some point, the same was true of Palaeo-Hebrew and Square Hebrew, no doubt. Jews returning from Babylonian exile with their nifty new Aramaic-style glyphs probably saw PH inscriptions around them here and there. And REJECTED them as being a different script. What does this mean ? How do you know how they felt ? Any differently from the Germans that rejected Suetterlin as different script, etc. ? While I'm rather for the Phoenician proposal, I believe one has to stress structural differences and objective arguments rather than simply repeating « it's a different script ». In this regard the treatment of matres lectionis found in Paleo-Hebrew (if I'm to believe *Jeff A. Benner* http://www.ancient-hebrew.org/jeffbenner(*) which I quoted in another message) and the massoretic points in Square Hebrew may be a structural difference. P. A. (*) http://www.ancient-hebrew.org/bookstore/101.html
Re: New contribution
D. Starner a écrit : Phoenician script, on the other hand, is so different that its use renders a ritual scroll unclean. And I've got Latin fonts, whose use will render a Bible unclean. (Might come in handy for Tantric religious works, though.) More seriously, I imagine some German religious communities were very strict on the Bible in Fraktur instead of a radical new Roman font. [PA] It is true of some Amish and Hutterite communities that have explicitly asked for Fraktur to be used in hymn books and not Latin (I know of a request to this effect made to a Mennonite printer in Manitoba known to me). P. A.
Re: Arid Canaanite Wasteland (was: Re: New contribution)
Elliotte Rusty Harold a écrit : At 9:43 AM -0700 5/1/04, Peter Kirk wrote: For the record, I agree that Old Canaanite would be a better name. The reason for this is not primarily to be more Semito-centric, but rather to represent better the range of languages covered. For the same reason, Latin script should not be called English script, because English is only one of many languages using it. Of course, Latin is also only one of many languages using the Latin script. Of course, the name Latin also has the nice political property that it's nobody's first language and only one very unusual state's official language any more (Vatican City). But is there some reason we call this the Latin script instead of the Roman script? Roman Script to me is opposed to Latin Script, Uncial Script, Fraktur Script (all seen as scripts by Daniels & Bright). P. A.
Re: New contribution
Ernest Cline a écrit : [Original Message] From: John Hudson [EMAIL PROTECTED] But your proposal specifically states that the 'Phoenician' characters should be used to encode Palaeo-Hebrew, as if somehow Hebrew and Hebrew are different languages when they look different. No more so than Japanese becomes a different language when written as romanji. Language and script are distinct and a given language is often encoded using several different scripts. There may be points against favoring writing Paleo-Hebrew with a Phoenician script instead of the Hebrew script, but this isn't one of them. Well, since this seems to be the center of some controversy, shouldn't the methodology be to ask what the community of users thinks : are these, for you (plural), two different scripts, or are they just stylistic variations of the same script (Hebrew) ? The community of users. And then to record the answer as an encoding guideline in the proposal (Paleo-Hebrew texts should be encoded using Phoenician codepoints, or Paleo-Hebrew texts should be encoded using the Hebrew codepoints). I don't really know, I just wish we could reconcile both sides here ;-) (*) P. A. (*) I must be affected by the gorgeous weather we are at long last enjoying here.
Re: New contribution
Ernest Cline a écrit : How about the following: When deciding how to encode ancient scripts in Unicode, sometimes arbitrary distinctions must be made between scripts that had a continuous evolution from one form into another. Depending upon the point of view of the author, a text written in a transitional form, such as Paleo-Hebrew, might be encoded in Unicode as either of the two scripts that it serves as a bridge between, in this case, Phoenician and Hebrew. Depending upon how the passions run, this might mollify both sides or it might make them both madder than they are. :) [PA] I think this may only create confusion where there is none right now (if it is true data are coded with Hebrew code points and a font change does the trick). A standard should attempt to standardize and improve things. P. A.
Re: U+0140
Anto'nio Martins-Tuva'lkin a écrit : However I advise removal of the note Catalan under U+0140 and U+013F, and perhaps replacement of the whole note with «for Catalan use U+006C U+00B7» (resp. U+004C). Did you get an answer on this ? Why is there no decomposition associated with this character ? Also did someone mention why U+0140 is even in Unicode since it could be considered (by ignorami like me) as a precomposed character (l + middle dot) ? Is it due to the polysemy of the middle dot ? P. A.
Re: U+0140
Patrick Andries a écrit : Anto'nio Martins-Tuva'lkin a écrit : However I advise removal of the note Catalan under U+0140 and U+013F, and perhaps replacement of the whole note with «for Catalan use U+006C U+00B7» (resp. U+004C). Did you get an answer on this ? Why is there no decomposition associated with this character ? Also did someone mention why U+0140 is even in Unicode since it could be considered (by ignorami like me) as a precomposed character (l + middle dot) ? Is it due to the polysemy of the middle dot ? [PA] In the meantime Eric Muller forwarded some answers (dating back to 6/8/2002) in which Ken explains all this. Thank you Eric. « There is no particular reason to use the l· as a single character, when all the 8859-based and Windows 1252 implementations would be using U+00B7 for the middle dot. Consider U+0140 as effectively a compatibility character for ISO 6937. It is mapped to 0xF7 in that standard. It is also mapped to 0xA9A8 in Code Page 949 (Korean) -- which probably got it from ISO 6937 in the first place. Is U+0140 used in other languages? Not that I know of. --Ken » Patrick
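The Code Page 949 mapping Ken mentions can be checked with a short Python sketch. It assumes that Python's built-in "cp949" codec follows the same table Ken refers to; the ISO 6937 mapping (0xF7) is not checked here, since no standard-library codec for it is assumed.

```python
# Sketch: encode U+0140 with Python's built-in "cp949" codec (assumed to
# match the Code Page 949 table referred to in the quoted answer).
ch = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

encoded = ch.encode("cp949")
print(f"U+{ord(ch):04X} -> 0x{encoded.hex().upper()}")

# Round-trip: decoding the two bytes should give back the same character.
decoded = encoded.decode("cp949")
print(decoded == ch)
```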
Re: U+0140
Philippe Verdy a écrit : From: Patrick Andries [EMAIL PROTECTED] Anto'nio Martins-Tuva'lkin a écrit : However I advise removal of the note Catalan under U+0140 and U+013F, and perhaps replacement of the whole note with «for Catalan use U+006C U+00B7» (resp. U+004C). Did you get an answer on this ? Why is there no decomposition associated with this character ? Also did someone mention why U+0140 is even in Unicode since it could be considered (by ignorami like me) as a precomposed character (l + middle dot) ? Is it due to the polysemy of the middle dot ? I thought it was already answered in this list by a Catalan speaking contributor: the sequence L+middle-dot in Catalan is NOT a combining sequence. Are you referring to the person I quoted ? Why doesn't U+0140 have a decomposition in Unicode ? P. A.
Re: U+0140
Kenneth Whistler a écrit : Did you get an answer on this ? Why is there no decomposition associated to this character ? Thanks to Eric and Patrick for digging out my answer on this perennial question from a couple years back, and saving me the trouble of having to rummage around to find it. :-) Also, it should be noted that there *is* a decomposition for U+0140 in the Unicode Character Database, to wit: 0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;... Oops. Looked at the wrong place in BabelMap. Sorry (blushing). Patrick
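The decomposition Ken points to can also be read from Python's standard unicodedata module, which exposes the Unicode Character Database. This is only an illustrative sketch; the variable names are mine, not part of any discussed proposal.

```python
import unicodedata

# Decomposition fields of the two L-with-middle-dot characters, as recorded
# in the Unicode Character Database shipped with Python.
for cp in (0x013F, 0x0140):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

# A compatibility decomposition is applied by NFKC/NFKD but not by NFC/NFD.
nfc = unicodedata.normalize("NFC", "\u0140")
nfkc = unicodedata.normalize("NFKC", "\u0140")
print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfkc])
```

Because the mapping is tagged as a compatibility one, canonical normalization leaves U+0140 untouched, while compatibility normalization yields the sequence U+006C U+00B7.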
Re: U+0140
Philippe Verdy a écrit : From: Patrick Andries [EMAIL PROTECTED] Peter Kirk a écrit : What is U+2027 intended for? The name suggests that it might be what is needed for Catalan. [PA] Isn't this the one that should be used in dictionaries ? See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html 2027 HYPHENATION POINT Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line. This last sentence is wrong, at least in my Larousse dictionaries: I believe it simply describes certain practices (Anglo-Saxon, American ?), maybe this should be clearer. P. A.
Re: names of the chars?
Tobias Stamm a écrit : Greetings to all standartisers! I'm new here so forgive me my stupidness. I just have one little question to which I didn't found the answer in the whole homepage: What is the standard of the characters names? * The valid English names of ISO 10646 are defined in Annex L of ISO/IEC 10646-1:2000(E) Rule 1 By convention, only Latin capital letters A to Z, space, and hyphen are used for writing the names of characters. NOTE Names of characters may also include digits 0 to 9 (provided that a digit is not the first character in a word) For more detail see http://std.dkuug.dk/JTC1/SC2/WG2/docs/principles.html * For the French names more characters are allowed, see Annexe L of ISO/IEC 10646-1:2000(F) [OE digraph, apostrophe, accented letters] Règle 1 Par convention, on n'utilisera que des lettres latines majuscules (y compris les lettres accentuées et les digrammes soudés), l'espace, l'apostrophe et le trait d'union pour la formation des noms de caractères. Ces caractères doivent faire partie du répertoire de l'alphabet latin n° 9 (ISO 8859-15). NOTE : Les noms des caractères peuvent aussi comprendre les chiffres 0 à 9 (en autant que le premier caractère d'un nom ne soit pas un chiffre) lorsque l'utilisation du nom de ce chiffre n'est pas appropriée. P. A.
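As a rough illustration of Rule 1 for the English names, here is a small Python sketch. The function name follows_rule_1 is hypothetical, and the check implements only the wording quoted above (capital letters A to Z, space, hyphen, and digits that do not begin a word); it does not cover the other rules of Annex L or the wider repertoire allowed for French names.

```python
import re

# Characters allowed by Rule 1 as quoted above (English names only).
_ALLOWED = re.compile(r"[A-Z0-9 -]+")

def follows_rule_1(name: str) -> bool:
    """Return True if `name` uses only A-Z, digits, space and hyphen,
    and no space-separated word starts with a digit."""
    if not _ALLOWED.fullmatch(name):
        return False
    words = [w for w in name.split(" ") if w]
    return all(not w[0].isdigit() for w in words)

for candidate in (
    "LATIN SMALL LETTER L WITH MIDDLE DOT",
    "HYPHENATION POINT",
    "Latin small letter l",          # lowercase letters are not allowed
    "4 SQUARE",                      # a word starts with a digit
    "LETTRE MINUSCULE LATINE \u00c9",  # accented capital: French rule only
):
    print(f"{candidate!r}: {follows_rule_1(candidate)}")
```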
Re: French typographic thin space (was: Fixed Width Spaces)
Asmus Freytag [EMAIL PROTECTED] a écrit : Have you folks noticed the addition of Narrow Non Break Space? Yes, but I have not been able to find a font with a narrow enough glyph (I just looked again at Code 2000). Does anyone know of an appropriate font for French in this regard ? P. A.
Re: Version(s) of Unicode supported by various versions of Microsoft Windows
Peter said: People *really shouldn't* ask Does product X support Unicode version N? They should be asking questions like Can product X correctly perform function Y on such-and-such characters added in Unicode version N? This makes for a rather long list of questions if you want to know what Microsoft supports in a new OS or product release for instance. One might think of how to best present the latest support level in a concise fashion and not on a per function per character basis. P. A.
Re: Version(s) of Unicode supported by various versions of Microsoft Windows
Peter Constable a écrit : Well, there is no way to answer a question like What version of Unicode does Windows XP support with anything other than a vague summary statement like somewhere between 3.0 and 4.0 or a bunch of details. And since people tend not to find a vague summary very useful, I'm suggesting we'd all be better off if they simply asked about what specific functionality they need to know about. At least, until somebody comes up with some bright idea about other ways to answer such questions. One other option is to ask what languages / locales are supported, and that is how MS has been documenting things up to now. It's a slightly different question, but it's one that is answerable. Much better, IMO. MS must then provide coherent support for a language/locale at a given Unicode level. (No one wants to ask how every function works for every codepoint for that locale, at least not before hitting a bug...) P. A.
Re: [OT?] Modifying (Unicode) sorting of languages using diacritics in MS Word and MS SQL Server
Michael (michka) Kaplan a écrit : From: Patrick Andries [EMAIL PROTECTED] I have the same question for MS SQL Server 2000... Similar answer to the one Chris gave for Word, though with a slightly older version of the Windows sort tables Finally, I would like to know if it is possible for a user to add an additional language to the ones appearing in the Windows regional and language options, so as to assign to it, for instance, some keyboard layouts. This is not currently possible. But the user can certainly create a new keyboard (now with an easy GUI tool) and the system will handle all that is typed with it. [PA] Yes, the GUI tool is very nice. So easy to use in theory that I don't understand why it is only available in English (i.e. one does not need to be a techie and thus know English to be able or want to use this tool). P.-S. : Do Word, SQL Server 2000 and the Regional and Language options window support all Unicode 4.0 associated languages as far as proper sorting and addition of keyboards are concerned ? It is hard to know what you mean here -- are you asking for when every single character in Unicode 4.0 will be in some keyboard and some linguistically appropriate sort, all built into Windows? Or did you have a more practical (and reasonable) target in mind? [PA] Let me be reasonable as you kindly suggest, how about proper French Canadian (CAN/CSA Z243.4.1 standard (which you most probably know) and ISO/IEC 14651 with the delta corresponding to the latter) or Khmer sorting ? P. A.
Re: [OT?] Modifying (Unicode) sorting of languages using diacritics in MS Word and MS SQL Server
Michael (michka) Kaplan a écrit : [PA] Let me be reasonable as you kindly suggest, how about proper French Canadian (CAN/CSA Z243.4.1 standard (which you most probably know) and ISO/IEC 14651 with the delta corresponding to the latter) or Khmer sorting ? I am unaware of any specific non-conformant pieces in Windows in regard to the former standard. [PA] Well, may I suggest an offline discussion with Alain Labonté (cc'ed) ? He is more aware of this issue than I am. I believe he has already transmitted his concern relative to this non-conformance through other channels in Microsoft (subsidiaries and other members of the Unicode consortium). P. A.
[OT?] Modifying (Unicode) sorting of languages using diacritics in MS Word and MS SQL Server
Hello, I would like to know if the collating order used by Word may be tailored by the user to properly sort letters with diacritics in a language that does not appear in the list of languages offered by Word. A simple sort by character number will obviously not work. I have the same question for MS SQL Server 2000... Finally, I would like to know if it is possible for a user to add an additional language to the ones appearing in the Windows regional and language options, so as to assign to it, for instance, some keyboard layouts. Many thanks, Patrick Andries P.-S. : Do Word, SQL Server 2000 and the Regional and Language options window support all Unicode 4.0 associated languages as far as proper sorting and addition of keyboards are concerned ? If not, when will these products do so ?
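To show why a sort by character number fails for letters with diacritics, here is a small Python sketch. The helper names (base_letters, naive_key) and the sample words are mine, and the key is a deliberately naive illustration: it is not an implementation of CAN/CSA Z243.4.1 or ISO/IEC 14651, and says nothing about how Word or SQL Server sort.

```python
import unicodedata

def base_letters(word: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def naive_key(word: str):
    """Compare base letters first; use the decomposed form only to break ties."""
    return (base_letters(word).casefold(), unicodedata.normalize("NFD", word))

# Sample words, normalized to precomposed form (NFC) for a fair comparison.
words = [unicodedata.normalize("NFC", w)
         for w in ["zèbre", "étude", "cote", "côté", "eau"]]

by_code_point = sorted(words)                 # plain character-number order
by_base_letter = sorted(words, key=naive_key)

print(by_code_point)
print(by_base_letter)
```

In code point order "étude" lands after "zèbre", because U+00E9 is numerically greater than U+007A; with the naive key it is filed under "e". A real tailoring would also have to settle how accents are compared among themselves, which is precisely what the standards mentioned in this thread specify.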
Re: Detecting encoding in Plain text
- Message d'origine - De: John Delacour [EMAIL PROTECTED] Given any sizeable chunk of text, it ought to be possible to estimate the statistical likelihood of its being in a certain encoding/[language] even if it's in an unspecified 8859-* encoding. It would be quite an interesting exercise, but I'd be surprised if someone hasn't done it before. Perhaps someone here knows. See http://www.alis.com/fr/services_que.html http://www.alis.com/en/services_que.html P. A.
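The kind of statistical guess John describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the PROFILES table pairs each candidate encoding with the letters of a single language (French for ISO 8859-1, Russian for ISO 8859-5), and the score is simply the share of decoded characters that belong to that inventory. A real detector, such as the service linked above, would presumably rely on much richer statistics.

```python
# Hypothetical profiles: one letter inventory per candidate encoding.
PROFILES = {
    "iso-8859-1": set("abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûü"),  # French
    "iso-8859-5": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),         # Russian
}

def score(data: bytes, encoding: str) -> float:
    """Share of non-space characters that are plausible letters when
    `data` is decoded with `encoding`."""
    text = data.decode(encoding).lower()
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    hits = sum(1 for c in visible if c in PROFILES[encoding])
    return hits / len(visible)

def guess_encoding(data: bytes) -> str:
    """Return the candidate encoding with the highest score."""
    return max(PROFILES, key=lambda enc: score(data, enc))

french_bytes = "Le caf\u00e9 est tr\u00e8s bon".encode("iso-8859-1")
russian_bytes = "Привет мир".encode("iso-8859-5")

print(guess_encoding(french_bytes))
print(guess_encoding(russian_bytes))
```

Both candidate encodings assign a character to every byte value, so decoding never fails and only the scores distinguish them; with so little text the guess is fragile, which is why John speaks of a sizeable chunk.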
U+0488 and U+0489
Hello everyone, Does anyone have any background and usage information relative to the two characters named below ? Some rendered examples would be very much appreciated. U+0488 COMBINING CYRILLIC HUNDRED THOUSANDS SIGN U+0489 COMBINING CYRILLIC MILLIONS SIGN Many thanks, P. Andries - o - 0 - o - Meilleurs vœux pour l'an nouveau !
Re: Mathematical exist and forall in Unicode
- Message d'origine - De: Mirek [EMAIL PROTECTED] Hello, I am not sure if it is the proper place to discuss the case if missing characters, but haven't found better place. I tried to find out two characters in unicode and encountered the following problem. There are two characters for logical EXISTS and FOR ALL signs. There exists old notation that is in unicode (exist = mirrored E, for all = inverted A) U+2200 U+2203 and yet new notation (exist = the character similar to logical OR OPERATOR but bigger, and for all = similar to logical AND OPERATOR, but bigger). You mean similar to U+22C0 and U+22C1 ? Do you have any reference as to the modernity of this V-like notation ? May I add that, at first sight, I find this a very strange idea since well-known and distinct signs would have been replaced by signs dangerously close to other well-known ones. IMHO it's strange that unicode does not cover both types of notations, or maybe I missed something. I don't know, but how about considering them as glyph variants ? P. A.
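For reference, the four code points mentioned in this exchange can be listed with Python's standard unicodedata module. This is only a lookup sketch; it does not settle whether the V-like notation should be treated as glyph variants.

```python
import unicodedata

# The quantifier signs and the large and/or operators mentioned above.
code_points = (0x2200, 0x2203, 0x22C0, 0x22C1)

names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```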
Looking for more samples of _| (power tower)
I have found an interesting form of a power tower (_|, see the third line here http://pages.infinit.net/hapax/images/puissances.jpg). I was wondering if anyone else knew of other occurrences of this sign? Many thanks, Patrick
Re: Mathematical exist and forall in Unicode
- Message d'origine - De: D. Starner [EMAIL PROTECTED] These can probably be used as glyph variants, i.e., by selecting a US vs. European font (or whatever is the distinction). I thought glyph variants were supposed to look at least somewhat similar. Any reference to this similarity in appearance as a condition ? (Is the Sütterlin »e« a glyph variant of standard Latin «e» then ? It does not resemble any other e I know but rather an n.) P.A.
Re: UNICODE OTHER STANDARDS
- Message d'origine - De: Markus Scherer [EMAIL PROTECTED] It looks to me like Christopher is not after an analysis of what standards could somehow be squeezed to use Unicode charsets, but rather a list of standards that _specify_ (actively, not potentially) Unicode/10646. The obvious ones are of course HTML (at least since 4.01: http://www.w3.org/TR/html401/charset.html#h-5.1) XML ECMAScript I do not have a complete list. Another one : ISO 14651 (collation), I believe. Ken Whistler (or Alain Labonté) can confirm (or deny) this. P. A.
Re: [hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaicnow)
- Message d'origine - De: "D. Starner" [EMAIL PROTECTED] Indeed, by the same argument, we could encode a lot of scripts together. ISCII did it for Indic scripts. I'm sure we could do some serious merging among syllabic scripts - 12A8 (ከ) is the same as 13A7 (Ꭷ) I understand this is said tongue in cheek, but even then... This merging seems reasonable to you because you consider their similar English names, but not their different phonetic value ([k] vs [ka]) or their ISO 10646 French names for instance (respectively K for Ethiopic and KA Cherokee). KA being 12AB in the French version. See Daniels-Bright (Table 51.5 which gives k (ka) for U+12A8 [k] and ka for U+12AB [ka] or [k]) and Amharique pour francophones (L'Harmattan) (p. 5 which gives ke/k for U+12A8 and ka for U+12AB). The English names are, of course, perfectly okay (don't want to open a can of worms here ;-)). P. A. - o - O - o - ISO 10646 en français http://pages.infinit.net/hapax
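The English names at issue can be looked up with Python's standard unicodedata module. The sketch below only prints the English names of the three code points discussed; the French names of ISO 10646 are not available through this module.

```python
import unicodedata

# Code points discussed above: two Ethiopic syllables and one Cherokee letter.
code_points = (0x12A8, 0x12AB, 0x13A7)

english_names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in english_names.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```

The shared final element of two of these names is what makes the merging look plausible when only the English names are considered.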
Re: Aramaic unification and information retrieval
- Message d'origine - De: Patrick Andries [EMAIL PROTECTED] - Message d'origine - De: Michael Everson [EMAIL PROTECTED] At 17:46 + 2003-12-26, Christopher John Fynn wrote: (Though the Roman style Fraktur style of Latin script are probably more different from each other as some of the separately encoded Indic scripts [e.g. Kannada / Telugu]) Sorry, Chris, this is unsubstantiated speculation, and it doesn't happen to be true. In 1997, I showed some comparisons between Coptic, Greek, Cyrillic, and Gothic showing that all of them but Greek were similar enough to be read with a minimum of training and practice. Very probable, but how did you measure those distances and the training and practice necessary ? I revised this a bit in 2001: http://www.evertype.com/standards/cy/coptic.html. German, English, and Irish can all be read with similarly low learning curve whether the script is Fraktur or Gaelic; the number of letterforms which differ is small. Interesting, I wonder if you included Sütterlin in your study. http://pages.infinit.net/hapax/images/suetterlin.jpg A question for the average literate reader of the Latin script, not for scholars like Everson : what letters are written ? Since some people have enquired about what the Sütterlin letters could correspond to (and some have mistakenly identified several), I have written the document in a different « script ». http://pages.infinit.net/hapax/images/SuetterlinEnAnglaise.jpg I wonder how many letterforms could be considered as different. If the first three words (»Bin noch munter«) are anything to go by, I would say quite a lot : B, c, h, u, t, e, r with n deceivingly close to e to the untrained eye. P. A.
Re: Aramaic unification and information retrieval
- Message d'origine - De: D. Starner [EMAIL PROTECTED] Yup, if you make a grid patten of sufficient size and complexity you can fit any relatively simple shape like a letterform into it. And this grid doesn't even particularly fit the characters. Two big rules of Latin typography are that the capital letters are all of the same size (visually, at least) Is this true for accented capitals or only for English letters? AUGJQO Did I yet again read too fast? P. A. Season's Greetings and Best Wishes to All! Bonnes fêtes et meilleurs vœux à tous !
[OT] Size of Latin Capitals (was Re: Aramaic unification and information retrieval)
- Message d'origine - De: Doug Ewell [EMAIL PROTECTED] Patrick Andries Patrick dot Andries at xcential dot com wrote: And this grid doesn't even particularly fit the characters. Two big rules of Latin typography are that the capital letters are all of the same size (visually, at least) Is this true for accented capitals or only for English letters? AUGJQO Did I yet again read too fast? Maybe. Think base letter, not letter with combining diacritics. Well, I think this is better said by the writer than implicitly thought by the reader ;-) Also bear in mind that capital J and Q have no descender in some fonts. A minority, I would think, for Q, and this right from lapidary capitals; there are also some capital P and Y that extend below the baseline (Fraktur for instance, unless this is not Latin). But okay, this is not a Unicode issue but a font design issue. P. A.
Re: [OT] CJK - CJC (Re: Corea?)
- Message d'origine - De: Don Osborn [EMAIL PROTECTED] Although I admit to not quite understanding the motivation for this suggestion, It is a request by 22 MPs who want to modify the English spelling by law, because according to the articles this was the original English spelling before the occupying Japanese authorities changed the initial C to a K so that Korea would follow Japan in alphabetical order. Apparently North and South Corea(s) want to participate in the 2004 Olympic Games under the letter C (» Sie geht so weit, dass die beiden Länder bei den Olympischen Spielen 2004 gemeinsam mit dem C im Namen antreten wollen. Überhaupt soll das Weltsportfest der eigentliche Grund für die koloniale Buchstabensuppe sein. « [It goes so far that the two countries want to compete together at the 2004 Olympic Games with the C in their name. Indeed, the world sporting festival is said to be the real reason for the colonial alphabet soup.]) P. Andries