RE: Linguistic precedence [was: (TC304.2313) AND/OR:

2000-06-16 Thread Marco . Cimarosti
Touché! I was misled by a fictional character by V. Montalbán: Pepe Carva*lh*o, a Catalan detective of Galician origins... Ciao. Marco -Original Message- From: Antoine Leca [mailto:[EMAIL PROTECTED]] Sent: Friday, 16 June, 2000 12.44 To: Unicode List Cc: [EMAIL PROTECTED]

RE: Gender symbols

2000-06-26 Thread Marco . Cimarosti
Doug Ewell wrote: I have sometimes wondered why these two useful, pre-existing symbols are not used in the U.S. to denote 'male' and 'female' on e.g. restroom doors. One possibility is that, because they are frequently associated with 'sexuality' or 'relations between the sexes,' they

RE: Looking For Information

2000-06-28 Thread Marco . Cimarosti
Harry R Aufderheide wrote: 1. Is the UTF-8's character set equal to the Latin-1 (ASCII) Code Page's? If not, what are the differences? As Brendan Murray already mentioned, UTF-8 is an encoding form of Unicode, so it supports *all* Unicode characters. In case you are wondering how this is
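As a rough illustration of why UTF-8 and Latin-1 differ even for the characters they share, here is a minimal UTF-8 encoder sketch. The function name `utf8_encode` is hypothetical. Only U+0000..U+007F (ASCII) keep their single-byte values; the upper half of Latin-1 (U+0080..U+00FF, e.g. "é") takes two bytes, and characters beyond Latin-1 take three or four.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into buf.
   Returns the number of bytes written (1..4), or 0 for values that
   are not Unicode scalar values (surrogates, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {                       /* ASCII: same byte as Latin-1 */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* includes Latin-1 U+0080..U+00FF */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogate code points are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* rest of the BMP */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* supplementary planes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, "é" (U+00E9) is the single byte E9 in Latin-1 but the two bytes C3 A9 in UTF-8.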

Off topic: again on Italian spelling (was RE: Plane 14 language t

2000-06-29 Thread Marco . Cimarosti
Antoine Leca wrote: the lowercase of Italian (or Corsican) "A'", "E'", ... at the end of a word is likely to be "à", "é/è", ... (Marco, is it really true? and how é and è are handled?) We should rather say that -A' (etc.) is a poor man's capitalization for -à (etc.). The proper capital form

RE: Looking up han characters

2000-06-29 Thread Marco . Cimarosti
Robert Lozyniak wrote: How do I look up a han character if I don't know its codepoint? What if all I have is its shape, or its EUC-JP or Shift-JIS number? There are a couple I want to see. If you know the value in JIS (or any other encoding), you just need to look up a conversion table.

RE: Mixing languages on a Web site

2000-06-30 Thread Marco . Cimarosti
Antoine Leca wrote: Hmmm. Writing from top of my head (which is *not* the good way to go in such a list), I understood that Unicode was the default character set, [...] You are right (see http://www.w3.org/International/O-HTML-charset.html). OTOH, I believe that for upward compatibility,

RE: Japanese pronunciation of hex digits?

2000-07-03 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote: How do the Japanese read the hex digits A thru F? Probably: ei, bi, shi, di, i, effu. _ Marco

RE: Furigana codes?

2000-07-06 Thread Marco . Cimarosti
Daniel Biddle wrote: On Wed, 5 Jul 2000, Rick McGowan wrote: iRck I thought this was a typo until I saw your address. U263A It's not a typo: Rick's signature has passed through an Indic renderer, so the "i" was reordered. U+FF1AU+FF0DU+FF09 _ Maco`

RE: Acronyms

2000-07-10 Thread Marco . Cimarosti
Haï, Antoine. [...] plan supplémentaire pour idéogrammes CJK [supplementary plane for CJK ideographs], [...] But is "CJK" the correct acronym for "chinois, japonais, coréen" [Chinese, Japanese, Korean]!? Tchao. Marco

RE: What is this case folding?

2000-07-11 Thread Marco . Cimarosti
Robert Lozyniak wrote: If it is what I think it is, I don't want it in English. How could it tell "aids" from "AIDS", for instance? Or "joy" from "Joy"(name)? (C'mon, 11BB, you were supposed to know this one ;-) Case folding (or case conversion) is the process of changing letters from one

RE: FW: Unicode to UTF-8

2000-07-11 Thread Marco . Cimarosti
[BF Ax] FFx = [BF Bx] copyleft 2000 by Marco Cimarosti - * - * - * - * - * - _ Marco

RE: Not all Arabics are created equal...

2000-07-11 Thread Marco . Cimarosti
Greg Reynolds wrote: The only remedy I can see for this particular flaw in Unicode is the introduction of a codepoint to set or maybe swap the evaluation rule for number strings. It is not a flaw. Rather, IMHO, we are all making the mistake of considering this as an *encoding* issue. Which

RE: Separate list for Arabic Extended discussions?

2000-07-17 Thread Marco . Cimarosti
N.R.Liwal wrote: I would vote that UNICODE, host mailing lists on Script level, becuase, issues discussed of CJK are not much related to Roman and Arabic. If there are several lists like: Arabic... ect. If one wish to participate in all that should be an option. but still if all are happy

CJK: Subset of Unicode to represent Japanese Kanji? (was: Thought

2000-07-17 Thread Marco . Cimarosti
Michael W. Martin For a device that will print a relatively basic label (such as sequence number, date, time, name, department, etc) onto a document in Japanese -- what is your consensus? Basic Kanji+Hiragana+Katakana or will Hiragana+Katakana or just Katakana suffice? My vote is

Unicode controls for VB (Programming)(VB)(OLE)(WinNT)

2000-07-19 Thread Marco . Cimarosti
Visual Basic (6.0 for 32-bit Windows Development, under Windows NT) is capable of handling Unicode strings, internally, but I found no way to display an arbitrary Unicode text in any of the built-in controls (buttons, text boxes, combos, etc.). Even if I set the control's font to an

RE: Designing a multilingual web site

2000-07-19 Thread Marco . Cimarosti
Munzir Taha wrote: utf-8: ef bb bf utf-16be: fe ff utf-16le: ff fe utf-32be: 00 00 fe ff utf-32le: ff fe 00 00 (check before utf-16le!) scsu: 0e fe ff (unfortunately rather rarely used) Sorry for being a dummy about this. But I can't understand where these bytes
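The byte sequences in the list quoted above are simply U+FEFF (the byte order mark) written in each encoding form. A minimal sniffing sketch follows; the function name `sniff_bom` and the returned labels are hypothetical. As the list warns, the UTF-32LE signature FF FE 00 00 also begins with the UTF-16LE signature FF FE, so it must be tested first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the encoding form from a leading U+FEFF.
   Returns a label, or NULL if no known signature is present.
   Order matters: UTF-32LE (FF FE 00 00) must be checked before UTF-16LE (FF FE). */
const char *sniff_bom(const unsigned char *p, size_t n)
{
    static const struct {
        const char *name;
        unsigned char sig[4];
        size_t len;
    } boms[] = {
        { "UTF-8",    {0xEF, 0xBB, 0xBF},       3 },
        { "UTF-32BE", {0x00, 0x00, 0xFE, 0xFF}, 4 },
        { "UTF-32LE", {0xFF, 0xFE, 0x00, 0x00}, 4 },  /* before UTF-16LE */
        { "UTF-16BE", {0xFE, 0xFF},             2 },
        { "UTF-16LE", {0xFF, 0xFE},             2 },
        { "SCSU",     {0x0E, 0xFE, 0xFF},       3 },  /* SQU + FEFF */
    };
    for (size_t i = 0; i < sizeof boms / sizeof boms[0]; i++)
        if (n >= boms[i].len && memcmp(p, boms[i].sig, boms[i].len) == 0)
            return boms[i].name;
    return NULL;
}
```

Note that the check is heuristic: a UTF-16LE text whose first character after the BOM is U+0000 would be misreported as UTF-32LE.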

Uniscribe API files? (Programming)(Microsoft)(Windows)

2000-07-20 Thread Marco . Cimarosti
I am looking for the header file containing the declarations for Uniscribe (USP10.DLL). I think it should be a single file named "usp10.h", but I cannot find it on the Microsoft web site or elsewhere. Could somebody point me in the right direction? Thank you. _ Marco

RE: 127 strokes beyond the radical?!

2000-07-21 Thread Marco . Cimarosti
Patrick Andries wrote: De : [EMAIL PROTECTED] On page 876, the character U+6B8B is listed as being 127 strokes beyond the radical. I'd say it's more like 6 strokes beyond the radical. I believe it to be 5 strokes and it is already listed under radical + 5 strokes. Funny: it is +6

RE: Unicode in VFAT file system

2000-07-21 Thread Marco . Cimarosti
Asmus Freytag wrote: At 09:53 AM 7/20/00 -0800, Ken Krugler wrote: 2. Is little-endian UCS-2 a valid encoding that I just don't know about? Yes, it is. Your example of the VFAT system is a near perfect case, since the details of it form what Unicode calls a 'Higher level protocol' and

RE: Unicode FAQ addendum

2000-07-21 Thread Marco . Cimarosti
1) The UTF whose bits can be counted is not the eternal UTF. The encoding that is not in UTR-17 is not a compliant encoding. UCS-2 is the origin of the BMP. UTF-16 is the origin of 1,048,576 more code points. Therefore, constantly use UTF-8 and you'll see the mystery on your mail

RE: What is Unicode in Chinese?

2000-07-25 Thread Marco . Cimarosti
Sorry to all those who are seeing the mystery above here ^ but this mail really required UTF-8. Joseph Becker wrote: It seems that Chinese is the only major language in which the term "Unicode" needs to be translated rather than transliterated. [...] We have collected these candidates so

RE: Bangla(Bengali) letter Missing

2000-07-27 Thread Marco . Cimarosti
Brendan Murray wrote: "Md Ziaur Rahman" [EMAIL PROTECTED] wrote: ... found that a letter that is frequently used in Bangla is absent from the standard. It is Bangla letter Khondo-ta I believe that this character is a composition of TA (U+09A4) and the ZERO-WIDTH JOINER, the so-called

RE: Bangla(Bengali) letter Missing

2000-07-27 Thread Marco . Cimarosti
Robert Brady wrote: On Thu, 27 Jul 2000, Abdul Malik wrote: How am I to encode the different forms in unicode? For the last three, you can do something like BENGALI LETTER WHATEVER BENGALI VIRAMA BENGALI LETTER BA for the -va form, and BENGALI LETTER WHATEVER BENGALI

RE: What is ` (U+0060) for?

2000-08-02 Thread Marco . Cimarosti
Addison wrote: Actually, I erred. It's Switzerland that prefers this formula (see the ITS and DES locales on Windows or in Java--although Java uses three digits for grouping and it should be four). The Swiss locale on Windows systems actually uses ' (U+0027) as a thousands separator, not `

RE: Euro

2000-08-07 Thread Marco . Cimarosti
Asmus Freytag wrote: The problem with the commission design of the euro glyph is that it only works as long as you use their aspect ratio and uniform stroke width. As long as you have these, the eye will complete them to a lower case 'e' form [...] Visual perception is indeed a funny

RE: Encodings for SQL Databases

2000-08-07 Thread Marco . Cimarosti
((( Sorry to those who see a mangled subject. It should read "RE: Encodings for SQL Databases" ))) Jon Peck wrote: Most of the major databases now support Unicode at some level, but what is the best way to encode SQL statements for various database access apis? [...] According to the

RE: Unicode String literals on various

2000-08-08 Thread Marco . Cimarosti
Antoine Leca wrote: char C_thai[] = "\u0E40\u0E02\u0E17\u0E32\u0E49\u0E1B\u0E07\u0E1C\u0E33"; Would the Unicode values be converted to the local SBCS/MBCS character set? If yes: Is the definition of this locale info part of the C99 standard itself, or is it operating system's locale? And
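For comparison, the same Thai text can be written as a wide string literal. This is a sketch under the assumption that `wchar_t` holds Unicode code values (UTF-16 on Windows, UTF-32 on most Unix systems); the helper name `thai_length` is hypothetical. With the narrow `char` version, each universal character name is converted at translation time to the implementation's execution character set, which is what raises the questions above.

```c
#include <wchar.h>

/* The same Thai text as a wide literal. Assuming a Unicode-based wchar_t,
   each \uXXXX universal character name becomes one wchar_t element
   holding that code point (all nine are in the BMP). */
static const wchar_t thai[] =
    L"\u0E40\u0E02\u0E17\u0E32\u0E49\u0E1B\u0E07\u0E1C\u0E33";

/* Hypothetical helper: number of wchar_t elements, excluding the null. */
size_t thai_length(void)
{
    return wcslen(thai);
}
```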

RE: Unicode String literals on various

2000-08-08 Thread Marco . Cimarosti
Hi, Antoine. I can continue to dissert on this subject (all of this should finally be cooked in a FAQ anyway), but I do not want to flood the list with a marginaly interesting subject. Merci beaucoup. It was very informative! Ciao. Marco P.S. You should not be so shy: up

RE: Arabic shaping behavior questions

2000-08-09 Thread Marco . Cimarosti
Bob Hallissy wrote: 1) Is the Arabic Joining Class [...] normative or informative? Like it or not, it is normative. See http://www.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html, that reads: ... ArabicShaping.txt (Section 8.2) Basic Arabic and Syriac character shaping

RE: Braille rendering of Unicode [OT 50%]

2000-08-09 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: Is not http://www.hclrss.demon.co.uk/unicode/braille_patterns.html or alternately http://charts.unicode.org/Web/U2800.html already covering this? No. These are at most the building blocks for braille. A better parallel would be to consider these "presentation

RE: Which languages are supported in basic latin

2000-08-10 Thread Marco . Cimarosti
Halldor G. Gestsson: Can I find a list where all languages supported in the basic latin (0x-0x00FF)? [...] Wich languages uses the latin extensions A,B and C? Page http://www.eki.ee/letter/ contains the information to build your lists. _ Marco

RE: Swiss numerical format [OT]

2000-08-10 Thread Marco . Cimarosti
Jörg Knappen wrote: Are there good (authorative) references on the so called swiss numerical format with its peculiar thousand separator? Why not compare the locale settings of the main operating systems? I think that at least WinNT, Apple, Linux, and other Unixes are widely represented on this

RE: Braille rendering of Unicode [OT 50%]

2000-08-10 Thread Marco . Cimarosti
Steven R. Loomis wrote: [...] Presumably the unicode codepoints in braille would make a great format for these translations on their way to a printer. One would hope they would get such use and not simply for braille-looking characters on paper or screen. You are right, I didn't catch it:

RE: Zero-width ligator

2000-08-10 Thread Marco . Cimarosti
Roozbeh Pournader wrote: That seems problematic to me, when used for Arabic. How should one use ZWNJ between two Arabic letters to stop the ligature? The'll get disconnected! Good point. ZWJ+ZWNJ+ZWJ comes to mind, but it is really not the maximum of elegance... _ Marco

Same language, two locales (RE: Locale string for Norwegian - Bok

2000-08-31 Thread Marco . Cimarosti
Addison P. Phillips wrote: This is a weakness of the locale model used on the Web and most UNIX systems: the hierarchy is based on the ISO 639 language codes and the ISO 3166 country codes. It doesn't cover such minutiae as "inside-a-country" variation easily nor does it deal well with

RE: Same language, two locales (RE: Locale string for Norwegian -

2000-08-31 Thread Marco . Cimarosti
Addison P. Phillips wrote: Differences in writing systems are much more problematic than the Norwegian example. The Simplified/Traditional Chinese thing leaps to mind, of course, [...] Right. I just noticed that, in Unicode, this is not a display difference but an encoding one: corresponding

RE: Same language, two locales (RE: Locale string for Norwegian -

2000-09-01 Thread Marco . Cimarosti
Antoine Leca joked: Neither you nor I would accept that our national language are tagged, respectively, la-ital and la-fran... ;-) Similarly, I believe Norwegians and Danes will not accept having their present 2-letter codes replaced with cascaded ones in the form "Norse"-n? or "Norse"-da

RE: Same language, two locales

2000-09-04 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: The one irrevocable thing that LCIDs give you is a collation choice (the regional options do not allow you to specify a separate default collation choice). Another important setting that is hard-wired with Windows locale is language. This affects some standard

RE: Armenian numbers

2000-09-04 Thread Marco . Cimarosti
Elliotte Rusty Harold wrote: Is anyone here familiar with Armenian? The CSS Level 2 specification from the W3C makes reference to "Traditional Armenian numbering" but Unicode doesn't seem to include any Armenian numbers, at least as such. Is this another language like Nebrew where the

RE: [unicode] More ways to encode U+FEFF (was: Re: Designing a

2000-09-06 Thread Marco . Cimarosti
Markus Scherer wrote: of this list, only UTF-EBCDIC is a viable encoding form. the others are either deprecated, never made it beyond draft, or are unofficial discussion pieces that never made it anywhere (i proposed one of them :-). Please notice that at least one of these has never even

RE: Tamil glyphs

2000-09-07 Thread Marco . Cimarosti
Antoine Leca wrote: Michael (michka) Kaplan wrote: [...] The Monotype font and Latha in Windows 2000 are the way that my client got both display types. I believe this is a rather special need that your client has: as I understand, he wants, at the same time, some rendering forms

RE: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion

2000-09-08 Thread Marco . Cimarosti
Title: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion tools Sure: uniconv.exe by Basis Technology. It is distributed for free as a demo of the Rosette library; download from http://rosette.basistech.com/demo.html. The version I have (quite old) does not support UTF-16, but it

RE: Tamil glyphs

2000-09-11 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: From: "Rick McGowan" [EMAIL PROTECTED] [...] I suppose if you just want to display the non-ligature type thing in a situation where the font wants to give you the ligature type thing, you might be able to use a ZWNJ or ZWNBSP between the chars. [...]

Re: Tamil glyphs

2000-09-12 Thread Marco Cimarosti
Please ignore my previous message (subj "[EMAIL PROTECTED]", to Antoine, cc [EMAIL PROTECTED]). Sorry about that. Antoine Leca wrote: [EMAIL PROTECTED] wrote: [...] In ordinary cases, a ZW[N]J inside a consonant cluster does not prevent matra reordering. E.g., in Devanagari:

RE: surrogate terminology (was Re: Surrogate support in *ML?

2000-09-13 Thread Marco . Cimarosti
Peter Constable wrote: - code values: integers within the space of some encoding form; d800 - dfff *are* code values, but not codepoints - surrogate: I'm inclined to say that this should refer *only* to a UTF-16 code value in the range d800 - dfff; equal to "surrogate code value" -
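To make the code value / code point distinction above concrete: the UTF-16 code values D800..DBFF (high) and DC00..DFFF (low) never stand for characters on their own; a high/low pair together denotes one code point in U+10000..U+10FFFF. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Hypothetical helpers classifying 16-bit UTF-16 code values. */
int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a valid high/low surrogate pair into a code point.
   Each surrogate contributes 10 bits; the result is offset by 0x10000.
   The caller is assumed to have checked the pair with the helpers above. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10) + ((uint32_t)lo - 0xDC00u);
}
```

For instance, the pair D834 DD1E decodes to U+1D11E (MUSICAL SYMBOL G CLEF).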

FWD: Unicode Indian languages (was Re: Tamil glyphs)

2000-09-13 Thread Marco Cimarosti
_ Marco --Original Message-- From: "mlinguist" [EMAIL PROTECTED] To: "Marco Cimarosti" [EMAIL PROTECTED] Sent: September 12, 2000 1:55:59 PM GMT Subject: Re: Tamil glyphs Dear Mr.Marco, Sorry for sending an unsolicited mail to you. I am interested in knowing alot about t

RE: Tagging orthographic systems

2000-09-14 Thread Marco . Cimarosti
Michael Everson wrote: Tire Center (US) Tire Centre (CA) Tyre Centre (GB) civilization (US) civilization (GB) Oxford recommendation civilisation (GB) Lots of folks (Ouch! The e-mail spellchecker had a lot to complain about the above quotation :-) Out of curiosity: is no "en-IE" tag needed

RE: Printing issues

2000-09-15 Thread Marco . Cimarosti
Dieter Hoffmann wrote: Are there known issues between the way AMD K6/2 handles Unicode when sent to printer by Office97? In the Windows98 SE environment whence originates this question, Wordpad98 document containing Greek and other special characters prints correctly, but when handled

RE: Ligatured characters

2000-09-15 Thread Marco . Cimarosti
Roozbeh Pournader wrote: This sequence, ZWJ ZWNJ ZWJ, really worries me. In the Arabic script, my interest, this is always the case. The ZWNJ is not enough in any case, since it disconnects the letters. And this also means some change in many simple rendering programs that use other

RE: [idn] nameprep forbidden characters

2000-09-19 Thread Marco . Cimarosti
Edwin F. Hart asked: Is there a need for a "fuzzy" comparison where names with and without points in Hebrew? Is there a similar need for other scripts such as Arabic? Mark Davis replied UCA (#10) already handles that. You will get a "fuzzy" compare if you mask off less important weights,

uax uts dutr?

2000-09-19 Thread Marco . Cimarosti
Out of curiosity, when did the acronym "UTR" ("Unicode Technical Report") mutate to those "UAX", "UTS", "DUTR" that I see in http://www.unicode.org/unicode/reports/index.html? And, BTW, how is it that a "Superseded UTR" is not, say, a "SUTR"? _ Marco

RE: [idn] nameprep forbidden characters

2000-09-20 Thread Marco . Cimarosti
Michael (michka) Kaplan wrote: It is not that simple... what if someone else registers the domain that uses the common orthographic variants? Well, I assume that it would not be possible because, by those hypothetical collation rules, the two domains would be considered the same -- like trying

[very OT] Slavic

2000-09-21 Thread Marco . Cimarosti
Peter Constable wrote: On 09/16/2000 12:56:31 PM Doug Ewell wrote: MKJ is the Ethnologue code for both 'Macedonian' and 'Slavic'. Absolutely *everyone* knows there is no one 'Slavic' language; the name refers to an entire language family. This is much more imprecise than any of the

RE: [very OT] Slavic

2000-09-21 Thread Marco . Cimarosti
Jörg Knappen wrote: No, in german "welsch" always means a romance language (in most cases french, but also italian and even romanian can fill in). Note also "rotwelsch". The "generic" term for slavonic languages is "wendisch" or "windisch" derived form the formerly slavonic "Wenden",

RE: halp me!!!!

2000-09-28 Thread Marco . Cimarosti
Karambir Rohilla wrote: wath is maping of unicode font in indian language? Sorry, your question is too clumsy. I think that no one will be able to give you an answer. You should first make some points clear to yourself, then try to ask the question differently. The things that make your

RE: New Name Registry Using Unicode

2000-10-02 Thread Marco . Cimarosti
Hi, Carl. (You replied privately; was this intentional? If not, you can resend it to the list, and I will re-send this one). A better choice, IMHO, would be to normalize by *decomposition*. In this way, the problem above would be addressed by rule 3 below. I think you have a very good

RE: New Name Registry Using Unicode

2000-10-02 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote: Just to clarify, I have no connection with the XNS project (other than as a user), but posted the info about it as of possible interest [...] I am certainly one of those who gave the impression of addressing Tom himself, as if he were the author of the proposal. I

RE: Locale ID's again: simplified vs. traditional

2000-10-04 Thread Marco . Cimarosti
I wrote this blunder: *Spell checking* is one of these cases, that we are all quite familiar with. If I have to write a text using traditional hanzi in Unicode, I can tag it as "Chinese-simplified", so that my spell-checker can assist me signaling simplified characters that slipped in by

RE: Locale ID's again: simplified vs. traditional

2000-10-04 Thread Marco . Cimarosti
Jukka Korpela wrote: Does Unicode encode traditional and simplified Chinese characters separately, or is the difference considered as glyph variation only, to be indicated (if desired) at higher protocol levels? They are encoded separately, at different code points. What you heard about

UTF-8 and UTF-16 (was help me !!!)

2000-10-04 Thread Marco . Cimarosti
Karambir Rohilla wrote: Please help me anyone waht is UTF8 UTF16 ? I found these to be well written and helpful: - "Forms of Unicode" (http://www-4.ibm.com/software/developer/library/utfencodingforms/index.html ) by Mark Davis. - "Unicode Transformation Formats: UTF-8 Co."

RE: New Name Registry Using Unicode

2000-10-04 Thread Marco . Cimarosti
Carl W. Brown wrote: It would certainly seem that the optimal solution would be to carry the locale. Not at all, and for good reasons: I need to be sure that, whenever and wherever I type in a certain string, I reach the same web site. Scenario: Imagine that I am a customer of Äöü, a (fictional)

RE: UTF-8 and UTF-16

2000-10-06 Thread Marco . Cimarosti
George Zeigler wrote: someone send me a FAQ page that explains the difference between UTF-8 and Unicode (UTF-16 I suppose). You should perhaps read it again ;-) UTF-8 if I understand correctly only supports European characters, where as UTF-16 supports all major characters world

RE: UTF-8 and UTF-16

2000-10-06 Thread Marco . Cimarosti
I muttered this incomprehensible paragraph: - UTF-16 has 16-bit units ("words") and uses 1 or 2 units per character. Characters 00 to 00 use the corresponding word; higher values use a pair of "surrogates", the first one ("high") being in . It too exists in the same 3 variants as
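The code values in the paragraph quoted above were lost in transit. For reference, standard UTF-16 writes U+0000..U+FFFF (minus the surrogate range) as one 16-bit unit, and U+10000..U+10FFFF as a high surrogate (D800..DBFF) followed by a low surrogate (DC00..DFFF). A minimal encoder sketch; the name `utf16_encode` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a Unicode scalar value as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is a
   surrogate code point or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {                           /* BMP: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```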

RE: Giga Character Set: Anything besides noise

2000-10-12 Thread Marco . Cimarosti
John Cowan wrote (in ASCII(tm), by the way): In fact, of course, every extant Klingon text can be written with Unicode, and indeed with ISO 646:1983. Well, it can -- provided that you properly *registered* your copy of ASCII(tm) (http://www.wholehog.fsnet.co.uk/robert/ascii/), and paid your

RE: [OT] problem with shift_jis

2000-10-12 Thread Marco . Cimarosti
Raghu Kolluru wrote: My email delivery programs works with most of the charsets but not with shift_jis. Here are the steps that I do, 1) I get a text file from Japan which as the content in the encoded charset. 2) I paste this content in web based UI and store it in SQL server 3) Then I

RE: CJK combining components (was Giga Character Set: Nothing b

2000-10-16 Thread Marco . Cimarosti
Carl W. Brown: An article in the October 12, 2000 issue of Linux Weekly News http://lwn.net/bigpage.php3 tries to explain the benefit: "Many Asian characters are composites, made up of one or more simpler characters. Unicode simply makes a big catalog of characters, without recognizing

RE: Giga Character Set: Nothing but noise

2000-10-18 Thread Marco . Cimarosti
Jon Babcock wrote: It seems to me that if not for that, how could anyone make a Chinese font? Who is going to sit down and draw a *myriad* or more characters? Since elements recur, this reduces the amount of labour required greatly. I too would have bet that all CJK foundries used some form

RE: Giga Character Set: Nothing but noise

2000-10-19 Thread Marco . Cimarosti
Jon Babcock wrote: BTW, Marco, as near as I can recall, the above quotation in not from me. Did it again! Shame on me! Sorry! _ Marco

RE: CJK combining components (was RE: Giga ...)

2000-10-19 Thread Marco . Cimarosti
James E. Agenbroad wrote: If I had to make a guess it would be that transforming the glyphs of parts of characters so they will fit together in a pleasing fashion would take about as much effort (or more) than designing separate glyphs for each new character. Perhaps. I am a programmer, so

RE: Colours

2000-10-20 Thread Marco . Cimarosti
[EMAIL PROTECTED] wrote on [EMAIL PROTECTED]: Are there languages you might need to encode where colour is important? (such as, if a certain shape in red is one letter, but in blue it is a different letter) I think this is the case for the Nahuatl (Aztec) script, where color is a primary

Re: CJK combining components: MOVING TO OTHER ML

2000-10-20 Thread Marco . Cimarosti
igrams", "holograms", etc.), and how (and whether) this analysis could be useful for encoding text on computers, building software fonts, and other computer-related spin-offs. Then I (Marco Cimarosti) wrote: Anyway. I think that everybody probably had quite enough of these daydreams of

RE: [Very OT] Japanese economy failing -- it's the Japanese langu

2000-10-20 Thread Marco . Cimarosti
Patrick Andries wrote, quoting from the Frankfurter Allgemeine Zeitung: [...] drei völlig getrennte Schriftsysteme gewissermaßen in bunter Mischung [three completely separate writing systems, so to speak, in a colorful mix] [...] I am not sure which "three completely separate writing systems" the author had in mind. There are several possible ways of counting "Japanese

RE: Convincing executives of character code perils

2000-10-24 Thread Marco . Cimarosti
Well, my executives are mostly Italians or Dutchmen, so they are quite used to the perils of their own languages. Ouch! I have just bitten my tongue in the attempt of pronouncing a very dangerous Italian phoneme! I need medical assistance, fast! _ Ma?co -Original Message- From: J. P.

RE: Number separators

2000-10-30 Thread Marco . Cimarosti
Mike Ayers wrote: I discovered this weekend that Chinese, despite grouping large numbers by ten thousands [...], write their digits with comma separators every 3 digits [...] This may be different in different operating systems, but I too was convinced that they grouped four digits at
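The difference between grouping every three digits and every four (by ten thousands) is just a parameter of the grouping step. A minimal sketch, with a hypothetical function `group_digits` that works on an unsigned decimal digit string; the separator is also a parameter, e.g. the apostrophe used in the Swiss format discussed in the threads above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the decimal digit string `digits` to `out`,
   inserting `sep` every `group` digits counted from the right.
   group must be > 0; out must hold at least
   strlen(digits) + strlen(digits) / group + 1 bytes. */
void group_digits(const char *digits, char sep, size_t group, char *out)
{
    size_t len = strlen(digits);
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        /* A separator precedes position i when the count of digits still
           to be copied (len - i) is a multiple of group, except at the start. */
        if (i > 0 && (len - i) % group == 0)
            out[j++] = sep;
        out[j++] = digits[i];
    }
    out[j] = '\0';
}
```

With group 3 and an apostrophe, "1234567" becomes "1'234'567"; with group 4 and a comma it becomes "123,4567".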

RE: Unicode Character not Printing

2000-11-02 Thread Marco Cimarosti
Flask Eric wrote: I have installed the Unicode versions of Arial and Times New Roman on Windows 98 running Office 97 on several PCs. Everything works fine but on two separate occasions I found out that when printing the Maltese Characters on particular printers, the Maltese Characters are

RE: Is there an example of web site (or page) encoded in Unicode?

2000-11-07 Thread Marco Cimarosti
Paul Deuter wrote: So can anyone point me to a web-site or page that is encoded in Unicode (UTF-16 or UCS-2)? I have seen one single example of a web page in UTF-16 (but I can't remember the URL), and never saw one in UCS-2. It is much more likely to find Unicode web pages in the form of UTF-8

Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
I have some questions about the usage of hanja (Chinese characters) in Korean. 1) Is it correct to say that hanja are only used for words derived from Chinese, and never for genuinely Korean words? 2) Is it true that hanja have been abolished in North Korea? When did this happen? 3) How often

RE: Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
John Cowan wrote: 3) How often are hanja used today, however? (...) I believe they are still common in newspaper headlines, because of the greater degree of compression they permit. Do you mean that some hanja have a polysyllabic pronunciation in Korean? I thought that any single hanja

RE: Q.'s about hanja

2000-11-07 Thread Marco Cimarosti
John Cowan wrote: Marco Cimarosti wrote: Do you mean that some hanja have a polisyllabic pronunciation in Korean? Yes. Of the 9033 Unihan characters with Korean readings given in the Unihan.txt file, there are 689 with two-syllable mappings, 13 with three-syllable mappings, and 2

RE: Devanagari question

2000-11-13 Thread Marco Cimarosti
Antoine Leca wrote: My understanding is that there are a number of similar cases, which are not officially prohibited (AFAIK), but does not carry any sense. For example, how about digits followed by accents (as combining marks)? Or the kana voicing/voiceless combining marks, when they

RE: Java and Unicode

2000-11-15 Thread Marco . Cimarosti
Elliotte Rusty Harold wrote: One thing I'm very curious about going forward: Right now character values greater than 65535 are purely theoretical. However this will change. It seems to me that handling these characters properly is going to require redefining the char data type from two

RE: string vs. char [was Re: Java and Unicode]

2000-11-17 Thread Marco Cimarosti
Addison P. Phillips wrote: I ended up deciding that the Unicode API for this OS will only work in strings. CTYPE replacement functions (such as isalpha) and character based replacement functions (such as strchr) will take and return strings for all of their arguments. Internally, my

RE: string vs. char [was Re: Java and Unicode]

2000-11-17 Thread Marco Cimarosti
Ooops! In my previous message, I wrote: wchar_t * _wcschr_32(const wint_t * s, wchar_t c); wchar_t * _wcsrchr_32(const wint_t * s, wchar_t c); What I actually wanted to write is: wchar_t * _wcschr_32(const wchar_t * s, wint_t c); wchar_t * _wcsrchr_32(const wchar_t * s, wint_t c); Sorry if
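One possible body for a function like the `_wcschr_32` proposed above, sketched under stated assumptions: strings are UTF-16, the code-point parameter is a 32-bit integer (because `wint_t` is only 16 bits on some platforms), and the name `u16_strchr32` is hypothetical (a leading underscore plus lowercase letter is reserved at file scope). For a supplementary character, the function searches for its surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: find the first occurrence of code point c in the
   NUL-terminated UTF-16 string s. Returns a pointer to the first code
   unit of the match, or NULL. Like strchr, searching for 0 returns a
   pointer to the terminator. No validation of s is performed. */
const uint16_t *u16_strchr32(const uint16_t *s, uint32_t c)
{
    if (c < 0x10000) {
        /* BMP character: a plain code-unit search is enough. */
        for (;; s++) {
            if (*s == c) return s;
            if (*s == 0) return NULL;
        }
    }
    if (c > 0x10FFFF)
        return NULL;
    /* Supplementary character: look for its high/low surrogate pair. */
    uint16_t hi = (uint16_t)(0xD800 + ((c - 0x10000) >> 10));
    uint16_t lo = (uint16_t)(0xDC00 + ((c - 0x10000) & 0x3FF));
    for (; *s != 0; s++)
        if (s[0] == hi && s[1] == lo)   /* s[1] is at worst the terminator */
            return s;
    return NULL;
}
```

A reverse search (the `_wcsrchr_32` counterpart) would follow the same pattern, remembering the last match instead of returning the first.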

Re: string vs. char [was Re: Java and Unicode]

2000-11-20 Thread Marco Cimarosti
Antoine Leca wrote: Marco Cimarosti wrote: Actually, C does have different types for characters within strings and for characters in isolation. That is not my point of view. There is a special case for 'H', that holds int type rather than char, for backward compatibility reasons

[totally OT] Unicode terminology (was Re: string vs. char [was Re: Java and Unicode])

2000-11-20 Thread Marco Cimarosti
David Starner wrote: Sent: 20 Nov 2000, Mon 16.18 To: Unicode List Subject: Re: string vs. char [was Re: Java and Unicode] On Mon, Nov 20, 2000 at 06:54:27AM -0800, Michael (michka) Kaplan wrote: From: "Marco Cimarosti" [EMAIL PROTECTED] the Surrograte (aka "Astral&

Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Marco Cimarosti
Lukas Pietsch wrote: a lot was said in this thread about intelligent rendering mechanisms, [...] I figure that people are mostly thinking of the technology called "Open Type", is that right? Right, but quite partial. There are several major technologies for rendering "complex Unicode

Re: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti
list for a while. And, about points 2 and 3 above, beware that I am a second-language English speaker and that I don't have much experience of American pronunciation. Ciao. Marco Cimarosti

RE: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti
Peter Constable wrote: I'd add the square brackets, an off-glide on the "o", and aspiration (02b0) after the "k". Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the *beginning* of "words" (mainly because teachers told me I was supposed to notice it), but I don't feel

RE: Transcriptions of Unicode

2001-01-15 Thread Marco Cimarosti
Mark Davis wrote: Much as I admire and appreciate the French language (second only to Italian), the proximate derivation of "Unicode" was not from that language, and the transcription should not match the French pronunciation. Instead, it has solid Northern Californian roots (even though not

RE: Transcriptions of Unicode

2001-01-15 Thread Marco Cimarosti
{Notice: way off-topic} Mark Davis wrote: There was a period well after the Norman invasion where a large number of words came into English directly from Latin, which was still in widespread use among scholars. Right. And it also was the language of priests, on both sides of the Channel.

RE: conjucts beginning with independent vowel?

2001-01-18 Thread Marco Cimarosti
Peter Constable wrote: In the better known Indic scripts, are there ever cases of conjuncts formed with independent vowels and a following consonant? I know this may sound weird. The idea would be a VC syllable like "al". Things that are more familiar are to have CC conjuncts, which would have an

RE: Teletext mappings

2001-01-19 Thread Marco Cimarosti
Rob Hardy wrote: I'm preparing some mappings of teletext character sets to Unicode. From http://www.sneezes.freeserve.co.uk/teletext/tech/encodings/G0_ARABIC.txt: 0x60 0x2010 # HYPHEN (or is it a dash?) I think that 0x60 should be U+0640 (ARABIC TATWEEL): a character used to extend

RE: anyone recognise this?

2001-01-22 Thread Marco Cimarosti
Peter Constable: Does anybody recognise the script in the attached sample.gif? I already tried with handwritten Devanagari (without the top bar), but an expert on another list said that it is unlikely. I thought too it could be Georgian, but then I was unable to match any single letter.

RE: Greek questions, on- and off-topic

2001-01-23 Thread Marco Cimarosti
My Greek textbook has acute, grave, and circumflex (called by those names), but I'm not sure what these correspond to in the Greek and Greek Extended blocks (there seem to be many more diacriticals than those). Is there an on-line guide somewhere? There are in fact other diacritics

RE: Chemistry on chinesse. (CJK)

2001-01-24 Thread Marco Cimarosti
Erik Garrs wrote: The elements of the periodical table (chemistry) are missing, and they are specially needed on chinesse because they don't have alphabet, so they need them as a graphical representation. Some of these characters are quite common in modern life (e.g., "oxygen" is certainly

RE: Chemistry on chinesse. (CJK)

2001-01-24 Thread Marco Cimarosti
Michael Everson wrote: There is no reason the Chinese or anyone else cannot write this with LATIN CAPITAL LETTER O and SUBSCRIPT TWO. I think there is a misunderstanding, probably on my side. In his Spanish version, Erik claimed that the chemical elements were missing "en el contexto de los

RE: Chemistry in chinesse (Only in chinesse?)

2001-01-26 Thread Marco Cimarosti
Erik Garrs wrote: Now that thanks to Pierpaolo BERNARDI who found a book (...) (dictionary) where shows what I was mentioning, MOST Chinese dictionaries that I have seen bear a table of chemical elements at the end. Perhaps you would have found out earlier by going to a public library. here we

RE: Benefits of Unicode

2001-01-29 Thread Marco Cimarosti
Richard Cook wrote: Has anybody played devil's advocate to this, with a list of "Failings of Unicode"? Are there any? :-) This question might in fact result in a longer Benefits list Although I've always been a Unicode fan, Richard's invitation is too tempting. :-) I'll add these to
