Re: Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)

2012-02-03 Thread Antoine Leca
James Kass wrote: Of course I put these three Code2nnn fonts on SourceForge, being sick of their further development and whole commercial aura around them. Thanks for your work contributing to Unicode and to the whole community. Antoine

Re: Code2000 on SourceForge

2012-02-03 Thread Antoine Leca
Christoph Päper wrote: James Kass: License already included in SourceForge download, namely GPLv3. You probably want to use GPL+FE, i.e. GPL with font exception. http://en.wikipedia.org/wiki/GPL_font_exception I am not completely sure you want to embed Code2000 with a document you intend to

Re: ISO 10646 compliance and EU law

2004-12-27 Thread Antoine Leca
On Sunday, December 26th, 2004 5:54 a.m. (!) Philippe Verdy va escriure, entre altres: In the EU legislation, there are tons of references to languages, but much less about script systems; However, there is a well known case about them. In 1997, when it was about the building of the Euro

Re: ISO 10646 compliance and EU law

2004-12-27 Thread Antoine Leca
[ I am not subscribed to the hebrew list, so I do not post there; feel free to relay if it is worth the value. I will not subscribe to this list just to post it, and since Elaine did not explain on which list she wants the discussion to take place, I choose the list I am subscribed to. ] On Thursday,

Re: When to validate?

2004-12-10 Thread Antoine Leca
Arcane Jill va escriure: And yet, in an expression such as tolower(trim(s)), the second validation is unnecessary. The input to tolower() /must/ be valid, because it is the output of trim(). But on the other hand, tolower() could be called with arbitrary input, so I can't skip the validation.

Re: OpenType not for Open Communication?

2004-12-09 Thread Antoine Leca
Peter C. wrote: font vendors are creating fonts that use Unicode, platform vendors (at least Mac and Windows -- Linux is too fractured a scene to make a general statement) On Monday, December 6th, 2004 18:40Z Edward H. Trager va escriure: The really big, important applications and code

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-09 Thread Antoine Leca
On Monday, December 6th, 2004 20:52Z John Cowan va escriure: Doug Ewell scripsit: Now suppose you have a UNIX filesystem, containing filenames in a legacy encoding (possibly even more than one). If one wants to switch to UTF-8 filenames, what is one supposed to do? Convert all filenames to

Re: Nicest UTF

2004-12-06 Thread Antoine Leca
Asmus Freytag wrote: A simplistic model of the 'cost' for UTF-16 over UTF-32 would consider snip 3) additional cost of accessing 16-bit registers (per character) snip For many processors, item 3 is not an issue. I do not know, I only know of a few of them; for example, I do not know how Alpha

Re: latin equivalent to specific indian characters

2004-12-05 Thread Antoine Leca
I fail to see the connection between your question and Unicode. Samedi 4 décembre 2004 13:18Z, Rene Hache écrivit: To whom it may concern, ;-) I writing because I would to know if someone can help with certain Sanskrit/Pali characters in roman scripts. Certainly there is a LOT of

Re: current version of unicode font (Open Type) in e-mails

2004-12-03 Thread Antoine Leca
Arial Unicode MS version 1.01 is most current and shipped with Office 2003. I called it OpenFont. Sorry! I double-clicked on its icon - with a colored OT - in \WINDOWS\Fonts again it says after version 1.xx (Open Type). I took that to mean Open Source or something more open than MS's

Re: current version of unicode-font

2004-12-03 Thread Antoine Leca
On Friday, December 03, 2004 13:10, Cristian Secar va escriure: However, the .ttf fonts that ship with their products are showing an OT icon. I don't know how it's done technically. Technically, it is done by including a (valid) 'DSIG' (digital signature) subtable into the font file, that is a

Re: Nicest UTF

2004-12-02 Thread Antoine Leca
On Wednesday, December 01, 2004 22:40Z Theodore H. Smith va escriure: Assuming you had no legacy code. And no handy libraries either, except for byte libraries in C (string.h, stdlib.h). Just a C++ compiler, a blank page to draw on, and a requirement to do a lot of Unicode text processing.

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Antoine Leca
On Thursday, November 25th, 2004 08:05Z Philippe Verdy va escriure: In ASCII, or in all other ISO 646 charsets, code positions are ALL in the range 0 to 127. Nothing is defined outside of this range, exactly like Unicode does not define or mandate anything for code points larger than

Re: Question on Canonical equivilance

2004-11-25 Thread Antoine Leca
On Wednesday, November 24th, 2004 16:26Z Tim Greenwood va escriure: All of the spacing combining marks (general category Mc) except musical symbols have a canonical combining class of 0. Why is this? About the Indic vowel signs, I assume it is this way to avoid them being reordered (in weird

Misuse of 8th bit [Was: My Querry]

2004-11-25 Thread Antoine Leca
On Wednesday, November 24th, 2004 22:16Z Asmus Freytag va escriure: I'm not seeing a lot in this thread that adds to the store of knowledge on this issue, but I see a number of statements that are easily misconstrued or misapplied, including the thoroughly discredited practice of storing

Re: Another Querry

2004-11-24 Thread Antoine Leca
On Wednesday, November 24th, 2004 04:02Z Harshal Trivedi va escriure: How can i determine end of UCS-2/UCS-4 string while encoding it in C program? It depends how you are storing and more importantly managing it. If you consider it as mere arrays of uint16_t/uint32_t, with your own functions
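
The "mere arrays of uint16_t/uint32_t, with your own functions" approach mentioned above could look roughly like the sketch below. It assumes the program itself decided to terminate each string with a zero code unit; the snippet does not say which convention the original poster used, and the function names ucs2_len and ucs4_len are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 16-bit code units that precede the
   terminating 0x0000 unit. Assumes a zero code unit marks the end. */
size_t ucs2_len(const uint16_t *s)
{
    const uint16_t *p = s;
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}

/* Same idea for 32-bit code units (UCS-4). */
size_t ucs4_len(const uint32_t *s)
{
    const uint32_t *p = s;
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

The test has to be made on whole code units: a unit such as 0x0100 contains a zero byte, so a byte-oriented routine like strlen would stop in the middle of the string.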

Re: My Querry

2004-11-23 Thread Antoine Leca
Philippe Verdy écrivit: From: Antoine Leca [EMAIL PROTECTED] For example, ASCII as designed allowed (please note I did not write was designed to allow) the use of the 8th bit as parity bit when transmitted as octet on a telecommunication line; I doubt such use is compatible with UTF-8

Re: official languages of ISO / IEC (CIE)

2004-11-09 Thread Antoine Leca
On Tuesday, November 8th, 2004 23:13Z E. Keown va escriure: Does either the ISO or the IEC have official languages? As far as I know, yes, three. BTW, about U.N. I believe there are 6 working languages. Whether official or not, is French the 'second language' of the standards world? You

Re: [indic] CLDR 1.2 Alpha now available

2004-10-01 Thread Antoine Leca
Hi Rick, On Friday, October 1st, 2004 00:17, Rick McGowan va escriure: The Unicode Consortium is pleased to announce that the alpha version of the Common Locale Data Repository (CLDR) 1.2 is available for public review. Can you please clarify what are the intent with regard to the entries

Re: internationalization assumption

2004-09-30 Thread Antoine Leca
Dear Philippe, [ I write to the list, since there is no point sending two posts. Internet is full enough of errant SMTP mails anyway. ] On Wednesday, September 29, 2004 17:42, Philippe Verdy va escriure: From: Antoine Leca Just a side point: French cannot be fully addressed with Latin 1

Re: internationalization assumption

2004-09-29 Thread Antoine Leca
On Tuesday, September 28th, 2004 03:22 Tom wrote: Let's say. The test engineer ensures the functionality and validates the input and output on major Latin 1 languages, such as German, French, Spanish, Italian, Just a side point: French cannot be fully addressed with Latin 1. Of course, it is

Re: MSDN Article, Second Draft

2004-08-23 Thread Antoine Leca
Jungshik Shin écrivit: Except in some UNIX operating systems and specialized applications with specific needs, Note that ISO C 9x specifies that wchar_t be UTF-32/UCS-4 when __STDC_ISO_10646__ is defined. This is of course very pedantic (I do not believe there are existing implementations

Re: Errors in TUS Figure 15.2?

2004-08-02 Thread Antoine Leca
On Friday, July 30th, 2004 19:47, Peter Kirk va escriure: There appear to be two errors (not listed in the errata page http://www.unicode.org/errata/) in Figure 15.2 on page 391 of The Unicode Standard 4.0, the online version at http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf. snip The

Re: Errors in TUS Figure 15.2?

2004-08-02 Thread Antoine Leca
On Monday, August 2nd, 2004 12:51, Peter Kirk va escriure: On 02/08/2004 09:25, Antoine Leca wrote: And there is still a problem with the text before the figure. Which text? As I wrote before, There also seems to be an error in the text just before the figure which states In the Arabic

Re: is n with tilde used in French language ?

2004-07-05 Thread Antoine Leca
On Monday, July 05, 2004 1:52 PM Anto'nio Martins-Tuva'lkin va escriure: From Spanish cañón? I'm sure there's an excellent reason to keep the tilde but trash the acute... ;-) Yes: acute has a different meaning in French orthography (denotes a closed vowel, and can occur twice) than it has in

Re: ISO 15924 draft fixes

2004-05-21 Thread Antoine Leca
On Thursday, May 20th, 2004 23:56, Philippe Verdy wrote: I see no real problem if not all the different orthographies are listed or if they are not used universally. As long as the name is non ambiguous. What will be important for interchange of data will not be this name but the Code (or N°,

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca
[Mailed _and_ posted to the list; UTF-8] On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote: I would appreciate it if interested persons could look this over and inform me if they find any further discrepancies between the two which are worth troubling about. Then we will proceed to

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca
Antoine Leca a écrit : The French name for Hang looks strange. It happened to be hangul (hangul, hangeul) (after quite a bit of discussion.) Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8 this morning. I meant hangul(hangul, hangeul). According to a native ftp

Re: TR35

2004-05-18 Thread Antoine Leca
On Friday, May 14, 2004 10:22 PM, Peter Constable wrote: It is simply inadequate analysis of usage scenarios to say an order form contains formatted dates / numbers / currency that need to be interpreted, therefore this document has a locale. Sorry, you lost me. I do not know what usage

Re: ISO-15924 script nodes and UAX#24 script IDs

2004-05-18 Thread Antoine Leca
Philippe Verdy wrote on Tuesday, May 18th, 2004 12:24: Also there are differences in orthography in the table lists: the plain text version and Table 2 use consonants with dot below for the English name, but Table 1 uses basic Latin consonants (example for Malayalam). I believe these are

Re: [OT] English pronunciation of Quixote (was: Re: ISO-15924 script nodes...)

2004-05-18 Thread Antoine Leca
On Tuesday, May 18, 2004 5:34 PM, Doug Ewell va escriure: Staying out of this thread probably won't help it go away, so... ;-) The change of subject is adequate, anyways. This seems fair. Even if there is a Spanish adjective quixótico -- I found only one Google hit for it in Spanish, but

Re: TR35

2004-05-14 Thread Antoine Leca
On Thursday, May 13th, 2004 16:40, Peter Constable wrote: Only that I don't think it's appropriate in general to tag documents (by which I don't mean an accounting spreadsheet or an order-entry record) for things like number formatting, and so such info should not be included in attributes

Re: TR35

2004-05-14 Thread Antoine Leca
On Friday, May 14, 2004 3:30 PM, Peter Constable va escriure: To me, documents encompassed any style of writings (and was broader). For example, I believed that writing was invented 6 millennia ago precisely for accounting and trading, *not* with the Hammurabi codex or the Egyptian hymns.

Re: TR35

2004-05-13 Thread Antoine Leca
On Wednesday, May 12, 2004 8:00 PM, Peter Constable va escriure: It's not particularly useful to communicate that a document was created when a locale with such-and-such number format was in effect, Sure? : Please send to us 100.000 units of your item 12010, available to our : warehouse by

Re: TR35

2004-05-12 Thread Antoine Leca
On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy va escriure: From: Carl W. Brown [EMAIL PROTECTED] Expats break the locale model anyway. The problem is that we use country as both a language modifier and a location. From past comments I read here, it is understood now that locale

Re: Just if and where is the sense then?

2004-05-05 Thread Antoine Leca
On Wednesday, May 05, 2004 5:29 PM, John Jenkins va escriure: I should point out, however, that the probability of getting the pre-X versions of the Mac OS to support new 8-bit character sets is exactly 0. Would the various Indian scripts not yet covered by ILK, count as new character sets?

Re: ISO 15924

2004-05-03 Thread Antoine Leca
[ This is not copied to unicore, since I am allowed there. This is copied to ietf-language because the question was, but it may perfectly be filtered out. ] On Sunday, May 02, 2004 10:57 PM, John Hudson va escriure: In the code lists at http://www.unicode.org/iso15924/iso15924-codes.html the

Re: lowercased Unicode language tags ? (was:ISO 15924)

2004-05-03 Thread Antoine Leca
On Monday, May 03, 2004 4:36 AM John Cowan [EMAIL PROTECTED] va escriure: Philippe Verdy scripsit: And there are also ISO 3166-2 codes for administrative regions in countries (such as FR2B for the department of Haute-Corse in France). I think those are usually written FR-2B, though I do not

Re: Variation selectors and vowel marks

2004-04-30 Thread Antoine Leca
On Thursday, April 29, 2004 2:17 PM, C J Fynn va escriure: In font lookups, where a variant glyph form of a base character is displayed due to the presence of a VS character, the lookups for glyph forms of subsequent dependent vowel marks will be dependent on the variant base glyph (as long

Re: Non-decimal positional digits; was: Defined Private Use

2004-04-28 Thread Antoine Leca
Also, before it was recognized that they are *also* used as decimal digits (using some adequate substitute for the zero), Tamil digits 1-9 were seen as part of a non-decimal-positional system. Nevertheless, they were given class Nd. By the way, if the Tengwar system is only duodecimal (as I

Re: Common Locale Data Repository Project

2004-04-23 Thread Antoine Leca
On Friday, April 23, 2004 7:02 AM Peter Constable [EMAIL PROTECTED] va escriure: due to the strong perception of OpenI18N.org as opensource/Linux advocates, even though CLDR project is not specifically bound to Linux. It is hard to look at OpenI18N.org's spec and not get the impression that

Re: [OT] Even viruses are now i18n!

2004-04-23 Thread Antoine Leca
On Friday, April 23, 2004 2:08 AM, Philippe Verdy va escriure: From: Antoine Leca On Thursday, April 22, 2004 7:14 PM Peter Kirk va escriure: The virus writers have presumably confused .tc and .tk .TR for Turkey. .TK (Tokelau) is not more sensible Or is that [tk] for Turkmen

Re: [OT] Even viruses are now i18n!

2004-04-23 Thread Antoine Leca
On Friday, April 23, 2004 3:05 PM, Marco Cimarosti va escriure: Antoine Leca wrote: The virus cannot have any knowledge of a language code. And much less of the language used by its next victim... ^ Oops: I forgot to repeat code here. Looks like it confused people

Re: [OT] Even viruses are now i18n!

2004-04-22 Thread Antoine Leca
On Thursday, April 22, 2004 7:14 PM Peter Kirk [EMAIL PROTECTED] va escriure: The virus writers have presumably confused .tc and .tk .TR for Turkey. .TK (Tokelau) is not more sensible Antoine

Re: U+0140

2004-04-20 Thread Antoine Leca
On Saturday, April 17, 2004 10:28 PM TU+1, António Martins-Tuválkin wrote: As I wrote earlier, if you know the text under inspection is Catalan, a very simple regular expression will deal with that. Any half-decent Catalan word processor does it already, by the way. What about the odd Catalan

Re: U+0140

2004-04-16 Thread Antoine Leca
On Thursday, April 15, 2004 8:16 PM, Philippe Verdy va escriure: I thought it was already answered in this list by a Catalan speaking contributor: the sequence L+middle-dot in Catalan is NOT a combining sequence. No? Then what is it? Looks very much like one, to me. The middle dot in Catalan

Re: U+0140

2004-04-16 Thread Antoine Leca
On Friday, April 16, 2004 12:31 AM, Peter Kirk va escriure: Peter Kirk a écrit : What is U+2027 intended for? The name suggests that it might be what is needed for Catalan. Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line

Re: U+0140

2004-04-16 Thread Antoine Leca
On Friday, April 16, 2004 3:26 PM, Ernest Cline va escriure: I don't see that as being any worse than the set of HYPHEN_MINUS, HYPHEN, MINUS SIGN, etc. Sorry, I did not make myself clear. I am not intending to say this is undoable, nor that · case is particularly complex. It is doable (as I showed

Re: U+0140

2004-04-16 Thread Antoine Leca
On Friday, April 16, 2004 12:37 PM, Philippe Verdy va escriure: In some future, we could see U+013F and U+0140 used more often than L or l plus U+00B7... I (personally) hope we would not. Notably in word processors that can detect these sequences in Catalan text and substitute them with the

Re: Fixed Width Spaces (was: Printing and Displaying DependentVowels)

2004-04-02 Thread Antoine Leca
Arcane Jill wrote: There were sixteen block-graphics characters, remember? They each were subdivided into four quadrants, each of which could be either black or white, according to the low order four bits of the codepoint. The all-white block-graphics character was visually indistinguishable

Re: French typographic thin space (was: Fixed Width Spaces)

2004-04-01 Thread Antoine Leca
On Thursday, April 01, 2004 12:37 AM Asmus Freytag [EMAIL PROTECTED] va escriure: Have you folks noticed the addition of Narrow Non Break Space? No, I did not. In fact, when I saw your message, I believed it should be a character whose code would be 0401 or something like that. ;-) I know it is

Re: Fixed Width Spaces (was: Printing and Displaying DependentVowels)

2004-03-31 Thread Antoine Leca
On Tuesday, March 30, 2004 11:42 PM, Ernest Cline va escriure: The main usage is with compound words such as ice cream or Louis XIV or commercial phrases such as Camry SE where for esthetic reasons an author would prefer that the space not expand upon justification, Well, as one that takes

Re: Printing and Displaying Dependent Vowels

2004-03-30 Thread Antoine Leca
On Monday, March 29, 2004 8:11 PM John Cowan va escriure: Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. Ah! This is the place where I did not seek into! (It was not obvious to me that text about the dependent vowel marks has to be searched into the

Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Antoine Leca
On Sunday, March 28, 2004 12:03 AM, James Kass wrote: So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. If you do so, you will end with defeating the

Re: Printing and Displaying Dependent Vowels

2004-03-29 Thread Antoine Leca
On Monday, March 29, 2004 2:14 PM, John Cowan va escriure: The bottom line is that SP+vowel and NBSP+vowel are prescribed by the Unicode Standard, I am sorry John, I should have miss a post of yours. I asked you where it is written, and did not find any answer to this; unless someone consider

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Avarangal asked about the requirements by educational establishments is the ability to print and display dependent vowels without dotted circles. John Cowan answered: Avarangal scripsit: Can any one provide information on the sequences used for displaying and printing dependent vowels as

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Sorry to answer my own post. Avarangal asked about the requirements by educational establishments is the ability to print and display dependent vowels without dotted circles. John Cowan answered: Avarangal scripsit: Can any one provide information on the sequences used for displaying and

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Avarangal wrote: display dependent vowels without dotted circles. Can any one provide information on the sequences used for displaying and printing dependent vowels as standalones. Microsoft's Uniscribe allows you to display a dependent vowel with the following sequence (to be followed

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
On Friday, March 26, 2004 7:12 PM, Philippe Verdy va escriure: Indic scripts are a bit unique by the fact that they have a syllabic structure decomposed into separate letters with a base consonant and a combining (this is not the proper term for Unicode) vowel modifier after it. This differs

Re: Printing and Displaying Dependent Vowels

2004-03-26 Thread Antoine Leca
Philippe Verdy va escriure: Space is a base character, then it combines with the next diacritic with which it creates a default grapheme cluster which should be interpreted as if it was a single character identity. Agreed so far for diacritics. Agreed also for non-spacing dependent vowels

Re: Urdu Unicode website [Was: Novice question]

2004-03-25 Thread Antoine Leca
Philippe Verdy [EMAIL PROTECTED] va escriure: In my Windows XP, I have four keyboard layouts proposed for the Urdu language: Arabic (101), Arabic (102), Arabic (102) AZERTY and Urdu, plus the keyboards for the Brahmic/ISCII transliterations in India, What for a kind of keyboards is that? XP

Urdu/Penjabi/Bengali website [Was: Novice question]

2004-03-25 Thread Antoine Leca
Hi Peter, On Thursday, March 25, 2004 2:19 PM Peter Kirk [EMAIL PROTECTED] va escriure: On 25/03/2004 03:33, Antoine Leca wrote: As Peter correctly noted from day 1, all this stuff is not very important, since Urdu users really expect nastaleeq style, so either they are not using Urdu

Urdu Unicode website [Was: Novice question]

2004-03-24 Thread Antoine Leca
Peter Constable va escriure: Urdu can be written using naskh-style Arabic (supported on WinXP, Win2K...), Peter, I do not see the connection between the OS support in Windows for a given language and the traduction of a website, but while we are at this one: how do you enter Urdu with

Re: Urdu Unicode website [Was: Novice question]

2004-03-24 Thread Antoine Leca
On Wednesday, March 24, 2004 5:03 PM Peter Constable va escriure: how do you enter Urdu with Microsoft Windows 2000? I have a Spanish one with SP4, IE6 SP1, Arabic script enabled. Surely something is missing, but where can I find it? Should I use KLC? My understanding is that Spanish

Re: Novice question

2004-03-23 Thread Antoine Leca
Hi John, John Snow va escriure: I am speaking to a client regarding there website being translated in to a number of languages including Bengali, Urdu and Punjabi which I am told is not very well supported by Unicode. This is not true. These languages are supported by Unicode, since the

Re: Novice question

2004-03-23 Thread Antoine Leca
Philippe Verdy [EMAIL PROTECTED] va escriure: From: Edward H. Trager [EMAIL PROTECTED] Also, I would not bother testing Windows OSes prior to Windows 2000/XP. Why not? Since it does not even work on these, there is no point testing it on development-dead platforms either. Antoine

Re: [OT] C-sharp

2004-03-23 Thread Antoine Leca
Philippe Verdy [EMAIL PROTECTED] va escriure: The musical sharp sign, of course, is U+266F, making the correct spelling C. From TUS: These symbols are typically used for text decorations, but they may also be treated as normal text characters in applications such as typesetting chess books,

Re: Irish dotless I (was: Languages with letters that always take diacriticals

2004-03-22 Thread Antoine Leca
John Cowan va escriure: Pavel Adamek scripsit: From the viewpoint of sorting, the coding HCOMBINING C BEFORE would be much better than CCOMBINING H AFTER. For Czech, yes. For Spanish we want the latter. What for? Antoine

Re: Irish dotless I

2004-03-16 Thread Antoine Leca
On Tuesday, March 16, 2004 5:48 PM Peter Kirk [EMAIL PROTECTED] va escriure: On 16/03/2004 07:35, Carl W. Brown wrote: I suspect that just changing the font to eliminate the dot will be easier. Software won't have to be changed, existing code pages will not have to be changed, searches will

Re: Question on Unicode-prevalence (general and for Cyrillic)

2004-03-15 Thread Antoine Leca
Peter Kirk va escriure: 2. A graduate student mentioned that it was her impression that most Cyrillic webpages (at least for Russian--her interest) are still not encoded in Unicode. (She is doing some research on the use of certain words in Russian and wanted to know how best to do the

Re: What's in a wchar_t string on unix?

2004-03-05 Thread Antoine Leca
Hi Rick, On Thursday, March 04, 2004 6:56 PM, Rick Cameron va escriure: Woo-hoo! Finally, a real answer, I am sorry for you, but when one posts to some high-volume mailing list, he should expect a rather bad signal/noise ratio; this is often seen as an opportunity to get some really good

Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Antoine Leca
Hi folks, I discovered, much to my surprise (but on reflection it makes sense, taking into account the dates when it was developed), that Windows 2000 only supports The Unicode Standard, version 2.0 URL:http://support.microsoft.com/default.aspx?scid=kb;EN-US;227483 The question, I

Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Antoine Leca
Hi Michael, Michael (michka) Kaplan va escriure: For sortkey.nls -- that file does not ever change in size, as it is not a file that one adds characters to. Well, I do not believe this is the most adequate place to discuss this, but here is my view about it. The sorting algorithm of NT,

Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Antoine Leca
On Friday, March 05, 2004 6:07 PM, Frank Yung-Fong Tang va escriure: Not sure how to find the information paper. But one way to check the degree of the support is to do a GetStringTypeEx against some characters defined in 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 to see whether the returned results reflect

Re: Version(s) of Unicode supported by various versions of Microsoft Windows

2004-03-05 Thread Antoine Leca
On Friday, March 05, 2004 6:39 PM, Peter Constable va escriure: People *really shouldn't* ask Does product X support Unicode version N? They should be asking questions like Can product X correctly perform function Y on such-and-such characters added in Unicode version N? Fact is, conformance

Re: What's in a wchar_t string on unix?

2004-03-04 Thread Antoine Leca
On Wednesday, March 03, 2004 11:22 PM Peter Kirk va escriure: Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined? or does it only mean wchar_t hold the character in ISO_10646 (which mean it could be 2 bytes, 4 bytes or more than that?) On 03/03/2004 11:27, Antoine Leca

Re: What's in a wchar_t string ...

2004-03-04 Thread Antoine Leca
On Thursday, March 04, 2004 2:21 PM, Arnold F Winkler va escriure: Since ISO/IEC 9899 - Programming Language C was quoted, I wonder if you are aware of the efforts of SC22/WG14 to develop a Technical Report that deals with the problems discussed in this thread. The document is ISO/IEC DTR

Re: Font Technology Standards

2004-03-03 Thread Antoine Leca
C J Fynn va escriure: [ The only thing there has been any real controversy or concern about are three Apple patents relating to grid fitting glyph outlines of TrueType fonts (see: http://www.freetype.org/patents.html ) snip Also AFAIK Apple have never threatened anyone with enforcement of

Re: What's in a wchar_t string on unix?

2004-03-03 Thread Antoine Leca
Frank Yung-Fong Tang va escriure: Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined? or does it only mean wchar_t hold the character in ISO_10646 (which mean it could be 2 bytes, 4 bytes or more than that?) The latter. But if wchar_t is 16 bits, it can only encode Unicode
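
Since the macro and the width of wchar_t are two independent facts, a program can query each of them separately. The following is only a minimal sketch; the helper names wchar_is_iso10646 and wchar_bits are invented here and come from neither the thread nor the C standard.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the implementation defines
   __STDC_ISO_10646__ (wchar_t values are ISO 10646 code positions),
   0 otherwise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: storage width of wchar_t in bits. The macro
   above says nothing about this value; it has to be checked on its own. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}
```

A caller that needs every code position up to 0x10FFFF in a single wchar_t would have to check both results, not just the macro.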

Re: Font Technology Standards

2004-03-03 Thread Antoine Leca
[sorry for the involuntary x-post] Frank Yung-Fong Tang va escriure: For example, we can standardize a set of Arabic glyphs with their encoding. Think about Nastaliq (rather than Naskh). There is simply no way to have it done. Too many possibilities. Idem for Latin (resp. Cyrillic, resp.

Re: What's in a wchar_t string on unix?

2004-03-02 Thread Antoine Leca
Rick Cameron asked: It seems that most flavours of unix define wchar_t to be 4 bytes. As your most suggests, this is not universal. What if it is 8-byte? ;-) If the locale is set to be Unicode, That part is highly suspect. Since you write that, you already know the wchar_t encoding (as well

Re: What's in a wchar_t string on unix?

2004-03-02 Thread Antoine Leca
Hi Frank, Sorry to be in disagreement on a couple of points. On Tuesday, March 02, 2004 5:54 PM, Frank Yung-Fong Tang wrote: Antoine Leca wrote on 3/2/2004, 5:50 AM: Rick Cameron asked: If the locale is set to be Unicode, That part is highly suspect. Since you write

Re: Filenames with non-Ascii characters

2004-02-24 Thread Antoine Leca
Kenneth Whistler wrote: Dipti Srivastava asked: If I set my LC_TYPE to en_US.UTF8 do I need to convert the non-Ascii characters like '\' in the filename for functions like open, etc. '\' *is* an ASCII character. 0x5C in ASCII to be exact. It is also 0x5C in UTF-8, so no (other) conversion
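
The point that an ASCII character keeps the same byte value in UTF-8 can be checked mechanically: a name made only of bytes below 0x80 is already valid UTF-8 and needs no conversion. A minimal sketch follows, with a made-up helper name (is_pure_ascii) and assuming an ASCII-based execution character set.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
   string is below 0x80 (pure ASCII), 0 otherwise. Such a string has
   the same bytes in ASCII and in UTF-8. */
int is_pure_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            return 0;
    }
    return 1;
}
```

Only names for which this returns 0 contain non-ASCII characters, and only for those does the question of the encoding expected by open and similar functions arise.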

Re: Devanagari Letter Short A

2004-02-18 Thread Antoine Leca
Philippe Verdy va escriure: U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an independent vowel. It can be viewed as a conjunct of the independent vowel U+0905 DEVANAGARI LETTER A and the dependent vowel sign U+0946 DEVANAGARI VOWEL SIGN SHORT E (noted for transcribing

Re: Devanagari Letter Short A

2004-02-18 Thread Antoine Leca
Ernest Cline wrote: I've been trying to make sense of the Indian scripts, but am having one small difficulty. I can't seem to find the ISCII 1991 equivalent for U+0904 (DEVANAGARI LETTER SHORT A). I do not believe you'll find it there. U+0904 had been added to Unicode for version 4.0. In

Re: Script of U+0951 .. U+0954

2002-12-05 Thread Antoine LECA
Peter Constable wrote: There is a potential concern in Uniscribe/OpenType: substitution and positioning rules in OT are organised hierarchically by script then by individual writing system / typographic groups (the label used is languages, but the intent is really groups of writing systems that

Script of U+0951 .. U+0954

2002-12-04 Thread Antoine LECA
Hi folks, I recently notice (I was off line for a while) the inclusion of the Scripts.txt file in the Unicode Character Database. I find it very interesting. I noticed it is informative. However, there is a detail that makes me quite unhappy: characters U+0951 .. U+0954 (the various accents

Re: Proposal to add Bengali Khanda Ta

2002-12-03 Thread Antoine LECA
Hi folks, This post is a bit long, so here is a summary: - regarding the encodings of TMA, there are currently several possibilities, so it should be possible to sort all normal cases with current characters. - however, this shows that ISCII provides a character, INV, with no counterpart in

Malayalam Half-U: how

2002-11-08 Thread Antoine LECA
Hi folks, A problem was signaled in the Microsoft VOLT mailing list (this list should be dedicated to typographic, but it appears that it deals more with Indic scripts, because VOLT is the MS tool to use to encode OpenType informations in a font, which in turn is required to display Indic scripts

Re: converting ISO 8859-1 character set text to ASCII (128)charactet set

2001-06-21 Thread Antoine Leca
We have a specific requirement of converting Latin-1 character set ( iso 8859-1 ) text to ASCII character set ( a set of only 128 characters). Is there any special set of utilities available or service providers who can do that type of job. Look for recode (a GNU package). It performs the

Re: UTF-8S score keeping

2001-06-14 Thread Antoine Leca
I guess I should be bounced at unicoRe. I hope the interested people will monitor unicoDe. Tex Texin wrote: I am losing track of the discussion, so I decided to create my own score sheet. I welcome the initiative. However, I have a couple of minor points I feel uncomfortable with. So far

Re: Missing characters for Italian

2001-06-11 Thread Antoine Leca
[iso-8859-1] Hi, Marco Cimarosti va escriure: I am considering to file in a proposal for two new characters, to be used in Italian ordinal numbers abbreviations. Before I do this, I would like to read some opinions. Here they are... BACKGROUND snip /BACKGROUND Well, the same

Re: UTF-8 syntax

2001-06-11 Thread Antoine Leca
Jianping Yang wrote: [UTF-8S] will fix the following problem for example: For a searching engine to search the character U-0001 in UTF-8 string, and it could not find. But when UTF-8 is converted into UTF-16, it can find it there because ED A0 80 and ED B0 80 are converted into
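
The byte arithmetic behind ED A0 80 and ED B0 80 can be worked through in a few lines. The sketch below decodes each three-byte sequence as a 16-bit value and then combines the two values the way UTF-16 combines a surrogate pair. The function names are invented for illustration, and no validation of the input bytes is attempted.

```c
#include <stdint.h>

/* Hypothetical helper: decode one three-byte sequence of the form
   1110xxxx 10xxxxxx 10xxxxxx into a 16-bit value. No validation. */
uint16_t decode3(const unsigned char *p)
{
    return (uint16_t)(((p[0] & 0x0F) << 12) |
                      ((p[1] & 0x3F) << 6) |
                      (p[2] & 0x3F));
}

/* Hypothetical helper: combine a high and a low surrogate into the
   supplementary code point they represent in UTF-16. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                    + ((uint32_t)lo - 0xDC00u);
}
```

With these definitions, ED A0 80 decodes to 0xD800 and ED B0 80 to 0xDC00, and the pair combines to 0x10000. Standard UTF-8 encodes that same code point as the four bytes F0 90 80 80, which is why a byte-level search for one form does not find the other.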

Re: Digits shapes (was RE: RECOMMENDATIONs( Term Asian is not used properly on Computers and NET))

2001-06-11 Thread Antoine Leca
Marco Cimarosti wrote: Eliotte Rusty Harold wrote (on [EMAIL PROTECTED]): Today's European digits like 0, 1, 2, and 3 are actually closer to the original Hindu glyphs from 1000 years ago than to true Arabic numerals. About 0, that is for sure. About 2, I believe the contrary, see below.

Re: UTF8 vs AL32UTF8

2001-06-11 Thread Antoine Leca
Jianping Yang wrote: Supposedly you build your Unicode data base as UTF8. You start using the data for a web application. What happens when you send UTF-8s data to a web browser? It will work most of the time but will give you funny results from time to time. This could create a

Re: UTF-8 Syntax

2001-06-11 Thread Antoine Leca
[EMAIL PROTECTED] wrote: Carl W. Brown [EMAIL PROTECTED] wrote: In the case of strcmp the problem is that this won't even work on UCS-2. It detects the end of string with a single byte 0x00. You have to use a special Unicode compare routine this routine needs to be fixed to produce proper
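
A "special Unicode compare routine" of the kind mentioned above might be sketched as follows for zero-terminated arrays of 16-bit code units. This is an illustration under stated assumptions, not the routine discussed in the thread: the name ucs2_cmp is made up, the terminator is assumed to be a zero code unit, and the ordering is plain code-unit order.

```c
#include <stdint.h>

/* Hypothetical helper: compare two zero-terminated arrays of 16-bit
   code units in binary code-unit order. Returns a negative value,
   zero, or a positive value, like strcmp. */
int ucs2_cmp(const uint16_t *a, const uint16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}
```

Because the loop tests whole code units, a unit such as 0x0100, whose encoding contains a zero byte, does not end the comparison early the way it would with strcmp.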

Re: Missing characters for Italian

2001-06-11 Thread Antoine Leca
Marco Cimarosti écrivit (!): The second point regarding French is that, AFAIK, these abbreviations are also written with normal (non superscript) letters, as you have written them in your mail. That is true. It is as true as the fact that when we French are to write the oe digraph, we

Re: Oriyan Language

2001-06-05 Thread Antoine Leca
Hi, Noriaki Inouye wrote: Oriyan Language Ah! Something new! Hello. I'm interested in Oriya language a little. I found a PDF file written in Oriya as follows: http://www.wbtc.com/articles/bibles/oriya/oriya_nt/Ori40Mt.pdf I can see some kinds of unique ligatures on this file. That is

UTF-32s

2001-05-29 Thread Antoine Leca
Billancourt, le 1er avril 2001, I was thinking about this while reading the thread about UTF-8s. If the binary order of UTF-16 is of so prime interest that the (numerous) users of UTF-8 should slightly modify their code to co-operate with UTF-16-based database engines, by accepting UTF-8s rather

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Antoine Leca
Jianping Yang wrote: As a matter of fact, the surrogate or supplementary character was not defined in the past, How long is the past? I remember reading about these surrogates the first time I put my hands on a draft copy of ISO 10646. It was nearly six years ago. Or do you mean that it was
