James Kass wrote:
Of course I put these three Code2nnn fonts on SourceForge, being sick of
their further development and whole commercial aura around them.
Thanks for your work contributing to Unicode and to the whole community.
Antoine
Christoph Päper wrote:
James Kass:
License already included in SourceForge download, namely GPLv3.
You probably want to use GPL+FE, i.e. GPL with font exception.
http://en.wikipedia.org/wiki/GPL_font_exception
I am not completely sure you want to embed Code2000 with a document you
intend to
On Sunday, December 26th, 2004 5:54 a.m. (!)
Philippe Verdy wrote, among other things:
In the EU legislation, there are tons of references to languages,
but much less about script systems;
However, there is a well known case about them. In 1997, when it was about
the building of the Euro
[ I am not subscribed to hebrew list, so I do not post there; feel free to
relay if it is worth the value. I will not subscribe to this list just to
post it, and since Elaine did not explain on which list she wants the
discussion to take place, I choose the list I am subscribed to. ]
On Thursday,
Arcane Jill wrote:
And yet, in an expression such as tolower(trim(s)), the second
validation is unnecessary. The input to tolower() /must/ be valid,
because it is the output of trim(). But on the other hand, tolower()
could be called with arbitrary input, so I can't skip the validation.
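One possible way around the double validation described above is to record the "already validated" fact in the type of the value, so that only the entry point validates. The following is a minimal sketch in Python, not anything proposed in the thread; the names ValidText, trim and tolower are hypothetical, and UTF-8 decoding stands in for whatever validation is really needed.

```python
class ValidText:
    """A string known to be valid; only the public constructor validates."""

    validations = 0  # how many times validation actually ran (illustration only)

    def __init__(self, raw: bytes):
        # Validation happens here, once: decoding rejects malformed UTF-8.
        ValidText.validations += 1
        self.value = raw.decode("utf-8")

    @classmethod
    def _trusted(cls, value: str) -> "ValidText":
        # Internal shortcut: wrap a string derived from already validated input.
        obj = cls.__new__(cls)
        obj.value = value
        return obj


def trim(s: ValidText) -> ValidText:
    # Input is valid by construction, so no second check is needed.
    return ValidText._trusted(s.value.strip())


def tolower(s: ValidText) -> ValidText:
    return ValidText._trusted(s.value.lower())


result = tolower(trim(ValidText(b"  Hello World  ")))
```

Arbitrary input can still only reach tolower() through the validating constructor, so the check is kept at the boundary and skipped in between.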
Peter C. wrote:
font vendors are creating fonts that use Unicode, platform vendors
(at least Mac and Windows -- Linux is too fractured a scene to
make a general statement)
On Monday, December 6th, 2004 18:40Z Edward H. Trager wrote:
The really big, important applications and code
On Monday, December 6th, 2004 20:52Z John Cowan wrote:
Doug Ewell wrote:
Now suppose you have a UNIX filesystem, containing filenames in a
legacy encoding (possibly even more than one). If one wants to
switch to UTF-8 filenames, what is one supposed to do? Convert all
filenames to
Asmus Freytag wrote:
A simplistic model of the 'cost' for UTF-16 over UTF-32 would consider
snip
3) additional cost of accessing 16-bit registers (per character)
snip
For many processors, item 3 is not an issue.
I do not know, I only know of a few of them; for example, I do not know how
Alpha
I fail to see the connection between your question and Unicode.
Saturday, December 4th, 2004 13:18Z, Rene Hache wrote:
To whom it may concern,
;-)
I am writing because I would like to know if someone can help with certain
Sanskrit/Pali characters in roman scripts.
Certainly there is a LOT of
Arial Unicode MS version 1.01 is most current and shipped with Office
2003. I called it OpenFont. Sorry! I double-clicked on its icon -
with a colored OT - in \WINDOWS\Fonts again it says after version
1.xx (OpenType). I took that to mean Open Source or something
more open than MS's
On Friday, December 03, 2004 13:10, Cristian Secar wrote:
However, the .ttf fonts that ship with their products are showing an
OT icon. I don't know how it's done technically.
Technically, it is done by including a (valid) 'DSIG' (digital signature)
subtable into the font file, that is a
On Wednesday, December 01, 2004 22:40Z Theodore H. Smith wrote:
Assuming you had no legacy code. And no handy libraries either,
except for byte libraries in C (string.h, stdlib.h). Just a C++
compiler, a blank page to draw on, and a requirement to do a lot of
Unicode text processing.
On Thursday, November 25th, 2004 08:05Z Philippe Verdy wrote:
In ASCII, or in all other ISO 646 charsets, code positions are ALL in
the range 0 to 127. Nothing is defined outside of this range, exactly
like Unicode does not define or mandate anything for code points
larger than
On Wednesday, November 24th, 2004 16:26Z Tim Greenwood wrote:
All of the spacing combining marks (general category Mc) except
musical symbols have a canonical combining class of 0.
Why is this?
About the Indic vowel signs, I assume it is this way to avoid them being
reordered (in weird
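The claim quoted above about spacing combining marks can be checked against whatever Unicode data a given system carries. The following is a small sketch using Python's standard unicodedata module; the function name is hypothetical, and the list it produces depends on the Unicode version bundled with the interpreter, so later versions may show exceptions beyond the musical symbols.

```python
import sys
import unicodedata


def spacing_marks_with_nonzero_ccc():
    """Return (code point, name, ccc) for every Mc character whose ccc != 0."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mc":
            ccc = unicodedata.combining(ch)
            if ccc != 0:
                found.append((f"U+{cp:04X}", unicodedata.name(ch, "?"), ccc))
    return found


for entry in spacing_marks_with_nonzero_ccc():
    print(entry)
```

An Indic vowel sign such as U+093E has class 0 and so never appears in this list, which is what keeps it from being reordered under canonical ordering.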
On Wednesday, November 24th, 2004 22:16Z Asmus Freytag wrote:
I'm not seeing a lot in this thread that adds to the store of
knowledge on this issue, but I see a number of statements that are
easily misconstrued or misapplied, including the thoroughly
discredited practice of storing
On Wednesday, November 24th, 2004 04:02Z Harshal Trivedi wrote:
How can I determine end of UCS-2/UCS-4 string while encoding it in C
program?
It depends how you are storing and more importantly managing it.
If you consider it as mere arrays of uint16_t/uint32_t, with your own
functions
Philippe Verdy wrote:
From: Antoine Leca [EMAIL PROTECTED]
For example, ASCII as designed allowed (please note I did not write
was designed to allow) the use of the 8th bit as parity bit when
transmitted as octet on a telecommunication line; I doubt such use is
compatible with UTF-8
On Tuesday, November 8th, 2004 23:13Z E. Keown wrote:
Does either the ISO or the IEC have official
languages?
As far as I know, yes, three.
BTW, about U.N. I believe there are 6 working languages.
Whether official or not, is French the
'second language' of the standards world?
You
Hi Rick,
On Friday, October 1st, 2004 00:17, Rick McGowan wrote:
The Unicode Consortium is pleased to announce that the alpha version
of the Common Locale Data Repository (CLDR) 1.2 is available for
public review.
Can you please clarify what is the intent with regard to the entries
Dear Philippe,
[ I write to the list, since there is no point sending two posts. Internet
is full enough of errant SMTP mails anyway. ]
On Wednesday, September 29, 2004 17:42, Philippe Verdy wrote:
From: Antoine Leca
Just a side point: French cannot be fully addressed with Latin 1
On Tuesday, September 28th, 2004 03:22 Tom wrote:
Let's say. The test engineer ensures the functionality and validates
the input and output on major Latin 1 languages, such as German,
French, Spanish, Italian,
Just a side point: French cannot be fully addressed with Latin 1.
Of course, it is
Jungshik Shin wrote:
Except in some UNIX operating systems and specialized applications
with specific needs,
Note that ISO C 9x specifies that wchar_t be UTF-32/UCS-4 when
__STDC_ISO_10646__ is defined.
This is of course very pedantic (I do not believe there are existing
implementations
On Friday, July 30th, 2004 19:47, Peter Kirk wrote:
There appear to be two errors (not listed in the errata page
http://www.unicode.org/errata/) in Figure 15.2 on page 391 of The
Unicode Standard 4.0, the online version at
http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf.
snip
The
On Monday, August 2nd, 2004 12:51, Peter Kirk wrote:
On 02/08/2004 09:25, Antoine Leca wrote:
And there is still a problem with the text before the figure.
Which text?
As I wrote before,
There also seems to be an error in the text just before the figure
which states In the Arabic
On Monday, July 05, 2004 1:52 PM
Anto'nio Martins-Tuva'lkin wrote:
From Spanish cañón? I'm sure there's an excellent reason to keep
the tilde but trash the acute... ;-)
Yes: acute has a different meaning in French orthography (denotes a closed
vowel, and can occur twice) than it has in
On Thursday, May 20th, 2004 23:56, Philippe Verdy wrote:
I see no real problem if not all the different orthographies are
listed or if they are not used universally. As long as the name is
non ambiguous. What will be important for interchange of data will
not be this name but the Code (or N°,
[Mailed _and_ posted to the list; UTF-8]
On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote:
I would appreciate it if interested persons could look this over and
inform me if they find any further discrepancies between the two
which are worth troubling about. Then we will proceed to
Antoine Leca wrote:
The French name for Hang looks strange. It happened to be hangul
(hangul, hangeul) (after quite a bit of discussion.)
Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8
this morning. I meant hangul(hangul, hangeul).
According to a native ftp
On Friday, May 14, 2004 10:22 PM, Peter Constable wrote:
It is simply inadequate analysis of usage scenarios to say an
order form contains formatted dates / numbers / currency that need to
be interpreted, therefore this document has a locale.
Sorry, you lost me. I do not know what usage
Philippe Verdy wrote on Tuesday, May 18th, 2004 12:24:
Also there are differences in orthographies in the table lists:
the plain text version and Table 2 use consonants with dot
below for the English name, but Table 1 uses basic Latin
consonants (example for Malayalam).
I believe these are
On Tuesday, May 18, 2004 5:34 PM, Doug Ewell wrote:
Staying out of this thread probably won't help it go away, so...
;-)
The change of subject is appropriate, anyway.
This seems fair. Even if there is a Spanish adjective quixótico --
I found only one Google hit for it in Spanish, but
On Thursday, May 13th, 2004 16:40, Peter Constable wrote:
Only that I don't think it's appropriate in general to tag
documents (by which I don't mean an accounting spreadsheet or an
order-entry record) for things like number formatting, and so such
info should not be included in attributes
On Friday, May 14, 2004 3:30 PM, Peter Constable wrote:
To me, documents encompassed any style of writings (and was
broader). For example, I believed that writing was invented 6
millennia ago precisely for accounting and trading, *not* with the
Hammurabi codex or the Egyptian hymns.
On Wednesday, May 12, 2004 8:00 PM, Peter Constable wrote:
It's not particularly useful to communicate that a document was
created when a locale with such-and-such number format was in effect,
Sure?
: Please send to us 100.000 units of your item 12010, available to our
: warehouse by
On Tuesday, May 11, 2004 6:59 PM, Philippe Verdy wrote:
From: Carl W. Brown [EMAIL PROTECTED]
Expats break the locale model anyway. The problem is that we use
country as both a language modifier and a location.
From past comments I read here, it is understood now that locale
On Wednesday, May 05, 2004 5:29 PM, John Jenkins wrote:
I should point out, however, that the probability of
getting the pre-X versions of the Mac OS to support new 8-bit
character sets is exactly 0.
Would the various Indian scripts not yet covered by ILK, count as new
character sets?
[ This is not copied to unicore, since I am allowed there. This is copied
to ietf-language because the question was, but it may perfectly be filtered
out. ]
On Sunday, May 02, 2004 10:57 PM, John Hudson wrote:
In the code lists at
http://www.unicode.org/iso15924/iso15924-codes.html the
On Monday, May 03, 2004 4:36 AM
John Cowan [EMAIL PROTECTED] wrote:
Philippe Verdy wrote:
And there are also ISO 3166-2 codes for administrative regions in
countries (such as FR2B for the department of Haute-Corse in France).
I think those are usually written FR-2B, though I do not
On Thursday, April 29, 2004 2:17 PM, C J Fynn wrote:
In font lookups, where a variant glyph form of a base character is
displayed due to the presence of a VS character, the lookups for
glyph forms of subsequent dependent vowel marks will be dependent
on the variant base glyph (as long
Also, before it was recognized that they are *also* used as decimal digits
(using some adequate substitute for the zero), Tamil digits 1-9 were seen as
part of a non-decimal-positional system. Nevertheless, they were given class
Nd.
By the way, if the Tengwar system is only duodecimal (as I
On Friday, April 23, 2004 7:02 AM
Peter Constable [EMAIL PROTECTED] wrote:
due to the strong perception of OpenI18N.org as
opensource/Linux advocates, even though CLDR project is not
specifically bound to Linux.
It is hard to look at OpenI18N.org's spec and not get the impression
that
On Friday, April 23, 2004 2:08 AM, Philippe Verdy wrote:
From: Antoine Leca
On Thursday, April 22, 2004 7:14 PM
Peter Kirk wrote:
The virus writers have presumably confused
.tc and .tk
.TR for Turkey. .TK (Tokelau) is not more sensible
Or is that [tk] for Turkmen
On Friday, April 23, 2004 3:05 PM, Marco Cimarosti wrote:
Antoine Leca wrote:
The virus cannot have any knowledge of a language code. And
much less of the language used by its next victim...
^
Oops: I forgot to repeat code here. Looks like it confused people
On Thursday, April 22, 2004 7:14 PM
Peter Kirk [EMAIL PROTECTED] wrote:
The virus writers have presumably confused
.tc and .tk
.TR for Turkey. .TK (Tokelau) is not more sensible
Antoine
On Saturday, April 17, 2004 10:28 PM TU+1, António Martins-Tuválkin wrote:
As I wrote earlier, if you know the text under inspection is
Catalan, a very simple regular expression will deal with that. Any
half-decent Catalan word processor does it already, by the way.
What about the odd Catalan
On Thursday, April 15, 2004 8:16 PM, Philippe Verdy wrote:
I thought it was already answered in this list by a Catalan speaking
contributor: the sequence L+middle-dot in Catalan is NOT a combining
sequence.
No? Then what is it? It looks very much like one, to me.
The middle dot in Catalan
On Friday, April 16, 2004 12:31 AM, Peter Kirk wrote:
Peter Kirk wrote:
What is U+2027 intended for? The name suggests that it might be what
is needed for Catalan.
Hyphenation point is primarily used to visibly indicate
syllabification of words. Syllable breaks are potential line
On Friday, April 16, 2004 3:26 PM, Ernest Cline wrote:
I don't see that as being any worse than the set of HYPHEN_MINUS,
HYPHEN, MINUS SIGN, etc.
Sorry, I did not make myself clear. I am not intending to say this is undoable,
nor that · case is particularly complex. It is doable (as I showed
On Friday, April 16, 2004 12:37 PM, Philippe Verdy wrote:
In some future, we could see U+013F and U+0140 used more often than L
or l plus U+00B7...
I (personally) hope we would not.
Notably in word processors that can detect these
sequences in Catalan text and substitute them with the
Arcane Jill wrote:
There were sixteen block-graphics characters, remember?
They each were subdivided into four quadrants, each of
which could be either black or white, according to the
low order four bits of the codepoint. The all-white
block-graphics character was visually indistinguishable
On Thursday, April 01, 2004 12:37 AM
Asmus Freytag [EMAIL PROTECTED] wrote:
Have you folks noticed the addition of Narrow Non Break Space?
No, I did not. In fact, when I saw your message, I believed it should be a
character whose code would be 0401 or something like that. ;-) I know it is
On Tuesday, March 30, 2004 11:42 PM, Ernest Cline wrote:
The main usage is with compound words such as ice cream or
Louis XIV or commercial phrases such as Camry SE where for
esthetic reasons an author would prefer that the space not expand
upon justification,
Well, as one that takes
On Monday, March 29, 2004 8:11 PM
John Cowan wrote:
Well, it depends on what the equivoque combining marks in the title
of Section 7.7 means.
Ah! This is the place where I did not look! (It was not obvious to me
that text about the dependent vowel marks has to be looked for in the
On Sunday, March 28, 2004 12:03 AM, James Kass wrote:
So, if the question is how to make an OpenType font *not* display the
dotted circle on Windows with Uniscribe, one idea would be to add a
spacing glyph to U+25CC (DOTTED CIRCLE) in the font.
If you do so, you will end up defeating the
On Monday, March 29, 2004 2:14 PM, John Cowan wrote:
The bottom line is that SP+vowel and NBSP+vowel are prescribed by the
Unicode Standard,
I am sorry John, I must have missed a post of yours. I asked you where it is
written, and did not find any answer to this; unless someone considers
Avarangal asked about
the requirements by educational establishments is the ability
to print and display dependent vowels without dotted circles.
John Cowan answered:
Avarangal wrote:
Can any one provide information on the sequences used for displaying
and printing dependent vowels as
Sorry to answer my own post.
Avarangal asked about
the requirements by educational establishments is the ability
to print and display dependent vowels without dotted circles.
John Cowan answered:
Avarangal wrote:
Can any one provide information on the sequences used for displaying
and
Avarangal wrote:
display dependent vowels without dotted circles.
Can any one provide information on the sequences used for
displaying and printing dependent vowels as standalones.
Microsoft's Uniscribe allows you to display a dependent vowel with the
following sequence (to be followed
On Friday, March 26, 2004 7:12 PM, Philippe Verdy wrote:
Indic scripts are a bit unique by the fact that they have a syllabic
structure decomposed into separate letters with a base consonant and
a combining (this is not the proper term for Unicode) vowel
modifier after it. This differs
Philippe Verdy wrote:
Space is a base character, then it combines with the next diacritic
with which it creates a default grapheme cluster which should be
interpreted as if it was a single character identity.
Agreed so far for diacritics. Agreed also for non-spacing dependent vowels
Philippe Verdy [EMAIL PROTECTED] wrote:
In my Windows XP, I have four keyboard layouts proposed for the Urdu
language: Arabic (101), Arabic (102), Arabic (102) AZERTY and
Urdu, plus the keyboards for the Brahmic/ISCII transliterations in
India,
What kind of keyboards are those?
XP
Hi Peter,
On Thursday, March 25, 2004 2:19 PM
Peter Kirk [EMAIL PROTECTED] wrote:
On 25/03/2004 03:33, Antoine Leca wrote:
As Peter correctly noted from day 1, all this stuff is not very
important, since Urdu users really expect nastaleeq style, so either
they are not using Urdu
Peter Constable wrote:
Urdu can be written using naskh-style Arabic (supported on WinXP,
Win2K...),
Peter,
I do not see the connection between the OS support in Windows for a given
language and the translation of a website, but while we are at this one: how
do you enter Urdu with
On Wednesday, March 24, 2004 5:03 PM
Peter Constable wrote:
how
do you enter Urdu with Microsoft Windows 2000? I have a Spanish one
with SP4, IE6 SP1, Arabic script enabled. Surely something is
missing, but where can I find it? Should I use KLC?
My understanding is that Spanish
Hi John,
John Snow wrote:
I am speaking to a client regarding their website being translated
into a number of languages including Bengali, Urdu and Punjabi which I
am told is not very well supported by Unicode.
This is not true. These languages are supported by Unicode, since the
Philippe Verdy [EMAIL PROTECTED] wrote:
From: Edward H. Trager [EMAIL PROTECTED]
Also, I would not bother testing Windows OSes prior to Windows
2000/XP.
Why not?
Since it does not even work on these, there is no point testing it on
development-dead platforms either.
Antoine
Philippe Verdy [EMAIL PROTECTED] wrote:
The musical sharp sign, of course, is U+266F, making the correct
spelling C♯.
From TUS: These symbols are typically used for text decorations, but they
may also be treated as normal text characters in applications such as
typesetting chess books,
John Cowan wrote:
Pavel Adamek wrote:
From the viewpoint of sorting,
the coding HCOMBINING C BEFORE
would be much better than
CCOMBINING H AFTER.
For Czech, yes. For Spanish we want the latter.
What for?
Antoine
On Tuesday, March 16, 2004 5:48 PM
Peter Kirk [EMAIL PROTECTED] wrote:
On 16/03/2004 07:35, Carl W. Brown wrote:
I suspect that just changing the font to eliminate the dot will be
easier. Software won't have to be changed, existing code pages will
not have to be changed, searches will
Peter Kirk wrote:
2. A graduate student mentioned that it was her impression that most
Cyrillic webpages (at least for Russian--her interest) are still not
encoded in Unicode. (She is doing some research on the use of
certain words in Russian and wanted to know how best to do the
Hi Rick,
On Thursday, March 04, 2004 6:56 PM, Rick Cameron wrote:
Woo-hoo! Finally, a real answer,
I am sorry for you, but when one posts to some high-volume mailing list, he
should expect a rather bad signal/noise ratio; this is often seen as an
opportunity to get some really good
Hi folks,
I discovered, much to my surprise (though on reflection it makes good
sense, taking into account when it was developed), that Windows
2000 only supports The Unicode Standard, version 2.0
URL:http://support.microsoft.com/default.aspx?scid=kb;EN-US;227483
The question, I
Hi Michael,
Michael (michka) Kaplan wrote:
For sortkey.nls -- that file does not ever change in size, as it is
not a file that one adds characters to.
Well, I do not believe this is the most adequate place to discuss this, but
here is my view about it.
The sorting algorithm of NT,
On Friday, March 05, 2004 6:07 PM, Frank Yung-Fong Tang wrote:
Not sure how to find the information paper. But one way to check the
degree of the support is to do a GetStringTypeEx against some
characters defined in 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 to see does those
return result reflect
On Friday, March 05, 2004 6:39 PM, Peter Constable wrote:
People *really shouldn't* ask Does product X support Unicode version
N? They should be asking questions like Can product X correctly
perform function Y on such-and-such characters added in Unicode
version N?
Fact is, conformance
On Wednesday, March 03, 2004 11:22 PM Peter Kirk wrote:
Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is
defined? or does it only mean wchar_t hold the character in
ISO_10646 (which mean it could be 2 bytes, 4 bytes or more than
that?)
On 03/03/2004 11:27, Antoine Leca
On Thursday, March 04, 2004 2:21 PM, Arnold F Winkler wrote:
Since ISO/IEC 9899 - Programming Language C was quoted, I wonder if
you are aware of the efforts of SC22/WG14 to develop a Technical
Report that deals with the problems discussed in this thread.
The document is ISO/IEC DTR
C J Fynn wrote:
[ The only thing there has been any real controversy or concern about
are three Apple patents relating to grid fitting glyph outlines of
TrueType fonts (see: http://www.freetype.org/patents.html )
snip
Also AFAIK Apple have never threatened anyone with
enforcement of
Frank Yung-Fong Tang wrote:
Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined?
or does it only mean wchar_t hold the character in ISO_10646
(which mean it could be 2 bytes, 4 bytes or more than that?)
The latter. But if wchar_t is 16 bits, it can only encode Unicode
[sorry for the involuntary x-post]
Frank Yung-Fong Tang wrote:
For example, we can standardize a set of Arabic glyphs with their
encoding.
Think about Nastaliq (rather than Naskh). There is simply no way to have it
done. Too many possibilities.
Idem for Latin (resp. Cyrillic, resp.
Rick Cameron asked:
It seems that most flavours of unix define wchar_t to be 4 bytes.
As your "most" suggests, this is not universal. What if it is 8-byte? ;-)
If the locale is set to be Unicode,
That part is highly suspect.
Since you write that, you already know the wchar_t encoding (as well
Hi Frank,
Sorry to be in disagreement on a couple of points.
On Tuesday, March 02, 2004 5:54 PM, Frank Yung-Fong Tang wrote:
Antoine Leca wrote on 3/2/2004, 5:50 AM:
Rick Cameron asked:
If the locale is set to be Unicode,
That part is highly suspect.
Since you write
Kenneth Whistler wrote:
Dipti Srivastava asked:
If I set my LC_TYPE to en_US.UTF8 do I need to convert the non-Ascii
characters like '\' in the filename for functions like open, etc.
'\' *is* an ASCII character. 0x5C in ASCII to be exact. It is
also 0x5C in UTF-8, so no (other) conversion
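The point that the backslash is the same single byte in ASCII and in UTF-8 can be checked directly. The following is a small sketch in Python (the helper name is hypothetical); it also shows that the same holds for every one of the 128 ASCII characters, which is why such characters in a filename need no conversion.

```python
def same_bytes_in_ascii_and_utf8(ch: str) -> bool:
    """True if the character encodes to identical bytes in ASCII and UTF-8."""
    return ch.encode("ascii") == ch.encode("utf-8")


backslash = "\\"
print(hex(ord(backslash)), backslash.encode("utf-8"))

# The same holds for all 128 ASCII characters.
all_ascii_unchanged = all(
    same_bytes_in_ascii_and_utf8(chr(i)) for i in range(128)
)
```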
Philippe Verdy wrote:
U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an
independent vowel. It can be viewed as a conjunct of the
independent vowel U+0905 DEVANAGARI LETTER A and the dependent
vowel sign U+0946 DEVANAGARI VOWEL SIGN SHORT E (noted for
transcribing
Ernest Cline wrote:
I've been trying to make sense of the Indian scripts, but am
having one small difficulty. I can't seem to find the ISCII 1991
equivalent for U+0904 (DEVANAGARI LETTER SHORT A).
I do not believe you'll find it there.
U+0904 had been added to Unicode for version 4.0. In
Peter Constable wrote:
There is a potential concern in Uniscribe/OpenType: substitution and
positioning rules in OT are organised hierarchically by script then by
individual writing system / typographic groups (the label used is
languages, but the intent is really groups of writing systems that
Hi folks,
I recently noticed (I was off line for a while) the inclusion of the
Scripts.txt file in the Unicode Character Database. I find it very
interesting.
I noticed it is informative. However, there is a detail that makes
me quite unhappy: characters U+0951 .. U+0954 (the various accents
Hi folks,
This post is a bit long, so here is a summary:
- regarding the encodings of TMA, there are currently several possibilities,
so it should be possible to sort all normal cases with current characters.
- however, this shows that ISCII provides a character, INV, with no
counterpart in
Hi folks,
A problem was reported in the Microsoft VOLT mailing list (this list
should be dedicated to typography, but it appears that it deals
more with Indic scripts, because VOLT is the MS tool to use to encode
OpenType information in a font, which in turn is required to display
Indic scripts
We have a specific requirement of converting Latin-1 character set (ISO
8859-1) text to ASCII character set (a set of only 128 characters). Is
there any special set of utilities available or service providers who can do
that type of job?
Have a look at recode (a GNU package). It performs the
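For readers who want to see what such a conversion involves, here is a minimal sketch in Python. It is not how recode works and makes no claim about recode's output; it swaps in a different, simple technique (Unicode compatibility decomposition, dropping the accents, and replacing whatever is left over). The function name and the "?" placeholder are assumptions made for illustration.

```python
import unicodedata


def latin1_to_ascii(data: bytes, placeholder: str = "?") -> bytes:
    """Approximate ISO 8859-1 text with ASCII by stripping accents."""
    text = data.decode("iso-8859-1")
    # Split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        # Anything still outside ASCII has no simple fallback here.
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out).encode("ascii")


print(latin1_to_ascii(b"caf\xe9"), latin1_to_ascii(b"Stra\xdfe"))
```

Letters with no decomposition, such as the German sharp s, end up as the placeholder, which shows that any such conversion is lossy and needs a policy for the leftovers.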
I guess I should be bounced at unicoRe. I hope the interested people
will monitor unicoDe.
Tex Texin wrote:
I am losing track of the discussion, so I decided to create my
own score sheet.
I welcome the initiative. However, I have a couple of minor points
I feel uncomfortable with.
So far
[iso-8859-1]
Hi,
Marco Cimarosti wrote:
I am considering filing a proposal for two new characters, to be used in
Italian ordinal numbers abbreviations.
Before I do this, I would like to read some opinions.
Here they are...
BACKGROUND
snip
/BACKGROUND
Well, the same
Jianping Yang wrote:
[UTF-8S] will fix the following problem for example:
For a search engine to search for the character U-0001 in a UTF-8 string, and it
could not find it. But when UTF-8 is converted into UTF-16, it can find it there
because ED A0 80 and ED B0 80 are converted into
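The byte sequences ED A0 80 and ED B0 80 quoted above are what one gets when the two UTF-16 surrogates D800 and DC00 are each encoded as if they were ordinary characters. The following sketch in Python makes the mismatch visible; it takes U+10000 as an assumed example (its UTF-16 form is exactly that surrogate pair), and the function name is hypothetical.

```python
def encode_per_utf16_unit(text: str) -> bytes:
    """Encode each UTF-16 code unit as its own UTF-8-style sequence."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets lone surrogates through as 3-byte sequences.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U00010000"
standard = ch.encode("utf-8")          # one 4-byte sequence
per_unit = encode_per_utf16_unit(ch)   # two 3-byte sequences
print(standard.hex(" "), "|", per_unit.hex(" "))
```

A byte-wise search for the standard 4-byte form will never match data stored in the 6-byte form, which is the search failure the quoted message describes.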
Marco Cimarosti wrote:
Eliotte Rusty Harold wrote (on [EMAIL PROTECTED]):
Today's European digits like 0, 1, 2, and 3 are actually closer to
the original Hindu glyphs from 1000 years ago than to true Arabic
numerals.
About 0, that is for sure. About 2, I believe the contrary, see below.
Jianping Yang wrote:
Supposedly you build your Unicode data base as UTF8. You start using the
data for a web application. What happens when you send UTF-8s data to a web
browser? It will work most of the time but will give you funny results from
time to time. This could create a
[EMAIL PROTECTED] wrote:
Carl W. Brown [EMAIL PROTECTED] wrote:
In the case of strcmp the problem is that this won't even work on UCS-2.
It detects the end of string with a single byte 0x00. You have to use a
special Unicode compare routine this routine needs to be fixed to produce
proper
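The strcmp problem mentioned above comes from UCS-2 text being full of zero bytes. The following is a small sketch in Python that mimics the two ways of finding the end of a string; both function names are hypothetical, and the buffer is assumed to hold 16-bit units followed by a 16-bit zero terminator.

```python
def c_style_length(buf: bytes) -> int:
    """Bytes before the first 0x00 byte, as a byte-oriented C routine sees it."""
    pos = buf.find(b"\x00")
    return len(buf) if pos < 0 else pos


def ucs2_length(buf: bytes) -> int:
    """16-bit code units before the first 0x0000 unit."""
    count = 0
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            break
        count += 1
    return count


ucs2 = "AB".encode("utf-16-le") + b"\x00\x00"
print(c_style_length(ucs2), ucs2_length(ucs2))
```

The byte-oriented scan stops inside the first character, so any comparison built on it is wrong before ordering is even considered; a routine working on whole code units is needed.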
Marco Cimarosti wrote (!):
The second point regarding French is that, AFAIK, these abbreviations are
also written with normal (non superscript) letters, as you have written them
in your mail.
That is true. It is as true as the fact that when we French are to write
the oe digraph, we
Hi,
Noriaki Inouye wrote:
Oriyan Language
Ah! Something new!
Hello. I'm interested in Oriya language a little.
I found a PDF file written in Oriya as follows:
http://www.wbtc.com/articles/bibles/oriya/oriya_nt/Ori40Mt.pdf
I can see some kinds of unique ligatures on this file.
That is
Billancourt, le 1er avril 2001,
I was thinking about this while reading the thread about UTF-8s.
If the binary order of UTF-16 is of so prime interest that the
(numerous) users of UTF-8 should slightly modify their code
to co-operate with UTF-16-based database engines, by
accepting UTF-8s rather
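The "binary order" at stake in this thread can be shown with two characters. The following sketch in Python sorts the same pair by code point, by UTF-8 bytes and by UTF-16 code units; the two characters are chosen for illustration only (one BMP character above the surrogate range, one supplementary character).

```python
a = "\uFF5E"      # a BMP character above the surrogate range
b = "\U00010000"  # the first supplementary character

by_code_point = sorted([b, a])
by_utf8 = sorted([b, a], key=lambda s: s.encode("utf-8"))
by_utf16 = sorted([b, a], key=lambda s: s.encode("utf-16-be"))

print([hex(ord(c)) for c in by_utf8], [hex(ord(c)) for c in by_utf16])
```

UTF-8 byte order agrees with code point order, while UTF-16 puts the supplementary character first because its leading surrogate D800 is numerically below FF5E.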
Jianping Yang wrote:
As a matter of fact, the surrogate or supplementary character was not defined
in the past,
How long is the past? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.
Or do you mean that it was