My opinion is that it can be viewed, depending on its application, as a letter (for
some transliteration purpose), or as a diacritic (for some other transliterations).
But in reality it is mostly a letter modifier. For UCA, it sorts mostly like the base
letter that it modifies, and UCA gives
From: Azzedine Ait Khelifa [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, June 05, 2003 4:55 PM
Subject: Tamazight/berber language : How to send mail, write word documents
Hello all,
I need help with the Tamazight (Berber) language.
All letters of the Tamazight alphabet are in
Wow, thanks for this cool tool! Now I can edit international text for most European
languages directly from my extended custom French keyboard (including the missing OE
and AE ligatures that I have long wanted on the French keyboard)...
(I changed the degree sign to a ring diacritic dead key,
From: Mark Davis [EMAIL PROTECTED]
This is not an oversight. As I said, many characters are not
Alphabetic and are still part of words. Examples include that
character and many others. As a simple case, can't is a word in
English, although the apostrophe is not alphabetic. There are many,
From: [EMAIL PROTECTED]
Don Osborn wrote on 06/05/2003 07:34:29 PM:
There are probably some existing standard for keyboard mappings,
promoted by
UNESCO and published in an ISO standard.
If there were such a thing (for Tamazight or any other African
language) I'd be
very interested
on, or users
are already trained to use and switch from the French keyboard and the
Arabic keyboard. Adding Tifinagh to the French keyboard is then quite simple and
natural.
-- Philippe.
- Original Message -
From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "'Philippe Verdy'" [
From: [EMAIL PROTECTED]
I would NOT recommend using a math symbol for this. Especially
considering
the above. The CAPITAL O WITH STROKE (Ø) is probably better.
It is not better. If anything might be better, it would be a digit zero
from a font that has a slash through it. In the past,
From: Mark Davis [EMAIL PROTECTED]
From: Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED]
On 2003.05.25, 00:00, Philippe Verdy [EMAIL PROTECTED] wrote:
even if the Dutch language considers it as a single letter, in a
way similar to the Spanish ch
I see one major difference: When you apply
From: Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED]
On 2003.05.25, 00:00, Philippe Verdy [EMAIL PROTECTED] wrote:
even if the Dutch language considers it as a single letter, in a
way similar to the Spanish ch
I see one major difference: When you apply extra wide inter-char
distance, you
From: [EMAIL PROTECTED]
Philippe Verdy wrote on 05/27/2003 11:50:39 AM:
Don't speak about overwriting sequences using Backspace in Unicode!
I wasn't; I was talking about typewriters, though the comparable thing was
done in the era of Wordstar and daisy wheel / dot matrix printers.
I
From: Markus Scherer [EMAIL PROTECTED]
Paul Hastings wrote:
would it be correct to say that javascript natively supports unicode?
ECMAScript, of which JavaScript and JScript are implementations, is defined on
16-bit Unicode
scripts and using 16-bit Unicode strings.
In other words, the
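As a rough illustration of what "16-bit Unicode strings" implies, the sketch below counts UTF-16 code units. It is written in Python rather than JavaScript, and the helper name `utf16_code_units` is hypothetical, introduced only for this example. A string model built on 16-bit units would report these counts as its length.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units the text occupies in UTF-16."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" does not prepend a byte-order mark.
    return len(text.encode("utf-16-le")) // 2


# A character in the Basic Multilingual Plane takes one code unit.
print(utf16_code_units("A"))
# A supplementary character (here U+10400) takes a surrogate pair, i.e. two code units.
print(utf16_code_units("\U00010400"))
```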
A logo with a yellow or light blue or pale green background would be more appealing on
various bright backgrounds. I also think that the grey logo is too dark and difficult
to read, and the pink logo is quite strange.
The red of the checkmark should contrast more by using a saturated color, and
From: Pim Blokland [EMAIL PROTECTED]
To: Unicode mailing list [EMAIL PROTECTED]
Sent: Wednesday, May 28, 2003 11:45 AM
Subject: Re: Dutch IJ, again
Philippe Verdy schreef:
i+j is a single combined Dutch ij character only if its not
followed by a vowel
This is not true; where did you get
From: Marco Cimarosti [EMAIL PROTECTED]
Yes, you are right. I never heard the word savvy before this morning.
Savvy is better understood in this context as aware, than archaic or informal in
your English-Italian dictionary. It means the author of the website that uses this
logo has considered
From: Theodore H. Smith [EMAIL PROTECTED]
Why not put up a call for Unicode logos? Instead of asking for an
inhouse one to be made, I'm sure you'd get more logos offered than you
could know what to do with. At the worst, you could have a design to
learn from.
Some of my logos were made
I don't know if an attachment here will work, but these are two other alternate logos
which look more appealing with a tiny 3D button effect, the Unicode red and white
UNi logo (and visible trademark symbol), and the word Savvy in Blue (and a green
check mark), or a variant using the term
From: Karl Pentzlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 28, 2003 9:59 PM
Subject: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?
When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?
Is there a difference of appearance in high quality
From: [EMAIL PROTECTED]
On 28/05/2003 13:56:47 Philippe Verdy wrote:
My question is more related to the requirements to display such a logo.
After
all, one could use this logo on a web site that uses a standardized
encoding
like ISO-8859-1
Why would you think that when the logo page
Most probably, Sun upgraded its tables from ICU, and ICU had this bug, which did not
exist in their prior tables for MS-CP932. So the source of the data may now be
different, or there may be an alias problem in the MS-CP932 encoding name.
Submit this bug to Sun, (and probably also to IBM's ICU),
From: Tom Gewecke [EMAIL PROTECTED]
I wonder about this. The Unicode FAQ makes the point that some browsers
will not display NCR's unless the charset is UTF-8. It does seem logical
that, NCR's or not, a page with the logo should be in one of the three
standard Unicode Encoding Forms, UTF-8,
From: Kazuhiro Kazama [EMAIL PROTECTED]
From: Jane Liu [EMAIL PROTECTED]
Subject: Shift-JIS/Unicode mapping in JAVA
Date: Wed, 28 May 2003 12:36:39 -0700 (PDT)
Message-ID: [EMAIL PROTECTED]
I am running a JAVA program on Japanese Windows 2000 system, looking
at the Unicode conversion of
From: Michael Everson [EMAIL PROTECTED]
Both logos are around 800 bytes, and 16 colors (using the web
palette), with a bit antialiasing.
Garish galore, I would say. Purplize the red somewhat?
Per your request, these button logos use a darker red (same dimensions as before).
(The source
From: Marco Cimarosti [EMAIL PROTECTED]
As this comes from a Unicode official, I guess we should simply accept
it... Nevertheless, I wonder whether displaying the Unicode *logo* per se
has the same legal implication as displaying a *banner* which contains the
Unicode logo.
I note that the
From: Roozbeh Pournader [EMAIL PROTECTED]
On Mon, 28 Apr 2003, Mark Davis wrote:
BTW, the ICU demos have been all upgraded to Unicode 4.0, on
http://oss.software.ibm.com/icu/demo/.
They include:
[...]
IDNA Demo
This simple demo performs IDNA transformations as described in RFC
From: Carl W. Brown [EMAIL PROTECTED]
It looks to me like UNCODE. Has the UN taken a role in globalization? Maybe
the web page has no scripting but is still savvy.
Wrong! You strip the very visible dot from the i letter, you also refuse to see that
there's a ligature between the U and N.
From: Ben Dougall [EMAIL PROTECTED]
On Wednesday, May 28, 2003, at 06:59 pm, Otto Stolz wrote:
PS. In these two languages, the quote-marks are paired thusly:
en_US: U+201C ... U+201D, and U+2018 ... U+2019
de_DE: U+201E ... U+201C, and U+201A ... U+2018
are they the right way
From: [EMAIL PROTECTED]
there are still (even more) browsers that do not display UTF-8
correctly...
who still use very often a browser that supports some form their
national encoding (SJIS, GB2312, Big5, KSC5601), sometimes with
ISO2022-* but shamefully do not decode UTF-8 properly (even
Don't use Windows-31J, it is an encoding name alias that is not used by Microsoft for
its 932 codepage! So it would cause problems with other compliant JVMs.
Better use CP932 which seems to be the canonical name used by Sun in its reference
implementation, or windows-932 documented in the
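Why the exact code page name matters can be sketched outside Java. The example below uses Python's built-in `shift_jis` and `cp932` codecs, which are an assumption standing in for the JVM converters discussed here. It decodes the same two bytes with both and prints the resulting code points. It shows one well-known divergence between the two mapping tables, not necessarily the bug reported in this thread.

```python
# The double-byte position 0x8160 is the JIS X 0208 "wave dash".
SJIS_BYTES = b"\x81\x60"

# Decode the same bytes with the JIS-based table and with Microsoft's code page 932.
as_shift_jis = SJIS_BYTES.decode("shift_jis")
as_cp932 = SJIS_BYTES.decode("cp932")

print(f"shift_jis -> U+{ord(as_shift_jis):04X}")
print(f"cp932     -> U+{ord(as_cp932):04X}")
print("same character:", as_shift_jis == as_cp932)
```

In CPython the first codec yields U+301C WAVE DASH and the second U+FF5E FULLWIDTH TILDE, so text converted with one table and written back with the other does not round-trip.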
From: Theodore H. Smith [EMAIL PROTECTED]
I'm not sure what other people experience, but I see a note saying the
attachment was (quite correctly I think) removed from the email, and
instead just lists the name and format of the attachment.
I'm on the digest format.
You may see the GIF
Edward H Trager wrote:
John Hudson wrote:
John Cowan wrote:
Netscape 4.x is dead.
I wish it were. Monitoring the web traffic at one of the sites I'm involved
with, I am dismayed to see that more than 5% of visitors are using Netscape
4.7.
Lots of organizations may have reasons like
From: Ben Dougall [EMAIL PROTECTED]
On Thursday, May 29, 2003, at 02:10 pm, Philippe Verdy wrote:
Interestingly, the French first-level quotation marks use what we call
chevrons (double angle brackets).
However there are some typographical considerations that common fonts
forget
From: Patrick Andries [EMAIL PROTECTED]
From: Philippe Verdy ([EMAIL PROTECTED])
Microsoft displays these French translations for character names. There are
however some strange translations that lack a common formal format that
allows easier searching for related characters.
I would
- Original Message -
From: William Overington [EMAIL PROTECTED]
To: Magda Danish (Unicode) [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, May 30, 2003 10:20 AM
Subject: Re: Announcement: New Unicode Savvy Logo
Now that Mark Davis has made a statement in the
From: [EMAIL PROTECTED]
Patrick Andries on 05/29/2003 06:15:10 PM:
Could letters like « l molle »
(http://pages.infinit.net/hapax/abcmeigret.jpg
) or long-tailed A (between O and P in Baïf's alphabet http://pages.
infinit.net/hapax/abcbaif.jpg), letters which I believe cannot be
From: John Cowan [EMAIL PROTECTED]
Ben Dougall scripsit:
why is it not categorised as white space then? or is it? doesn't look
like it is to me, but i'm not sure how to actually find out for sure.
Well, um, it's not white: there is a dot in it.
Not really, in many applications it will
From: Carl W. Brown [EMAIL PROTECTED]
Private Use Areas are by definition not interoperable and clearly
not designed to be used on the web.
Their use in a page to display text clearly does not qualify, as
it requires proprietary fonts to display them.
People use special fonts all the
From: Jim Allan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 30, 2003 8:05 PM
Subject: Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?
John Cowan posted:
Not really, in many applications it will translate in one or more dots
just to create a dotted line
From: Kenneth Whistler [EMAIL PROTECTED]
That last fact should be taken as a hint that for most
purposes, manual leaders should just be sequences of FULL STOP
characters (as you will see, for instance in the plain text
representations of Internet Drafts or RFCs, for example).
But in any rich
From: Kenneth Whistler [EMAIL PROTECTED]
Philippe Verdy continued:
What surprises me the most in the Unicode spec is that it
both says that its purpose is to create arbitrary length
of leaders
As in plain text, as can be seen in Table of Content listings
in many RFCs, for example
Zip files should have no problems to contain files with UTF-8 names.
In fact the encoding allows it, and the only reason why you can't do it is the
limitation of the ZIP tool you use which blindly uses only the encoding of the
filesystem from which the file is created.
Use the jar zip tool
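A minimal sketch of storing a UTF-8 file name in a ZIP archive, using Python's standard `zipfile` module rather than the jar tool mentioned above. The file name and contents are made up for illustration. Later revisions of the ZIP specification added a general-purpose flag (bit 11, value 0x800) that marks a name as UTF-8, and the sketch checks that the flag is set on the stored entry.

```python
import io
import zipfile

NAME = "r\u00e9sum\u00e9.txt"  # hypothetical non-ASCII file name
UTF8_FLAG = 0x800              # general-purpose bit 11: name is UTF-8

# Write one entry with a non-ASCII name into an in-memory archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(NAME, b"hello")

# Read the archive back and inspect the stored name and flag bits.
with zipfile.ZipFile(buffer) as archive:
    info = archive.infolist()[0]

print("name round-trips:", info.filename == NAME)
print("UTF-8 flag set:", bool(info.flag_bits & UTF8_FLAG))
```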
From: Marion Gunn [EMAIL PROTECTED]
Ar 17:51 +0200 2003/05/29, Philippe Verdy entre sur son clavier:
I would prefer to say that Netscape 4.0 is dead, but Netscape 4.7x is not (I
D'accord. (With the above I'd have to agree.)
see no reason why users should continue to use versions before 4.7
From: Raymond Mercier [EMAIL PROTECTED]
Well, you would expect that, since Win9* and WinNT/2000/XP differ
fundamentally regarding unicode compliance.
Probably true for the filesystem level, but certainly not for the file index stored in
a ZIP file where there's no reason why it should not
From: Marion Gunn [EMAIL PROTECTED]
What, then, is the code for the English of 'Northern Ireland'?
(GB+NI=UK.)
Since Ulster, as IANA [EMAIL PROTECTED] knows, is divided by an
international border, is the logical reply 'encode Ulster English
separately for each side of the border'? Is Basque
From: Raymond Mercier [EMAIL PROTECTED]
At 00:11 01/06/2003 +0200, you wrote:
but certainly not for the file index stored in a ZIP file where there's no
reason why it should not contain correctly encoded and portable UTF-8 names
Doesn't one have to know the binary format of a Zip file to be
Note the following ambiguity in the ZIP file format specification:
[QUOTE]
file name: (Variable)
The name of the file, with optional relative path. The path stored should not contain
a drive or device letter, or a leading slash. All slashes should be forward slashes
'/' as opposed to backwards
-specific and context dependent, as it obeys a convention, not a strict
definition.
-- Philippe.
- Original Message -
From: Kent Karlsson [EMAIL PROTECTED]
To: 'Philippe Verdy' [EMAIL PROTECTED]
Sent: Tuesday, June 03, 2003 4:16 PM
Subject: RE: Rare extinct latin letters
(offline
From: Marion Gunn [EMAIL PROTECTED]
I ask the patience of the Unicode and IETF-L moderators for now posting
on their lists this request for contact details for the ISO 3166 mailing
lists (if any).
Context: Ireland advisability of reserving 'EI' tag for cited usage
(baggage-handling at
From: Kent Karlsson [EMAIL PROTECTED]
Sorry, maybe I was choosing the wrong diacritic (I was
confused by its name, and I should have verified in the charts).
Isn't U+031B COMBINING HORN (combining class 216) what I
wanted to use?
Let me cut my reply short: no.
...
script which
From: John Hudson [EMAIL PROTECTED]
At 06:39 AM 6/3/2003, [EMAIL PROTECTED] wrote:
Philippe Verdy [EMAIL PROTECTED] wrote on 06/03/2003 07:25:46 AM:
How do you consider the existing hook diacritic ?
If you're talking about U+0309 COMBINING HOOK ABOVE, I don't think it
normally
In all major databases, the native encodings of the OS, of the database when it was
created, of the networking protocol, of the SQL queries and results, and of the client
application are all independent.
When JDBC connects to a database, it gets a lot of environment information from the
server
) Kaplan [EMAIL PROTECTED]
To: Philippe Verdy [EMAIL PROTECTED]
Sent: Wednesday, June 04, 2003 4:36 PM
Subject: Re: Encoding conversion through JDBC
From: Philippe Verdy [EMAIL PROTECTED]
Phillipe, you went on for quite a while and I admit most of the things you
talked about are not thing about
From: [EMAIL PROTECTED]
Jim Allen wrote on 05/30/2003 09:38:12 AM:
See also
http://www.usefulcontent.org/docs/manuals/REC-MathML2-20010221/isoamso.html
for some mathml characters and their unicode encodings.
The character empty is encoded as U+2205 plus the variation selector
For some references, look at this page (which displays a table of symbols):
http://www.archivesnationales.culture.gouv.fr/camt/fr/se/fiche4/fiche4-1.html
It describes the Prévost-Delaunay Method (from the official web site of National
French Archives)
An example of text is on:
From: Lukas Pietsch [EMAIL PROTECTED]
I was hoping to find someone who had additional evidence for this
character.
I happened to come across it the other day in a modern printed edition
of 17th- to 19th century handwritten English letters (Miller, Kerby
A., Arnold Schrier, Bruce D. Boling,
From: Tom Gewecke [EMAIL PROTECTED]
There are interesting signs and symbols in this script, which could still
have their use today for other applications than live transcriptions. I
have a couple of books (published in 1972 and 1974) which describes the
system (in business environments), and
From: Tom Gewecke [EMAIL PROTECTED]
http://www.unicode.org/roadmaps/smp/
Thanks for pointing this block. But will it be enough to support at least
the most well-known variants that were (and sometimes are still) taught ?
It seems doubtful, given the huge number of systems out there. Many
From: Pim Blokland [EMAIL PROTECTED]
António Martins-Tuválkin schreef:
[quoting Radovan Garabik]
In fact, the apostrophe form is used because there is a lack of
convenient space to put carons over tall letters d,t,l, whereas
there is no problem with n,e,r.
Funny you should bring this
Pim Blokland [EMAIL PROTECTED] wrote:
No. Encoded like that it may *look* like a roman three, but two of
those are definitely not correct. Only U+2162 or its compatibility
decomposition, U+0049 U+0049 U+0049 should be used. The other two
are bad coding, just as using greek Iotas or
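The decomposition mentioned above can be checked with Python's standard `unicodedata` module. This is only a sketch of the lookup, and the variable names are illustrative. It shows that U+2162 has a compatibility decomposition, not a canonical one, so only the NFKD/NFKC forms rewrite it.

```python
import unicodedata

ROMAN_THREE = "\u2162"  # ROMAN NUMERAL THREE

# Compatibility decomposition expands the numeral into three capital letters I.
compat = unicodedata.normalize("NFKD", ROMAN_THREE)
# Canonical decomposition leaves the character untouched.
canonical = unicodedata.normalize("NFD", ROMAN_THREE)

print(unicodedata.name(ROMAN_THREE))
print("NFKD:", [f"U+{ord(ch):04X}" for ch in compat])
print("NFD: ", [f"U+{ord(ch):04X}" for ch in canonical])
```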
From: Roozbeh Pournader [EMAIL PROTECTED]
For those who were worried when is Microsoft going to implement good
Unicode support for Mac OS's IE, there is now an answer: Never. Read it
yourself:
http://news.com.com/2100-1045_3-1017126.html
It's great news. It will force websites to stop
From: Michael (michka) Kaplan [EMAIL PROTECTED]
From: Philippe Verdy [EMAIL PROTECTED]
This is an equal opportunity forum intended for discussion of issues
relative to Unicode, an industrial consortium that includes (among many
others) the companies you are talking about. Excessive anti-ANYONE
From: Doug Ewell [EMAIL PROTECTED]
Philippe Verdy verdy_p at wanadoo dot fr wrote:
It's great news. It will force websites to stop using Microsoft
specific features and caveats, and adopt the real standards.
...
If web sites start using the real standards, people will upgrade
From: Patrick Andries [EMAIL PROTECTED]
I'm looking for two mathematical characters.
2) An angle operator (combining mark ?) looking like this _| , where
a )
n| a) n occurrences of a
a means a )
n|
obviously a should all be
(combining mark ?) looking like this _| , where
a )
n| a) n occurrences of a
a means a )
n|
obviously a should all be written on a single line.
And Philippe Verdy responded (after a long mathematical analysis
Excessive cross-posting to multiple newsgroups, forums and list
servers is considered bulk (and also opposed to the netiquette). As
this message is targeting too large an audience and is off topic, and
is also a commercial ad, I can say that bulk+unsolicited makes it
fully qualifiable as SPAM.
From: Theodore H. Smith [EMAIL PROTECTED]
Excessive cross-posting to multiple newsgroups, forums and list
servers is considered bulk (and also opposed to the netiquette). As
this message is targeting too large an audience and is off topic, and
is also a commercial ad, I can say that
From: Michael Everson [EMAIL PROTECTED]
At 16:45 -0700 2003-06-20, Richard Cook wrote:
Of course, in pop e-print, nearly everything that can be done to a
character is done ... including Bold-Ital-Outline-Shadow ...
Hey, there's no reason only Latin typography should be filled with
From: Allen Haaheim [EMAIL PROTECTED]
Phillippe,
Sorry to reopen a (closed?) case. The below look like loose ends to me.
I thought it was closed too. Well I can reply, but I will just give my opinion
after reading translations to Japanese performed by other people, and
hearing their comments.
From: Christopher John Fynn [EMAIL PROTECTED]
So following normal Tibetan Dzongkha input and spelling rules
the relative ordering of these characters should be:
A. 0F71 (CCV=129)
B. 0F74 (CCV=132)
C. 0F72, 0F7A, 0F7B, 0F7C, 0F7D, 0F80 (CCV=130)
D. 0F7E, (CCV=0) 0F82, 0F83 (CCV=230)
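The combining class values quoted above can be looked up with Python's standard `unicodedata` module. The sketch below prints them and then applies canonical decomposition (NFD) to a sample sequence, U+0F45 followed by U+0F74 and U+0F72, chosen here for illustration. Canonical reordering sorts combining marks by class, so the class 130 vowel sign moves in front of the class 132 one, which is the opposite of the spelling order listed above.

```python
import unicodedata

# The Tibetan dependent signs listed above, in the order given.
SIGNS = [0x0F71, 0x0F74, 0x0F72, 0x0F7A, 0x0F7B, 0x0F7C,
         0x0F7D, 0x0F80, 0x0F7E, 0x0F82, 0x0F83]

# Canonical combining class for each code point.
classes = {cp: unicodedata.combining(chr(cp)) for cp in SIGNS}
for cp, ccc in classes.items():
    print(f"U+{cp:04X} ccc={ccc}")

# Sample input: consonant CA, then vowel sign U (132), then vowel sign I (130).
typed = "\u0f45\u0f74\u0f72"
normalized = unicodedata.normalize("NFD", typed)
print("typed:     ", [f"U+{ord(ch):04X}" for ch in typed])
print("normalized:", [f"U+{ord(ch):04X}" for ch in normalized])
```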
On Monday, June 23, 2003 2:54 PM, Michael Everson [EMAIL PROTECTED] wrote:
It wouldn't be hard to provide a comparable descriptive paragraph
that began with an image of the Stars and Stripes, but I don't think
we'd want to encode the US flag as a character.
That would be a logo.
Most
On Monday, June 23, 2003 10:17 PM, Michael Everson [EMAIL PROTECTED] wrote:
There doesn't seem to be a NUT SYMBOL used to warn that products
contain nuts, though there are many, many references to Sainsbury's
(a British supermarket chain) labelling their peanuts Warning:
Contains Nuts.
What
On Tuesday, June 24, 2003 7:41 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Michael Everson wrote on 06/23/2003 07:54:13 AM:
Similarly, the fleur-de-lis is a
well-known named symbol which can be used to represent a number of
things.
In text? I've seen it on flags, on license plates, on
On Tuesday, June 24, 2003 6:30 PM, Rick McGowan [EMAIL PROTECTED] wrote:
U+2668 HOT SPRINGS is pleasant, but it's a lot less motivated -- to
my mind -- than the DO NOT LITTER SIGN.
Huh? The Hotspring sign appears in running text all the time -- in
Japanese travel brochures, for example.
On Wednesday, June 25, 2003 4:31 PM, Andrew C. West [EMAIL PROTECTED] wrote:
On Wed, 25 Jun 2003 15:05:26 +0400, Valeriy E. Ushakov wrote:
What I'm suggesting is that although cui 0F45, 0F74, 0F72 and
ciu 0F45, 0F72, 0F74 should be rendered identically, the logical
ordering of the codepoints
On Wednesday, June 25, 2003 6:11 PM, Michael Everson [EMAIL PROTECTED] wrote:
At 08:44 -0700 2003-06-25, Doug Ewell wrote:
If it's true that either the UTC or WG2 has formally approved the
character, for a future version of Unicode or a future amendment to
10646, then I don't see any
On Wednesday, June 25, 2003 6:13 PM, Mark Davis [EMAIL PROTECTED] wrote:
Michael Everson wrote:
[EMAIL PROTECTED] wrote:
Christopher John Fynn wrote:
Any suggestions as to how to create a standardized work around
for these incorrect values?
Propose new characters, and
From: Michael (michka) Kaplan [EMAIL PROTECTED]
From: Michael (michka) Kaplan [EMAIL PROTECTED]
From: Andrew C. West [EMAIL PROTECTED]
What I'm suggesting is that although cui 0F45, 0F74, 0F72
and ciu 0F45, 0F72, 0F74 should be rendered identically,
the logical ordering of the
On Wednesday, June 25, 2003 8:14 PM, Peter Lofting [EMAIL PROTECTED] wrote:
At 7:41 PM +0200 6/25/03, Philippe Verdy wrote:
If there are real distinct semantics that were abusively unified
by the canonicalization, the only safe way would be to create a
second character that would have
On Thursday, June 26, 2003 1:04 AM, Andrew C. West [EMAIL PROTECTED] wrote:
On Wed, 25 Jun 2003 13:41:27 -0700 (PDT), Kenneth Whistler wrote:
Peter asked:
How can things that are visually indistinguishable be lexically
different?
chat (en)
chat (fr)
And if Unicode
On Thursday, June 26, 2003 11:50 AM, Andrew C. West [EMAIL PROTECTED] wrote:
On Wed, 25 Jun 2003 21:58:28 -0700, Elisha Berns wrote:
Some weeks back there were a number of postings about software for
viewing Unicode Ranges in TrueType fonts and I had a few questions
about that. Most
On Thursday, June 26, 2003 2:26 PM, Philippe Verdy [EMAIL PROTECTED] wrote:
I forgot also the probably better function from the Uniscribe library, which processes
strings through a language-dependent shaping algorithm, and can determine appropriate
glyph substitution, or use custom composite
On Thursday, June 26, 2003 4:13 PM, Andrew C. West [EMAIL PROTECTED] wrote:
On Thu, 26 Jun 2003 14:26:13 +0200, Philippe Verdy wrote:
Isn't there a work-around with the following function (quote from
Microsoft MSDN):
(with the caveat that you first need to allocate and fill a Unicode
On Thursday, June 26, 2003 8:16 PM, Elisha Berns [EMAIL PROTECTED] wrote:
It would appear from your answer that even after implementing the
algorithm to search the Unicode block coverage of a font, the actual
comparison data, that is which blocks to compare and how many code
points, is totally
On Friday, June 27, 2003 3:54 AM, Kenneth Whistler [EMAIL PROTECTED] wrote:
John,
At 03:36 PM 6/26/2003, Kenneth Whistler wrote:
Why is making use of the existing behavior of existing characters
a groanable kludge, if it has the desired effect and makes
the required distinctions
When I just look at the history of combining classes, they did not exist in the first
Unicode standard, and they still don't exist in ISO10646 as well.
This was a technology developed by IBM and offered for free to the community to allow
a simplified management of encoded texts, and it has long
In order to implement a plain-text search algorithm, in a language neutral way that
would still work with all scripts, I am looking for advice on how this can be done
safely (notably for automated search engines), to allow searching for text matching
some basic encoding styles.
My first
On Friday, June 27, 2003 1:29 PM, John Cowan [EMAIL PROTECTED] wrote:
Michael Everson scripsit:
Change the character classes in Unicode 4.1, and they *might* decide
to freeze support at, say, Unicode 3.0.
Or they may simply opt to define their *OWN* normalization standard, distinct from
On Friday, June 27, 2003 3:36 PM, Jony Rosenne [EMAIL PROTECTED] wrote:
For Hebrew and Arabic, add a step: Find the root, remove prefixes,
suffixes and other grammatical artifacts and obtain the base form of
the word.
Removing common suffixes is a separate issue (this requires unification of
On Friday, June 27, 2003 3:23 PM, Karljürgen Feuerherm [EMAIL PROTECTED] wrote:
At 04:22 -0500 2003-06-27, [EMAIL PROTECTED] wrote:
Now, Q: I take it the combining classes are linked to the script,
rather than say to a dialect--e.g. one can't define BH as a separate
dialect from MH with its
On Friday, June 27, 2003 4:44 PM, Ben Dougall [EMAIL PROTECTED] wrote:
i'm a bit confused. i thought that this type of thing was already
pretty well covered by the various unicode resources? (i guess there's
a strong chance not, if you're asking this question).
I'm not discussing how
On Friday, June 27, 2003 4:40 PM, John Cowan [EMAIL PROTECTED] wrote:
Not so. Sometimes stability is more important than correctness.
Very well answered. I don't see why we need to sacrifice stability when
correcting something. As the error is not in ISO10646, it is definitely not
reasonable
On Friday, June 27, 2003 5:05 PM, Michael Everson [EMAIL PROTECTED] wrote:
At 10:40 -0400 2003-06-27, John Cowan wrote:
Karljürgen Feuerherm scripsit:
1. Everyone is more or less agreed that the present combining
class rules as they apply to BH contain mistakes. The clearly
On Friday, June 27, 2003 5:53 PM, Karljürgen Feuerherm [EMAIL PROTECTED] wrote:
And in any case this should NOT muck things up which aren't broken,
like MH.
Not breaking Modern Hebrew means not changing the combining classes
of the characters it uses.
Adding a distinct set for Traditional
On Friday, June 27, 2003 6:01 PM, Philippe Verdy [EMAIL PROTECTED]
wrote:
Given that XML will require normalization for texts identified as
being Unicode encoded (UTF-8 and others), couldn't a document be
labelled so that the normalization step be removed from the XML
processing, using an ISO
On Friday, June 27, 2003 10:28 PM, John Hudson [EMAIL PROTECTED] wrote:
I don't think it would break any modern Hebrew document, because it
is not in any way essential to modern Hebrew that the vowels have
fixed position combining classes as in Unicode. That is part of the
frustration: the
On Friday, June 27, 2003 10:29 PM, Rick McGowan [EMAIL PROTECTED]
wrote:
The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:
http://www.unicode.org/review/
Briefly, the new issue is:
Issue #11 Soft Dotted
On Saturday, June 28, 2003 1:15 AM, Kenneth Whistler [EMAIL PROTECTED] wrote:
Philippe Verdy said:
I understand the frustration: if Unicode had not attempted to define
combining classes, which were not necessary to Unicode, all
existing combining characters would have been given a CC=0
On Monday, June 30, 2003 1:58 PM, Pim Blokland [EMAIL PROTECTED] wrote:
Philippe Verdy schreef:
Interesting issue for the Latin Small ij Ligature (U+0133):
Normally the Soft_Dotted is supposed to make one dot disappear when
there's an additional diacritic above, but many applications may
On Monday, June 30, 2003 1:33 PM, Kent Karlsson [EMAIL PROTECTED] wrote:
Or would this require using a diaeresis instead centered above the
digraph?
Probably. But are there any examples of this in use (ever, not
necessarily Unicode encoded, or at all digitally encoded)? If that kind
of
On Monday, June 30, 2003 9:13 PM, James H. Cloos Jr. [EMAIL PROTECTED] wrote:
So if you want two dots and an acute use ij, U+0308, U+0301:
Of course a given font's diaeresis will often not line up with the
stems of its ij, and a custom one should be used instead. Or
features and/or ligs as
On Tuesday, July 01, 2003 1:55 PM, Kent Karlsson [EMAIL PROTECTED] wrote:
My feeling about the proposed Public Review document should
exclude the ij ligature, waiting for the decision about the new
dotless-ij ligature approved in the first rounds by UTC and
waiting for approval by ISO