On Thu, Sep 13, 2001 at 12:40:30AM -0700, Edward Cherlin wrote:
: For example,
:
: 1984 (Nineteen Eighty Four)
: 1066 and all that (Ten Sixty Six)
: 3001 (Three Thousand One)
: 2050 (Twenty Fifty)
: 2010 (Twenty Ten)
: 2001, A Space Odyssey (Two Thousand One)
You're missing the and from 3001
English and several other languages have dozens of collations. Compare telephone
books, library catalogs, book indexes (sic), and other sorted data. Knuth vol. 3
Sorting and Searching gives an example of a set of library sorting rules that runs to
more than a page, and suggests programming it
say is true,
I could never be the right kind of girl for you,
I could never be your woman
- White Town
--- Original Message ---
From: Edward Cherlin [EMAIL PROTECTED];
To: [EMAIL PROTECTED];
Cc:
Date: 01/09/13 7:40
Subject: Collation (was RE: [OT] o-c
- Original Message -
From: Edward Cherlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, September 13, 2001 3:40 AM
Subject: Collation (was RE: [OT] o-circumflex)
English and several other languages have dozens of collations. Compare
telephone books, library catalogs, book
[EMAIL PROTECTED]
To: Edward Cherlin [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Thursday, September 13, 2001 8:35 AM
Subject: Re: Collation (was RE: [OT] o-circumflex)
Java's collation class has a rule-based collator that is in effect
programmable using a little language. Here is how an example
On Mon, 10 Sep 2001, Mark Davis wrote:
A ZWNJ will break ligatures and cursive connections. While probably safe in
Danish or Dutch, it is unclear to me that that is safe in all languages
where this situation occurs. There are digraphs in Urdu, for example. While
I don't know their sorting
- Original Message -
From: Keld Jørn Simonsen [EMAIL PROTECTED]
To: Stefan Persson [EMAIL PROTECTED]
Cc: Mark Davis [EMAIL PROTECTED]; Michael (michka) Kaplan
[EMAIL PROTECTED]; Keld Jørn Simonsen [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: den 10 september 2001 22:12
Subject: Re: [OT] o
- Original Message -
From: Lars Marius Garshol [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: den 10 september 2001 22:45
Subject: Re: [OT] o-circumflex
I am not sure of this, but I think 'å' is a relatively modern
invention, and that it was originally written only as 'aa'.
FYI
];
[EMAIL PROTECTED]
Sent: den 10 september 2001 22:12
Subject: Re: [OT] o-circumflex
Where is this done for Swedish? I have read both the TN and the SIS
standard, and I don't believe these say anything on sorting
ü according to either German or Dutch sounds. Rolf Gavare does not
say
* Lars Marius Garshol
|
| I am not sure of this, but I think 'å' is a relatively modern
| invention, and that it was originally written only as 'aa'.
* Stefan Persson
|
| FYI, a relatively modern invention means that it has been used
| since the Middle Ages (in Swedish).
I don't think that is
John Cowan wrote:
None of which is as weird as Leghorn for Livorno (Italy).
It's as weird as some Italian names for German cities: Aquisgrana for
Aachen, Augusta for Augsburg, Magonza for Mainz, Monaco (di Baviera) for
München.
_ Marco
Carl W. Brown wrote:
In Arabic do you include vowels or not?
Yes, and also consonants sometimes...
Traditional Arabic dictionary sorting uses the three-letter root (radical)
of a word as the primary key. So, madrasa (school) would be under d
(because its radical is d-r-s = to learn), ignoring
Asmus Freytag wrote:
But if you do this, all compound words starting with "data" and continuing
with another word starting with "a" will be sorted incorrectly!
To achieve this effect, you would have to mark which AAs are A-Rings and
which ones are accidental adjacencies. In Danish one can
At 18:04 +0200 2001-09-09, Stefan Persson wrote:
well, the official spelling of the town is Aalborg.
In Sweden it has always been written Ålborg.
At one stage, in both countries, it was written Álaborg, I suspect,
as it is in Iceland today.
--
Michael Everson
At 18:10 -0400 2001-09-09, John Cowan wrote:
Keld Jørn Simonsen scripsit:
Yes, foreigners call our cities many strange things:-)
København is called Köpenhamn, Copenhagen, Kobenhagen, Copenhague,
and many more.
In Iceland it is Kaupmannahöfn, I believe. In unadorned English that
would
On Mon, Sep 10, 2001 at 11:09:28AM +0200, Marco Cimarosti wrote:
Asmus Freytag wrote:
But if you do this, all compound words starting with "data" and continuing
with another word starting with "a" will be sorted incorrectly!
To achieve this effect, you would have to mark which AAs are
From: Keld Jørn Simonsen [EMAIL PROTECTED]
Real-life sorts, like MS Windows sorting or Linux sorting, actually adhere
to these Danish rules, once you have set up your machine for Danish.
And this is the *true* answer to the whole mess of attempting *multilingual*
sorts -- once the user
On Mon, Sep 10, 2001 at 11:09:28AM +0200, Marco Cimarosti wrote:
Asmus Freytag wrote:
But if you do this, all compound words starting with "data" and continuing
with another word starting with "a" will be sorted incorrectly!
To achieve this effect, you would have to mark which
Mon, 10 Sep 2001 10:47:48 +0200, Marco Cimarosti [EMAIL PROTECTED] pisze:
It's as weird as some Italian names for German cities: Aquisgrana
for Aachen, Augusta for Augsburg, Magonza for Mainz, Monaco (di
Baviera) for München.
Interesting that Polish names of these cities are more like Italian
On Mon, Sep 10, 2001 at 03:58:05PM +0200, Marco Cimarosti wrote:
On Mon, Sep 10, 2001 at 11:09:28AM +0200, Marco Cimarosti wrote:
Asmus Freytag wrote:
But if you do this, all compound words starting with "data" and continuing
with another word starting with "a" will be sorted
From: Mark Davis [EMAIL PROTECTED]
Michael, that isn't the point. There is a problem even when you stick to one
language.
That is, there are situations where two letters in a language, e.g. ch in
Slovak, are normally sorted as one. However, in some exceptional
circumstances those letters
-
From: Michael (michka) Kaplan [EMAIL PROTECTED]
To: Keld Jørn Simonsen [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, September 10, 2001 5:48 AM
Subject: Re: [OT] o-circumflex
From: Keld Jørn Simonsen [EMAIL PROTECTED]
Real-life sorts, like MS Windows sorting or Linux sorting, actually
On Mon, 10 Sep 2001 16:42:45 +0200, Keld Jørn Simonsen wrote:
But maybe you are driving for a yet more complex sorting, one that can sort
according to multiple rules? Beijing should then not be sorted as Beÿing?
I haven't followed this discussion from the beginning, so apologies if
I'm missing
.
Mark
—
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο
πάντα — Όμήρου Μαργίτῃ
[http://www.macchiato.com]
- Original Message -
From: John Wilcock [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, September 10, 2001 8:39 AM
Subject: Re: [OT] o-circumflex
On Mon, 10 Sep 2001 16:42:45
John Wilcock wrote:
I haven't followed this discussion from the beginning, so apologies if
I'm missing the point, but it seems to me that the Beijing case in
Dutch is no different from the ekstraarbejde case in Danish - a SHY or
ZWNJ is all that is needed to stop Beijing sorting with Bey.
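As a rough illustration of the point above (not something posted in the thread), here is a toy Python sort key in which Dutch "ij" contracts to "y" unless a SHY or ZWNJ sits between the two letters. The name dutch_key and the reduced rule set are assumptions made for this sketch; a real collator would work from tailored collation data rather than string rewriting.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
SHY = "\u00ad"   # SOFT HYPHEN

def dutch_key(word):
    """Toy primary key (hypothetical): 'ij' sorts as 'y' unless SHY/ZWNJ splits it."""
    s = word.lower()
    out = []
    i = 0
    while i < len(s):
        if s.startswith("ij", i):
            out.append("y")   # contraction: the adjacent pair i+j is weighted as y
            i += 2
        elif s[i] in (ZWNJ, SHY):
            i += 1            # ignorable: adds no weight, but has kept i and j apart
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

print(dutch_key("Meijer"))                 # meyer
print(dutch_key("Beijing"))                # beying
print(dutch_key("Bei" + ZWNJ + "jing"))    # beijing
```

With this key, Meijer and Meyer compare equal, while a Beijing typed with a ZWNJ between the i and the j no longer sorts with Bey.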
u,
I could never be your woman
- White Town
--- Original Message ---
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED];
To: [EMAIL PROTECTED];
Cc:
Date: 01/09/10 14:02
Subject: Re: [OT] o-circumflex
Mon, 10 Sep 2001 10:47:48 +0200, Marco Cimarosti
- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
To: 'John Wilcock' [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: den 10 september 2001 18:35
Subject: RE: [OT] o-circumflex
John Wilcock wrote:
I haven't followed this discussion from the beginning, so apologies if
I'm
Stefan Persson wrote:
I thought ij sorted after z?
Not in Dutch: as far as I have seen it sorts the same as y. In fact, in
the telephone directory many people who had a y in their surname were listed
near people who had the same surname spelled with ij (e.g. Meyer and
Meijer).
(Anyway, next time
) Kaplan [EMAIL PROTECTED]; Keld Jørn Simonsen
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: den 10 september 2001 17:27
Subject: Re: [OT] o-circumflex
Michael, that isn't the point. There is a problem even when you stick to one
language.
That is, there are situations where two letters in a language
On Mon, 10 Sep 2001, てんどうりゅうじ wrote:
If they can't agree on the pronunciation for these cities, can they
agree on the Hanzi for them? What ARE the Hanzi for these cities,
anyway??
Are you asking for the names of cities in Chinese? Copenhagen is
ge1ben3ha1gen1
Where is this done for Swedish? I have read both the TN and the SIS
standard, and I don't believe these say anything on sorting
ü according to either German or Dutch sounds. Rolf Gavare does not
say something along this either, as far as I can remember.
Kind regards
keld
On Mon, Sep 10, 2001
hite Town
--- Original Message ---
From: Stefan Persson [EMAIL PROTECTED];
To: Mark Davis [EMAIL PROTECTED]; "Michael (michka) Kaplan"
[EMAIL PROTECTED]; Keld Jørn Simonsen [EMAIL PROTECTED]; [EMAIL PROTECTED];
Cc:
Date: 01/09/10 17:09
Subject: Re:
* Carl W. Brown
|
| You are quite correct; that is why Unicode supports differing
| collation strengths. Sometimes you only care about the actual
| letters without diacritics. But even then letters are locale
| sensitive. For example the Danish alphabet starts with an A and
| ends it with A
* Francesco Zappa Nardelli
|
| I was in Aalborg fifteen days ago, and I have seen its name written
| both as Ålborg and as Aalborg. Where does Aalborg appear in a list
| of towns?
At the end.
In both Danish and Norwegian 'aa' and 'å' are considered equivalent.
I am not sure of this, but I
* Jonathan Rosenne
|
| This is not always the right thing to do. For example, with personal
| names the person involved may decide whether he prefers the old (AA)
| spelling or the new Å. In any case they are equivalent.
This is true, but this is nothing particular to the aa/å distinction.
Many
* Keld Jørn Simonsen
|
| Yes, foreigners call our cities many strange things:-) København is
| called Köpenhamn, Copenhagen, Kobenhagen, Copenhague, and many more.
* Michael Everson
|
| In Iceland it is Kaupmannahöfn, I believe. In unadorned English that
| would be something like
* Marco Cimarosti
|
| One of these cases could be the word dataarkiv, which I found in a Danish
| web page
| (http://www.riksarkivet.no/nordiskarknytt/98-nr4/institusjonen.html).
Uh, no, you found it in a Norwegian web page. The word is the same in
Danish, though.
| Order B:
|
On 09/10/2001 07:48:05 AM Michael (michka) Kaplan wrote:
(can't believe this thread is still going on!)
I just wanted to know about how Francophones perceive certain graphemes,
and I got that answer a long time ago.
Peter
It's as weird as some Italian names for German cities: Aquisgrana
for Aachen, Augusta for Augsburg, Magonza for Mainz, Monaco (di
Baviera) for München.
MK Interesting that Polish names of these cities are more like Italian
MK than German: Akwizgran, Augsburg, Moguncja, Monachium.
Because
To: 'Stefan Persson'; 'John Wilcock'; [EMAIL PROTECTED]
Subject: RE: [OT] o-circumflex
Stefan Persson wrote:
I thought ij sorted after z?
Not in Dutch: as far as I have seen it sorts the same as y. In fact, in
the telephone directory many people who had a y in their surname were listed
near people
your woman
- White Town
--- Original Message ---
From: Thomas Chan [EMAIL PROTECTED];
To: [EMAIL PROTECTED];
Cc:
Date: 01/09/10 19:59
Subject: Re: [OT] o-circumflex
On Mon, 10 Sep 2001, てんどうりゅうじ wrote:
If t
Way OT by now...
AAARRRGGHHH
I give up!
I was hoping that there is SOME system that would give these cities UNIQUE names...
postal codes???
Ain't reality a bitch?
What you're looking for doesn't exist in the world of natural language
names -- it can only exist in artificially
David,
I also don't know if the other countries have academies, but my
understanding is Latin American countries haven't accepted the modern
sort. Having said that, there is a lot of software that does not
implement the traditional sort, so acceptance is moot.
(The reason the Real Academia
On Sat, Sep 08, 2001 at 06:38:57PM -0700, Carl W. Brown wrote:
Asmus,
If you are entering Danish city names then enter it as Ålborg. You should
only use Aalborg where the font does not support Å. For matching logic you
can equate Å to Aa then the issue of compound words goes away.
well,
- Original Message -
From: Keld Jørn Simonsen [EMAIL PROTECTED]
To: Carl W. Brown [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: den 9 september 2001 14:21
Subject: Re: [OT] o-circumflex
On Sat, Sep 08, 2001 at 06:38:57PM -0700, Carl W. Brown wrote:
Asmus,
If you are entering
2:15 AM
Subject: Re: [OT] o-circumflex/Spanish sorting
David,
I also don't know if the other countries have academies, but my
understanding is Latin American countries haven't accepted the modern
sort. Having said that, there is a lot of software that does not
implement the traditional sort
On Sun, Sep 09, 2001 at 06:04:30PM +0200, Stefan Persson wrote:
- Original Message -
From: Keld Jørn Simonsen [EMAIL PROTECTED]
To: Carl W. Brown [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: den 9 september 2001 14:21
Subject: Re: [OT] o-circumflex
On Sat, Sep 08, 2001 at 06
Keld Jørn Simonsen scripsit:
Yes, foreigners call our cities many strange things:-)
København is called Köpenhamn, Copenhagen, Kobenhagen, Copenhague,
and many more. Helsingør is called Elsinore.
None of which is as weird as Leghorn for Livorno (Italy).
--
John Cowan
In a message dated 2001-09-07 17:19:49 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
You are quite correct; that is why Unicode supports differing collation
strengths. Sometimes you only care about the actual letters without
diacritics. But even then letters are locale sensitive. For
Doug,
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Friday, September 07, 2001 10:52 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [OT] o-circumflex
In a message dated 2001-09-07 17:19:49 Pacific Daylight
Hello.
For example the Danish alphabet starts with an A and ends with A with
ring above. A Dane would look for Ålborg near the end of a list of
towns.
I was in Aalborg fifteen days ago, and I have seen its name written
both as Ålborg and as Aalborg. Where does Aalborg appear in a list of
At 09:04 PM 9/7/01 -0700, Mark Davis wrote:
I disagree. What you want is a merged database field. See
http://www.macchiato.com/slides/icu_collation.ppt
Mark
Mark,
David took the remainder of our discussion off the alias. I won't repeat it
here, just to note that we've agreed that merged
~/icuhtml/design/searchproposal
.html).
—
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο
πάντα — Όμήρου Μαργίτῃ
[http://www.macchiato.com]
- Original Message -
From: Francesco Zappa Nardelli [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, September 08, 2001 10:51 AM
Subject: Re: [OT] o
In a message dated 2001-09-08 12:00:43 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
I know the Real Academia Española decided to do away with ch and ll in
1994, but do you know if the other Spanish-speaking countries'
corresponding academies have done the same?
I have no idea. I don't
At 02:45 PM 9/8/01 -0700, Mark Davis wrote:
If you use a Danish tailoring of the UCA that equates Å and AA (at least at
a primary and secondary level), then they will sort the same way. A string
search that uses the same tailoring will also find Ålborg when given
Aalborg (and vice versa).
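To make the idea of equating Å and AA at the primary level concrete, here is a small Python sketch. It is only a toy stand-in for a real UCA tailoring: danish_primary_key, find_matches and the DANISH_ORDER table are names invented for this illustration, and the blind "aa" replacement is an assumed simplification.

```python
import unicodedata

# Danish alphabet order: ae-ligature, o-slash and a-ring follow z.
DANISH_ORDER = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"

def danish_primary_key(word):
    """Toy primary key (hypothetical): 'aa' and a-ring get the same weight, after z."""
    s = unicodedata.normalize("NFC", word).lower().replace("aa", "\u00e5")
    # Characters outside the table are given one shared weight after a-ring.
    return tuple(
        DANISH_ORDER.index(ch) if ch in DANISH_ORDER else len(DANISH_ORDER)
        for ch in s
    )

def find_matches(needle, candidates):
    """Return the candidates whose primary key equals that of the needle."""
    wanted = danish_primary_key(needle)
    return [c for c in candidates if danish_primary_key(c) == wanted]

towns = ["Ålborg", "Odense", "Aarhus", "Esbjerg"]
print(sorted(towns, key=danish_primary_key))  # ['Esbjerg', 'Odense', 'Ålborg', 'Aarhus']
print(find_matches("Aalborg", towns))         # ['Ålborg']
```

Note that the blind replacement also folds the accidental "aa" in a compound such as dataarkiv, which is exactly the difficulty raised elsewhere in this thread.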
But if
PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Asmus Freytag
Sent: Saturday, September 08, 2001 5:56 PM
To: Mark Davis; [EMAIL PROTECTED]; Francesco Zappa Nardelli
Subject: Re: [OT] o-circumflex
At 02:45 PM 9/8/01 -0700, Mark Davis wrote:
If you use a Danish tailoring of the UCA that equates
Of Carl W. Brown
Sent: Sunday, September 09, 2001 4:39 AM
To: [EMAIL PROTECTED]
Subject: RE: [OT] o-circumflex
Asmus,
This discussion reminds me of my ill-fated efforts to produce a manageable
set of rules to do automatic title casing starting with French text. It
would have required either
I would say it is a variant of o; we just call it "o with a circumflex
accent" (o avec un accent circonflexe). The difference between o and ô
is normally audible (for a French speaker). The relationship is the same
as with any other letter that sometimes has accents (e.g. a and à,
e and è,
On Thu, 6 Sep 2001, Ayers, Mike wrote:
From: David Starner [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 06, 2001 01:40 PM
On Thu, Sep 06, 2001 at 04:03:07PM +0200, Thierry Sourbier wrote:
The only little thing to know about French and diacritical
marks is that when
I believe that there is an established sort order in English, which
is to sort without regard to diacritics, or else we'd never find the
words!
In English (American English more than British English), diacritics are
considered optional, and it is common to see naïve written naive, San
José
There is also no word pair separated only by the I/J distinction (in English), right?
じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of girl for you,
I could never be your woman
- White Town
じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of girl for you,
I could never be your woman
- White Town
Who'd be a lexicographer?
Me?
Mike.
From: J M Sykes [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 07, 2001 07:50 AM
The classic example is 'resume' and 'résumé'. These are, by
now, two quite
distinct words, and the fact that there is no 'established'
order is shown
I spell both resume and have never been
]
Sent: Thursday, September 06, 2001 5:12 PM
Subject: RE: [OT] o-circumflex
From: David Starner [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 06, 2001 01:40 PM
On Thu, Sep 06, 2001 at 04:03:07PM +0200, Thierry Sourbier wrote:
The only little thing to know about French and diacritical
There is also no word pair separated only by the I/J
distinction (in English), right?
iamb - as in iambic pentameter
jamb - as in a door jamb
From: David Gallardo [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 07, 2001 10:07 AM
As a practical matter, you need to take the diacritics into
account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic
From: David Gallardo [EMAIL PROTECTED]
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic behaviour. In other
words, résumé and resume should fall
At 11:50 AM 9/7/01 -0500, Ayers, Mike wrote:
Words with the
same spelling and different pronunciation are uncommon but exist in English,
the classic example being read and its own past tense.
Actually, this is a bit more common than you think, since the pronunciation
of vowels in English
At 01:06 PM 9/7/01 -0400, David Gallardo wrote:
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic behaviour. In other
words, résumé and resume should
there are other considerations.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Asmus Freytag
Sent: Friday, September 07, 2001 11:51 AM
To: David Gallardo; Ayers, Mike; 'David Starner'; [EMAIL PROTECTED]
Subject: Re: [OT] o-circumflex
At 01:06 PM
, September 07, 2001 11:52
Subject: RE: [OT] o-circumflex
At 11:50 AM 9/7/01 -0500, Ayers, Mike wrote:
Words with the
same spelling and different pronunciation are uncommon but exist in
English,
the classic example being read and its own past tense.
Actually, this is a bit more common than you think
[EMAIL PROTECTED]; Ayers, Mike
[EMAIL PROTECTED]; 'David Starner' [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Friday, September 07, 2001 11:50
Subject: Re: [OT] o-circumflex
At 01:06 PM 9/7/01 -0400, David Gallardo wrote:
As a practical matter, you need to take the diacritics into account when
How do Francophones view the o-circumflex ô in relation to the letter o? Is it a distinct grapheme, or is it considered a variant of o?
- Peter
---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W.
]
Sent: Thursday, September 06, 2001 3:08 PM
Subject: [OT] o-circumflex
How do Francophones view the o-circumflex ô in relation to the letter o?
Is it a distinct grapheme, or is it considered a variant of o?
- Peter
On Thu, Sep 06, 2001 at 04:03:07PM +0200, Thierry Sourbier wrote:
The only little thing to know about French and diacritical marks is that when
doing a sort, diacritical marks are evaluated from right to left (e.g.
cote côte coté vs. the English order cote coté côte).
I'm not sure there
My impression is that, at least in U.S. states that are more heavily
populated by native Spanish speakers, the one diacritic that English
speakers frequently view as non-optional for differentiating two words
(specifically proper names) is the tilde, as used for the
eñe. There is a college
David Starner wrote:
Yes, but I mean for cote, côte, and coté. How would you
sort those three in English? I'd probably sort it by some
extra-lingual information: i.e. page number, date of birth
or the like.
Store them as UTF-8, do a DOS sort, and call the results
the new World order?
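For the curious, here is what a plain byte-order sort of UTF-8 text does, as a short Python sketch. The function name utf8_byte_order is invented for this example, and it stands in for any sort that compares raw bytes without linguistic rules.

```python
import unicodedata

def utf8_byte_order(words):
    """Sort by the raw UTF-8 bytes of each word (NFC form), with no linguistic rules."""
    return sorted(
        words,
        key=lambda w: unicodedata.normalize("NFC", w).encode("utf-8"),
    )

print(utf8_byte_order(["côte", "coté", "cote"]))  # ['cote', 'coté', 'côte']
print(utf8_byte_order(["apple", "Zulu"]))         # ['Zulu', 'apple']
```

The three French words happen to come out in the "English" order, but only because every accented letter encodes to bytes above 0x7F; the second line shows the price, with all capitals sorting ahead of all lowercase letters.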
From: "Ayers, Mike" [EMAIL PROTECTED];
To: 'David Starner' [EMAIL PROTECTED];[EMAIL PROTECTED];
Cc:
Date: 01/09/06 21:12
Subject: RE: [OT] o-circumflex
From: David Starner [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 06, 2001 01:40 PM
On Thu, Sep 06, 2
78 matches