Re: [gentoo-user] [OT] Need advice from people who use non-ascii all day long

2009-12-15 Thread J. Roeleveld
On Thursday 03 December 2009 20:20:03 fe...@crowfix.com wrote: I have a project which requires normalizing names, and by that, I mean converting to lower case etc, whatever eliminates redundancies. I know Unicode has a different normalize meaning, but for my purposes, that has already been

2009-12-05 Thread daid kahl
I have a project which requires normalizing names, and by that, I mean converting to lower case etc, whatever eliminates redundancies.  I know Unicode has a different normalize meaning, but for my purposes, that has already been done.  Maybe I should call it standardization or make up a new

2009-12-05 Thread felix
On Sun, Dec 06, 2009 at 10:58:59AM +0900, daid kahl wrote: I'm curious about your handling of Japanese, just because I'm living outside Tokyo these days. My grasp on Japanese is basically rubbish, but I can at least claim to know a thing or two. Our handling is simple -- we don't yet. I

2009-12-05 Thread daid kahl
Our handling is simple -- we don't yet. I don't know how to handle things like that, or the previous example of Copenhagen in different languages. Look at Naples -- that's not what Italians call it. Venice is really bad -- no idea how English got it so mangled. Speaking of Japanese, their

2009-12-05 Thread daid kahl
such as (I am guessing now) saw-umm-bee-yaw-koo. To write Tokyo in the proper furigana is probably something like toh-o-kee-yoh-o. Oh, I should mention that this is correct in writing. But the yo is a subscript, so it's also a modifier, so the ki part isn't pronounced, it's modified into a
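
For anyone handling this in software: in Unicode the small yo is its own code point, separate from the full-size yo, so "Tokyo" in hiragana is five characters to a matching routine. A minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` is hypothetical:

```python
import unicodedata

def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode character name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Tokyo in hiragana: to, u, ki, small yo, u
tokyo = "とうきょう"
for code_point, name in describe(tokyo):
    print(code_point, name)

# Small yo (U+3087) and full-size yo (U+3088) are different characters,
# and kana have no case, so case folding leaves the string unchanged.
print("ょ" == "よ")                 # False
print(tokyo.casefold() == tokyo)  # True
```

So "lower case etc." does nothing for kana; any equivalence between spellings has to come from rules about the script itself.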

2009-12-05 Thread felix
On Sun, Dec 06, 2009 at 11:45:43AM +0900, daid kahl wrote: Well, I don't think n is really a syllable. It's a sound, and it's the only part of the syllabary in Japanese that doesn't have a vowel. I'm not really convinced this is a syllable in reality. It's certainly a syllable in their

2009-12-04 Thread Patrick Holthaus
Hey! So do people type in Busingen different ways depending on how they feel, do some people always leave off the umlaut, do some always use it? You cannot simply leave the umlaut out since it is considered as a separate letter for itself. You cannot choose whether to write an ö or an o.

2009-12-04 Thread felix
On Fri, Dec 04, 2009 at 10:17:30AM +0100, Patrick Holthaus wrote: You cannot simply leave the umlaut out since it is considered as a separate letter for itself. You cannot choose whether to write an ö or an o. Like Renat said, there are words that completely change their meaning when

2009-12-04 Thread Volker Armin Hemmann
On Friday 04 December 2009, fe...@crowfix.com wrote: If enough Europeans are in the habit of taking shortcuts and skipping umlauts and accents and cedilla and tildes, we don't. Because skipping Umlaut, accent & co. creates a completely new word. Probably one that is already there. Munster is a

2009-12-04 Thread Alan McKinnon
On Friday 04 December 2009 15:42:56 Volker Armin Hemmann wrote: On Friday 04 December 2009, fe...@crowfix.com wrote: If enough Europeans are in the habit of taking shortcuts and skipping umlauts and accents and cedilla and tildes, we don't. Because skipping Umlaut, accent & co. creates a

2009-12-04 Thread Neil Bothwick
On Fri, 4 Dec 2009 22:50:52 +0200, Alan McKinnon wrote: Three consecutive e's looks weird Are you calling my laptop weird? ;-) -- Neil Bothwick THE BORG: Calm, Cool and Collective...

[gentoo-user] [OT] Need advice from people who use non-ascii all day long

2009-12-03 Thread felix
I have a project which requires normalizing names, and by that, I mean converting to lower case etc, whatever eliminates redundancies. I know Unicode has a different normalize meaning, but for my purposes, that has already been done. Maybe I should call it standardization or make up a new
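
One way to read "converting to lower case etc." in code is a comparison key built from NFC composition (a no-op if, as the poster says, Unicode normalization has already been done; NFC is an assumption here), Unicode case folding, and whitespace collapsing. A minimal sketch; `name_key` is a hypothetical helper, and accents and umlauts are deliberately left alone. It uses `str.casefold()` rather than `lower()` because case folding is meant for caseless matching, e.g. it folds "ß" to "ss":

```python
import unicodedata

def name_key(name: str) -> str:
    """Hypothetical helper: build a case-insensitive comparison key for a name.

    NFC-composes the text, case-folds it and collapses runs of whitespace.
    Accents and umlauts are kept, so "Büsingen" and "Busingen" stay distinct.
    """
    composed = unicodedata.normalize("NFC", name)
    return " ".join(composed.casefold().split())

print(name_key("  BÜSINGEN  am Hochrhein "))      # büsingen am hochrhein
print(name_key("Straße") == name_key("STRASSE"))  # True (lower() would keep "straße")
print(name_key("Bu\u0308singen") == name_key("Büsingen"))  # True: decomposed input
print(name_key("МОСКВА"))                          # москва
```

Case folding works the same way for Cyrillic or Greek, so no per-alphabet code is needed for this step.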

2009-12-03 Thread Renat Golubchyk
Hi! On Thu, 3 Dec 2009 11:20:03 -0800 fe...@crowfix.com wrote: In Germany is a district Busingen, with an umlauted 'u'. Is it reasonable to consider it the same word whether with or without the umlauted u? No. For many words it would be ok, but not for all. For example, drucken means to
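
To make the drucken/drücken point concrete: the common "strip the accents" trick (decompose to NFD, then drop combining marks) turns drücken ("to press") into drucken ("to print"), so two different words end up with one key. A minimal sketch with hypothetical helper names; `find_collisions` reports which words in a given list would merge:

```python
import unicodedata

def strip_marks(word: str) -> str:
    """Naive accent stripping: NFD-decompose, then drop nonspacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def find_collisions(words: list[str]) -> dict[str, list[str]]:
    """Group words whose stripped forms are identical; keep only real clashes."""
    groups: dict[str, set[str]] = {}
    for w in words:
        groups.setdefault(strip_marks(w), set()).add(w)
    return {key: sorted(ws) for key, ws in groups.items() if len(ws) > 1}

print(strip_marks("drücken"))  # drucken
print(find_collisions(["drücken", "drucken", "Münster", "Munster", "Büsingen"]))
```

Note that letters such as ø have no decomposition, so this function leaves København unchanged; "stripping accents" is not even consistent across languages.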

2009-12-03 Thread felix
On Thu, Dec 03, 2009 at 08:50:08PM +0100, Renat Golubchyk wrote: I'd suggest you use a unicode library. BTW, what about cyrillic letters or other alphabets? Those may have nothing to do with ASCII. Or is your project restricted to latin letters? The data is already in normalized Unicode. My

2009-12-03 Thread Renat Golubchyk
On Thu, 3 Dec 2009 12:07:26 -0800 fe...@crowfix.com wrote: So do people type in Busingen different ways depending on how they feel, do some people always leave off the umlaut, do some always use it? If you want to leave off the umlaut you have to be absolutely sure that there exists no other

2009-12-03 Thread Volker Armin Hemmann
On Thursday 03 December 2009, Renat Golubchyk wrote: Hi! On Thu, 3 Dec 2009 11:20:03 -0800 fe...@crowfix.com wrote: In Germany is a district Busingen, with an umlauted 'u'. Is it reasonable to consider it the same word whether with or without the umlauted u? No. For many words

2009-12-03 Thread Alan McKinnon
On Friday 04 December 2009 00:07:33 Volker Armin Hemmann wrote: On Thursday 03 December 2009, Renat Golubchyk wrote: Hi! On Thu, 3 Dec 2009 11:20:03 -0800 fe...@crowfix.com wrote: In Germany is a district Busingen, with an umlauted 'u'. Is it reasonable to consider it the same

2009-12-03 Thread Francisco Ares
On Thu, Dec 3, 2009 at 6:29 PM, Renat Golubchyk ragerm...@gmx.net wrote: On Thu, 3 Dec 2009 12:07:26 -0800 fe...@crowfix.com wrote: So do people type in Busingen different ways depending on how they feel, do some people always leave off the umlaut, do some always use it? If you want to

2009-12-03 Thread Arttu V.
On 12/3/09, fe...@crowfix.com fe...@crowfix.com wrote: I have a project which requires normalizing names, and by that, I mean converting to lower case etc, whatever eliminates redundancies. I assume you have already removed the language problem from the equation? I.e., the fact that København,

2009-12-03 Thread felix
On Thu, Dec 03, 2009 at 08:32:45PM -0200, Francisco Ares wrote: What about a set of dictionaries? And also a library for mistyped word search? Way too much effort for this. Nice idea, might even be fun, but it's just trying to avoid the common things, and I mainly wondered about how often

2009-12-03 Thread felix
On Fri, Dec 04, 2009 at 12:38:34AM +0200, Arttu V. wrote: I assume you have already removed the language problem from the equation? I.e., the fact that København, Copenhague, Kööpenhamina and Copenhagen all mean the same place, just in different European languages (Danish, Spanish, Finnish
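
No spelling rule connects København to Copenhague; the only general approach is an explicit alias table maintained by hand. A minimal sketch, assuming a hypothetical `ALIASES` dict keyed on case-folded names (the entries are just the four forms named in this message):

```python
import unicodedata

# Hypothetical, hand-maintained alias table: case-folded spelling -> canonical key.
ALIASES = {
    "københavn": "copenhagen",     # Danish
    "copenhague": "copenhagen",    # Spanish
    "kööpenhamina": "copenhagen",  # Finnish
    "copenhagen": "copenhagen",    # English
}

def canonical_place(name: str) -> str:
    """Return the canonical key for a known alias, else the folded name itself."""
    folded = " ".join(unicodedata.normalize("NFC", name).casefold().split())
    return ALIASES.get(folded, folded)

print(canonical_place("KØBENHAVN"))     # copenhagen
print(canonical_place("Kööpenhamina"))  # copenhagen
print(canonical_place("Napoli"))        # napoli (no alias entry, passed through)
```

Naples/Napoli and Venice/Venezia would each need their own entries; the table grows with every language the data covers.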

2009-12-03 Thread Volker Armin Hemmann
look at my name, ok? Just dropping the Umlaut is wrong. No if, but, maybe. It is wrong. Error. Mistake. Fail. If you can not enter ä, ö or ü, you must transform them to ae, oe or ue.
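
The ae/oe/ue rule is easy to apply mechanically. A minimal sketch; the names `UMLAUT_MAP` and `transliterate_umlauts` are hypothetical. Capitals become "Ae"/"Oe"/"Ue", which suits names but not all-caps text:

```python
import unicodedata

# Replacement pairs from the rule above; capitals use the title-case form.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def transliterate_umlauts(text: str) -> str:
    # Compose first so "u" + U+0308 COMBINING DIAERESIS becomes "ü" and is caught.
    return unicodedata.normalize("NFC", text).translate(UMLAUT_MAP)

print(transliterate_umlauts("Büsingen"))  # Buesingen
print(transliterate_umlauts("drücken"))   # druecken
print(transliterate_umlauts("Mu\u0308nster"))  # Muenster
```

Unlike dropping the dots, this keeps "drücken" (as "druecken") distinct from "drucken".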

2009-12-03 Thread Alan McKinnon
On Friday 04 December 2009 02:03:23 Volker Armin Hemmann wrote: look at my name, ok? Just dropping the Umlaut is wrong. No if, but, maybe. It is wrong. Error. Mistake. Fail. If you can not enter ä, ö or ü, you must transform them to ae, oe or ue. Your name shows here in 7-bit ASCII:

2009-12-03 Thread felix
On Fri, Dec 04, 2009 at 01:03:23AM +0100, Volker Armin Hemmann wrote: look at my name, ok? Just dropping the Umlaut is wrong. No if, but, maybe. It is wrong. Error. Mistake. Fail. If you can not enter ä, ö or ü, you must transform them to ae, oe or ue. I'd like to find a program which