On Thursday 10 January 2002 04:04 pm, you wrote:
> On Thu, Jan 10, 2002 at 01:28:53PM -0800, Edward Cherlin wrote:
> > > Hmm. Looks like Unicode language tags are a much better
> > > solution.
> >
> > Unicode language tags are heavily deprecated. Language tagging is
> > markup, and there is no point pretending you have plain text when
> > you mark languages.
>
> Heavily deprecated? They were only added to the main body of the
> standard in Unicode 3.1, which isn't a year old.

http://groups.yahoo.com/group/unicode/message/3845

:From: Doug Ewell <[EMAIL PROTECTED]>
:Date: Wed Sep 6, 2000 2:05 pm
:Subject: Re: Plane 14 redux
:
:Kenneth Whistler <[EMAIL PROTECTED]> wrote:
...
:> Most
:> of us, including those of us culpable in the definition of the
:> tag characters (which John Cowan pointed out were defined to head
:> off a worse threat to UTF-8) would prefer not to see them in
:> wide use, but rather the use of standard tagging mechanisms like
:> XML or HTML.
:
:Wow. You too.
:
:I honestly had no idea that the use of Plane 14 language tags,
:defined as they are in a Unicode Technical Report, were so strongly
:deprecated by everyone "in the know" about Unicode, including their
:own creators. I had read UTR #7 at face value, as describing an
:optional mechanism that might help with certain processes but which
:we were under no obligation to use, but now it appears that Plane 14
:language tags have the RFC 1815 nature ("Here's something you can
:use, but for God's sake, please don't use it").
...

> > If you want tagging in plain text, use a standard. As far as I
> > can tell, the best available standard for such things is XML,
> > which defines Unicode as its preferred character set.
>
> The reason these characters *exist* is for specifying the language
> where a markup language like XML isn't an option. That's the case
> with Ogg tags.

I don't understand why markup is not an option.

> > I see no reason to encode language in Ogg tags. Users should be
> > able to choose a Unicode fontset that suits their needs for
> > displaying all languages.
>
> The entire discussion is about the ambiguities that prevent
> displaying a character in its native form without extra
> information. If your needs include "use font A for language A, and
> font B for language B", and languages A and B share codepoints, you
> need language tagging in some form; no fontset will be able to
> figure it out. Feel free to show that no people exist who want to
> do that.

Certainly some people want to. I'm arguing that they don't need to.
Anyway, give us an example: either a message in one language that
cannot be displayed correctly from the plain text, or a message in
more than one language where rendering in the user's preferred font
loses information for that user.

--
Edward Cherlin
[EMAIL PROTECTED]
Does your Web site work?

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
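For readers unfamiliar with the Plane 14 mechanism argued over above, here is a minimal Python sketch of how a renderer might recover "language A vs. language B" runs from plain text. It assumes the tag-character layout as introduced in Unicode 3.1 and described in UTR #7: U+E0001 LANGUAGE TAG starts a tag, U+E0020 through U+E007E mirror printable ASCII, and U+E007F CANCEL TAG on its own cancels the tags in effect. The helper names (`make_language_tag`, `tag_text`, `split_by_language`) are hypothetical and not part of any library; this is an illustration, not an argument for or against using the tags.

```python
from __future__ import annotations

# Assumed Unicode 3.1 Plane 14 layout (illustrative sketch only).
LANGUAGE_TAG = 0xE0001   # introduces a language tag
CANCEL_TAG = 0xE007F     # on its own, cancels all tags in effect
TAG_OFFSET = 0xE0000     # tag character = TAG_OFFSET + ASCII code
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def make_language_tag(lang: str) -> str:
    """Encode an ASCII language ID (e.g. "ja") as LANGUAGE TAG + tag chars."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language ID must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def tag_text(text: str, lang: str) -> str:
    """Hypothetical helper: wrap a run of text in a language tag, then cancel it."""
    return make_language_tag(lang) + text + chr(CANCEL_TAG)


def split_by_language(s: str) -> list[tuple[str | None, str]]:
    """Split tagged text into (language, run) pairs; None means untagged."""
    segments: list[tuple[str | None, str]] = []
    lang: str | None = None
    buf: list[str] = []

    def flush() -> None:
        # Emit the pending run under the language currently in effect.
        if buf:
            segments.append((lang, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(s):
        cp = ord(s[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            tag_chars = []
            while i < len(s) and TAG_FIRST <= ord(s[i]) <= TAG_LAST:
                tag_chars.append(chr(ord(s[i]) - TAG_OFFSET))
                i += 1
            # An empty tag (e.g. LANGUAGE TAG + CANCEL TAG) leaves text untagged.
            lang = "".join(tag_chars) or None
        elif cp == CANCEL_TAG:
            flush()
            lang = None
            i += 1
        else:
            buf.append(s[i])
            i += 1
    flush()
    return segments


# Same Han code point (U+76F4), tagged once as Japanese and once as Chinese.
mixed = tag_text("\u76f4", "ja") + tag_text("\u76f4", "zh")
print(split_by_language(mixed))  # [('ja', '直'), ('zh', '直')]
```

Stripping the tags leaves two identical code points, which is the ambiguity the quoted message describes; whether that justifies in-band tags rather than markup is exactly what the thread disputes.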
