Edward Cherlin wrote on 2002-01-10 21:28 UTC:
> Unicode language tags are heavily deprecated. Language tagging is 
> markup, and there is no point pretending you have plain text when you 
> mark languages. 

That's debatable, and the community around a language called
"mathematics" is just heading in the exact opposite direction for good
reasons, trying to flee the horrid world of XML and other markup
languages and getting 90% of what they need into plaintext, which will
for the foreseeable future remain the only common denominator data
format that can transcend individual applications well.

I see markup (in the sense of SGML et al.) as information that is highly
specific to a document type, as in "this shall be field 47 of a patent
application (inventor's mail address)". Everything more generic than
that (such as what language this is, perhaps even generic forms of
emphasis such as <EM>...</EM>) probably does indeed have a fair place in
a plaintext standard such as Unicode, because you definitely want language
tags (and simple generic emphasis) to survive in a cut&paste from a
patent application XML into a web page or company memo wordprocessing
file. Otherwise, all markup languages would need the exact same language
tagging (we had that already with character sets, remember? that's how
we got to Unicode ...).
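
As a minimal sketch of what language tagging inside plain text can look
like, assuming the Plane 14 tag characters (U+E0001 LANGUAGE TAG, tag
letters at U+E0020..U+E007E mirroring ASCII, U+E007F CANCEL TAG). The
helper names below are hypothetical and only illustrate that the tag
travels with the string through any copy or re-encoding:

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def tag_language(text, lang):
    """Prefix text with a Plane 14 language tag and end it with CANCEL TAG."""
    tag = "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    return LANGUAGE_TAG + tag + text + CANCEL_TAG


def read_language(text):
    """Return the language code of the first tag in text, or None."""
    start = text.find(LANGUAGE_TAG)
    if start < 0:
        return None
    chars = []
    for c in text[start + 1:]:
        cp = ord(c)
        if 0xE0020 <= cp <= 0xE007E:
            chars.append(chr(cp - TAG_OFFSET))
        else:
            break
    return "".join(chars)


def strip_tags(text):
    """Remove all Plane 14 tag characters, leaving the untagged text."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# The tag is part of the character stream, so it survives a plain copy
# through UTF-8 without any markup-aware tooling.
tagged = tag_language("Guten Tag", "de")
pasted = tagged.encode("utf-8").decode("utf-8")
```

Here read_language(pasted) recovers the language code and
strip_tags(pasted) recovers the bare text, with no document-type-specific
parser involved.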

> If you want tagging in plain text, use a standard. As far as I can 
> tell, the best available standard for such things is XML, which 
> defines Unicode as its preferred character set.

I think this is rapidly becoming the old-fashioned and technically
hard-to-justify view these days, as it seems clear now that no single
standard markup language (in particular not XML with its baroque
syntax) has the potential to become as ubiquitous as some ho^Hyped
5 years ago. Plaintext remains a far more powerful concept, and XML is
mostly a markup mechanism designed to overcome deficiencies in ASCII,
one that appears rather clumsy in a pure Unicode plaintext world.

Discuss.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
