On Sat, 11 Dec 2010 15:24:52 -0800, Rick Gordon <[email protected]>
wrote:
>Understandably, sensitivities concerning ethnocentricity can be
>triggered within such a discussion, but how about:
>1) A definition which will work among the great majority of
>linguistic cases -- languages that have a commonly accepted range of
>word delimiters (which I think might include all European and Semitic
>languages, or other languages written with Roman/Cyrillic/Greek/Semitic
>alphabets), and make that a default, which might be finessed with an
>explicit language tag, which might modify the default delimiter list.
An explicit declaration of the language would, in my opinion, be a good
idea on general principles. Admittedly, I don't do it on my own pages,
but I *can* see where an explicit language setting might make it easier
for rendering engines or other programs that might want to process text
in language-relevant ways to make appropriate assumptions regarding how
the source language data should be interpreted.
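A minimal sketch of the "default list, modified by an explicit language tag" idea follows. It assumes ASCII whitespace as the default delimiter list and a purely hypothetical private-use tag `x-demo` whose override also treats the hyphen as a delimiter. The names `DEFAULT_DELIMITERS`, `LANGUAGE_OVERRIDES`, `delimiters_for`, and `split_words` are illustrative only and do not come from any real rendering engine.

```python
# Assumed default: ASCII whitespace separates words in space-delimited scripts.
DEFAULT_DELIMITERS = frozenset(" \t\n\r")

# Hypothetical per-language adjustments, keyed by lowercased language tag.
# "x-demo" is an invented private-use tag used purely for illustration.
LANGUAGE_OVERRIDES = {
    "x-demo": {"add": {"-"}, "remove": set()},
}


def delimiters_for(lang=None):
    """Return the delimiter set for a language tag, or the default if none applies."""
    delims = set(DEFAULT_DELIMITERS)
    if lang:
        override = LANGUAGE_OVERRIDES.get(lang.lower())
        if override:
            delims |= override["add"]
            delims -= override["remove"]
    return frozenset(delims)


def split_words(text, lang=None):
    """Split text into words using the (possibly language-modified) delimiter set."""
    delims = delimiters_for(lang)
    words, current = [], []
    for ch in text:
        if ch in delims:
            if current:  # end of a word; skip runs of consecutive delimiters
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(split_words("well-known fact"))                # default list
print(split_words("well-known fact", lang="x-demo"))  # tag modifies the list
```

Untagged text falls back to the default list, so only content that declares a language pays for any special handling.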
>2) Allow for the use of specific word-break and word-inclusion tags that
>would work in any linguistic context, or where an override is required.
For this, Unicode code points 8203 and 8204 (U+200B and U+200C), the
zero-width space and the zero-width non-joiner, might be possible
candidates. Add these to the list of allowed "word" delimiters, and
existing algorithms need not be significantly modified.
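A small sketch of why only the delimiter list would need to change, assuming an existing splitter that breaks on ASCII whitespace. The helper `make_splitter` and the constants are hypothetical; the splitting logic is identical in both cases, and only the delimiter string grows to include U+200B and U+200C. The Thai sample is ภาษา ("language") + ไทย ("Thai").

```python
import re

# Assumed existing delimiter list: ASCII whitespace only.
BASE_DELIMITERS = " \t\n\r"

# Proposed additions: U+200B ZERO WIDTH SPACE (decimal 8203) and
# U+200C ZERO WIDTH NON-JOINER (decimal 8204).
ZERO_WIDTH_DELIMITERS = "\u200b\u200c"


def make_splitter(delimiters):
    """Build a word splitter from a delimiter string; the algorithm never changes."""
    pattern = re.compile("[" + re.escape(delimiters) + "]+")

    def split_words(text):
        # Drop empty strings produced by leading or trailing delimiters.
        return [w for w in pattern.split(text) if w]

    return split_words


split_plain = make_splitter(BASE_DELIMITERS)
split_extended = make_splitter(BASE_DELIMITERS + ZERO_WIDTH_DELIMITERS)

thai = "ภาษา\u200bไทย"  # author inserted an explicit zero-width space
print(split_plain(thai))     # the ZWSP is not a delimiter: one "word"
print(split_extended(thai))  # two words once ZWSP is on the list
print(split_extended("ภาษาไทย"))  # no explicit break: still one "word"
```

The last call shows the limitation: the mechanism only helps if writers actually insert the zero-width characters.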
In any case, the problem becomes getting people to use such tools
consistently; in the messages that Gabriele quoted, Thai was explicitly
mentioned, and it is simply not "natural" for a native Thai speaker/
writer to think in terms of breaking up his/her writing into discrete
words, even with something like the ZWNJ. I don't believe that Thai is
unique in that respect; the same may be true of many Southeast Asian,
East Asian, and *nesian languages that have not adopted western or
Indian scripts.
--
Jeff Zeitlin, Editor
Freelance Traveller
The Electronic Fan-Supported
Traveller® Fanzine and Resource
[email protected]
http://www.freelancetraveller.com
http://come.to/freelancetraveller
http://freelancetraveller.downport.com/
®Traveller is a registered trademark of
Far Future Enterprises, 1977-2009. Use of
the trademark in this notice and in the
referenced materials is not intended to
infringe or devalue the trademark.
Freelance Traveller extends its thanks to the following
enterprises for hosting services:
CyberNET Web Hosting (http://www.cyberwebhosting.net)
The Traveller Downport (http://www.downport.com)
______________________________________________________________________
css-discuss [[email protected]]
http://www.css-discuss.org/mailman/listinfo/css-d
List wiki/FAQ -- http://css-discuss.incutio.com/
List policies -- http://css-discuss.org/policies.html
Supported by evolt.org -- http://www.evolt.org/help_support_evolt/