Tim said:

> I think Eric's proposal may have legs. Or at least something along these
> lines. I agree with John's point that we need to start conservative and
> expand from that base-line. To be too inclusive at first means it would be
> nearly impossible to go back.
>
> We must though be very careful not to inadvertently exclude
> scripts/characters that are used by some languages even though we thought
> they were merely symbols.

The list you are looking for is provided by the Unicode Consortium:

http://www.unicode.org/Public/UNIDATA/Scripts.txt

That gives script assignments for Unicode characters (Latin, Greek, Cyrillic, Devanagari, Bengali, Han, ...), and provides guidance on what not to leave out if you are simply trying to make a conservative decision without leaving some languages essentially unrepresentable.

Note that many scripts inherently include combining characters. I absolutely agree with Kent that a blanket prohibition of combining characters is unacceptable. In a discussion dominated by English, Chinese, and Korean speakers and writers, it might seem o.k., but I assure you that if there were as many Arabic, Urdu, Hindi, and Bengali speakers and writers participating, it would *not* seem o.k.

Apart from that, deciding to omit punctuation, space characters, format control characters, and symbols is fine as a conservative approach to the problem.

--Ken
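
To make the conservative approach above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses the General_Category property from Python's standard unicodedata module as a stand-in, not the script assignments in Scripts.txt, and the names ALLOWED_MAJOR_CLASSES, is_allowed, and rejected_chars are hypothetical. Allowing decimal digits is also an assumption; the message above does not address them. The point it shows is that letters and combining marks pass together, while punctuation, spaces, format controls, and symbols are left out.

```python
import unicodedata

# Hypothetical policy: allow letters (L*) and combining marks (M*).
# Marks are allowed because many scripts cannot be written without them.
ALLOWED_MAJOR_CLASSES = {"L", "M"}


def is_allowed(ch: str) -> bool:
    """Return True if a single character passes the conservative filter."""
    category = unicodedata.category(ch)  # two-letter code, e.g. "Lu", "Mn", "Po"
    if category[0] in ALLOWED_MAJOR_CLASSES:
        return True
    # Assumption: decimal digits (Nd) are also acceptable.
    return category == "Nd"


def rejected_chars(text: str) -> list:
    """List the characters in text that the filter would exclude, in order."""
    return [ch for ch in text if not is_allowed(ch)]


if __name__ == "__main__":
    hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the word "Hindi" in Devanagari
    print(rejected_chars(hindi))
    print(rejected_chars("a b!"))
```

With a blanket ban on combining characters, the Devanagari word in the usage example would lose three of its six code points (two vowel signs and the virama), which is the kind of exclusion the message warns against.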
