On Friday, March 28, 2003, at 02:14 pm, Dan Kogai wrote:
> On the other hand, counting can be tricky even for natives. The very names of numbers change depending on what you count.
Parallels for this can be seen in English collective nouns: a gaggle of geese, a troop of monkeys, a knot of toads, a pack of dogs. Anyone care to suggest a good one for a group of Perl programmers? A larry, a wall, a camel... ;-)
> In Japanese the very notion of "a word" is often moot.
Linguistically, I tend to look at ASCII words as being like kanji: a combination of symbols standing for a concept. In English we use symbols that were intended to represent phonetic sounds (Great Vowel Shift, anyone?), while kanji are combinations of symbols representing concepts. So in a way the word 'mentality' is really a multibyte character, and the kanji for 'kangaikata' stand for the same mental idea as the word 'mentality'. What you call the container for that packet of data is up to you :-)
Programmatically, the encoding determines how the data can be chunked. ASCII text uses whitespace to separate words and a 7-bit envelope per character. EUC-JP uses an 8-bit envelope, setting the high bit on every byte of a Japanese character, while Shift-JIS uses a 'stand on the suitcase while I lock it' system to squeeze two-byte characters into the 8-bit space left over around ASCII and half-width katakana, so a trailing byte can still look like an ASCII character (but then, Microsoft had a hand in it). Perl was developed mostly by native English speakers, so in text processing it takes advantage of recurring patterns in 7-bit ASCII data to determine how the data is chunked. And chunked it must be, or it is incoherent. Yet this article:
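To make the "envelope" point concrete, here is a rough sketch in Python using its standard euc_jp, shift_jis and iso2022_jp codecs. It shows how EUC-JP, Shift_JIS and the 7-bit ISO-2022-JP each pack the same character. The sample character 表 is just an illustrative choice, picked because its Shift_JIS trailing byte happens to be 0x5C, the ASCII backslash:

```python
# Sketch: how three Japanese encodings pack the same character.
# Assumption: Python's standard codecs; the sample character is illustrative only.

text = "表"  # "hyou"; its Shift_JIS trailing byte is 0x5C, i.e. '\' in ASCII

def describe(encoding: str) -> bytes:
    """Encode the sample text and print its bytes in hex."""
    data = text.encode(encoding)
    print(f"{encoding:11} {len(data)} bytes: {data.hex(' ')}")
    return data

euc = describe("euc_jp")       # every byte of a Japanese character has the high bit set
sjis = describe("shift_jis")   # high lead byte, but the trailing byte may fall in ASCII range
jis = describe("iso2022_jp")   # 7-bit only; escape sequences switch character sets

# EUC-JP: no byte of a Japanese character can be mistaken for ASCII.
assert all(b >= 0x80 for b in euc)

# Shift_JIS: the trailing byte 0x5C is also the ASCII backslash, so naive
# byte-level code can mistake half a kanji for an escape character.
assert 0x5C in sjis

# ISO-2022-JP: every byte fits in 7 bits.
assert all(b < 0x80 for b in jis)
```

For 表 this reports two bytes each for EUC-JP and Shift_JIS, and eight for ISO-2022-JP once the escape sequences on either side are counted.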
http://www.perl.com/pub/a/2000/05/cobol.html
talks of a perler meeting a bizarre group of programmers to whom
<< the idea of variable-length, \n-terminated records was new and strange >>
implying that data is chunked into fixed-length records that aren't separated by any delimiter. Japanese? No, COBOL ;-)
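For contrast, a rough Python sketch of the two record models: newline-terminated records (what Perl's default $/ gives you) versus fixed-width records with no separator at all, where position alone marks the boundaries. The 10-byte width and the sample names are made up for illustration. (In Perl itself, setting $/ to a reference to an integer, e.g. $/ = \10, makes readline return fixed-size records instead of lines.)

```python
# Sketch: variable-length, newline-terminated records vs fixed-width records.
# The record width and sample data are hypothetical.

RECORD_WIDTH = 10

def newline_records(data: bytes) -> list[bytes]:
    """Variable-length records, each terminated by a newline byte."""
    records = data.split(b"\n")
    if records and records[-1] == b"":
        records.pop()  # the final terminator does not start another record
    return records

def fixed_records(data: bytes, width: int) -> list[bytes]:
    """Fixed-length records with no separator: offsets alone mark boundaries."""
    if len(data) % width:
        raise ValueError("data is not a whole number of records")
    return [data[i:i + width] for i in range(0, len(data), width)]

names = [b"ALICE", b"BOB", b"CHARLOTTE"]
variable = b"".join(n + b"\n" for n in names)
fixed = b"".join(n.ljust(RECORD_WIDTH) for n in names)  # space-padded, no delimiters

print(newline_records(variable))
print([r.rstrip(b" ") for r in fixed_records(fixed, RECORD_WIDTH)])
```

Both lines print [b'ALICE', b'BOB', b'CHARLOTTE']; the difference is that the fixed-width reader must know the width in advance and strip the padding, while the newline reader needs nothing but the terminator.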
Robin