RE: About Kana folding

2001-05-18 Thread Yves Arrouye
Kenneth, Thanks for the explanations. So I'd suggest you be very careful when trying to do this kind of a folding. If it is just for surface text matching, the number of false positive matches would likely swamp the number of false negatives you'd be correcting. On the other hand, if you

Re: UTF-8 signature in web and email

2001-05-18 Thread Martin Duerst
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote: Martin Dürst wrote: There is about 5% of a justification for having a 'signature' on a plain-text, standalone file (the reason being that it's somewhat easier to detect that the file is UTF-8 from the signature than to read through
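
As a rough illustration of the trade-off mentioned in this message, the sketch below contrasts the two ways of detecting UTF-8: checking for the three-byte signature (EF BB BF) versus reading through the data and validating it. The function name `sniff_utf8` and its return labels are hypothetical, chosen only for this example; it is not taken from the thread.

```python
import codecs


def sniff_utf8(data: bytes) -> str:
    """Classify a byte string as UTF-8 by signature, by validation, or neither.

    Hypothetical helper for illustration only.
    """
    # Cheap check: the UTF-8 signature (BOM) is the byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return "signature"
    # Costlier check: read through all the data and see whether it decodes.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    return "valid-utf8"
```

The signature test inspects only the first three bytes, while the fallback has to scan the whole input. Pure ASCII data also passes the validation check, so "valid-utf8" here means only that the bytes are well-formed UTF-8.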

Re: Ancient writing found in Turkmenistan

2001-05-18 Thread Peter Ilieve
Mike Ayers wrote: ... However, we don't want the article, we want the picture! After lurking on this list for years, finally I can do something vaguely useful. :-) A piece about this appeared in The Times on Tuesday 15 May. There was a picture of the seal spread over three columns but this

Re: [OT] bits and bytes

2001-05-18 Thread Otto Stolz
On Thu, 17 May 2001 15:39:02 -0500, Peter Constable wrote: Can anyone clarify for me how big a byte has ever been? (If you could identify the particular hardware, that would be helpful.) The TR440, a German brand of computer (designed and built here at Konstanz), in use circa 1975..1990 (I

Re: [OT] bits and bytes

2001-05-18 Thread Bob_Hallissy
I was hoping someone with more detailed memory would mention this, but since not, and since it is a contender for having one of the largest minimal addressable units (other than microcode storage): I wrote a couple of programs for a Control Data Corporation (CDC) 6600 back in the early '70s. I

Re: [OT] bits and bytes

2001-05-18 Thread Peter_Constable
Thanks for all the interesting feedback. Now let me ask a slightly different question: Prior to Unicode and ISO 10646, what were the smallest and largest size code units ever used for representing character data? In the various responses, there was reference to 6- and 9-bit character

Re: [OT] bits and bytes

2001-05-18 Thread Peter_Constable
On 05/18/2001 09:39:18 AM Michael (michka) Kaplan wrote: Well, most of the various CJK encodings clearly would have a lot more than 9 bits to them. Kind of required for any system dealing with thousands of characters. But do any of them encode using code units larger than 8 bits? Certainly

Re: [OT] bits and bytes

2001-05-18 Thread Michael (michka) Kaplan
Well, most of the various CJK encodings clearly would have a lot more than 9 bits to them. Kind of required for any system dealing with thousands of characters. MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/ - Original Message - From: [EMAIL PROTECTED] To:

Re: [OT] bits and bytes

2001-05-18 Thread Frank da Cruz
Now let me ask a slightly different question: Prior to Unicode and ISO 10646, what were the smallest and largest size code units ever used for representing character data? Any characters bigger than 9 bits or smaller than 6? Of course, Baudot was a 5-bit code used widely in Teletype networks,

Re: [OT] bits and bytes

2001-05-18 Thread Michael (michka) Kaplan
From: [EMAIL PROTECTED] But do any of them encode using code units larger than 8 bits? Certainly if something like GB2312 were encoded in a flat (linear?) encoding that never used code-unit sequences, the code units would have to be larger than 9 bits. But I've only ever heard of them being
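
To make the code-unit question concrete, here is a small sketch using Python's built-in `gb2312` codec (the EUC-CN byte form, which is an assumption about which encoding form of GB2312 is meant). It shows a character encoded as a sequence of two 8-bit code units, and computes how many bits a single flat code unit would need to cover the 94 x 94 GB2312 grid. The names `code_units` and `FLAT_BITS` are hypothetical.

```python
# GB2312 arranges its characters in a 94 x 94 grid of code positions.
GRID_POSITIONS = 94 * 94

# Bits needed to number every grid position with one flat code unit.
FLAT_BITS = (GRID_POSITIONS - 1).bit_length()


def code_units(ch: str) -> list[int]:
    """Return the 8-bit code units Python's gb2312 codec emits for ch."""
    return list(ch.encode("gb2312"))


if __name__ == "__main__":
    print(FLAT_BITS)                              # bits for a flat encoding
    print([hex(b) for b in code_units("\u4e2d")])  # two 8-bit units for U+4E2D
```

In this byte form every code unit is 8 bits wide, and a Chinese character takes two of them, whereas a flat encoding of the same grid would need 14-bit units.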

Re: [OT] bits and bytes

2001-05-18 Thread Markus Scherer
[EMAIL PROTECTED] wrote: the smallest and largest size code units ever used for representing character data? Teletype machines commonly use a 5-bit code (Baudot, International Alphabet Nr. 2). It has Shift-In/Shift-Out codes to switch between an alphabetic default level and a level with

Re: UTF-8 signature in web and email

2001-05-18 Thread Edward Cherlin
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote: The UTF-8 signature discussion appears every few months on this list, usually as a religious debate between those who believe in it and those who do not. Be forewarned, my religion may not match yours. :-) My religion suggests that we find

Re: [OT] bits and bytes

2001-05-18 Thread Peter_Constable
Morse code uses a one-bit scheme, if you will, or a small number of codes (short/long sound and some 3 or 4 standard lengths of pauses) depending on how you look at it. Well, either you say that Morse code has a character set of three characters: SPACE, DOT, DASH, meaning a two-bit encoding is

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael (michka) Kaplan
michka the only book on internationalization in VB at http://www.i18nWithVB.com/ - Original Message - From: Edward Cherlin [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, May 18, 2001 1:08 PM Subject: Re: UTF-8 signature in web and email At 10:58 PM -0400 5/17/01, [EMAIL

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael (michka) Kaplan
From: Edward Cherlin [EMAIL PROTECTED] A text file with a BOM is, if not rich text, at least above the poverty line. (modified from Ed's prior msg -- this one is a keeper!) michka