Kenneth,
Thanks for the explanations.
So I'd suggest you be very careful when trying to do this kind of
a folding. If it is just for surface text matching, the number of
false positive matches would likely swamp the number of false
negatives you'd be correcting.
On the other hand, if you
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote:
Martin Dürst wrote:
There is about 5% of a justification
for having a 'signature' on a plain-text, standalone file (the reason
being that it's somewhat easier to detect that the file is UTF-8 from the
signature than to read through
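The point that a signature is easier to detect than reading through the file can be sketched in Python. This is a minimal illustration that assumes the whole file is already in memory as bytes. The function names are hypothetical, but `codecs.BOM_UTF8` is a real standard-library constant. Checking for the signature inspects three bytes. Confirming UTF-8 without one means decoding everything.

```python
import codecs

# codecs.BOM_UTF8 is b"\xef\xbb\xbf", i.e. U+FEFF encoded in UTF-8.
UTF8_SIGNATURE = codecs.BOM_UTF8


def has_utf8_signature(data: bytes) -> bool:
    """Cheap test: inspect only the first three bytes."""
    return data.startswith(UTF8_SIGNATURE)


def is_well_formed_utf8(data: bytes) -> bool:
    """Expensive test: decode the entire buffer and report whether it succeeds."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def strip_utf8_signature(data: bytes) -> bytes:
    """Drop a leading signature, if present, before further processing."""
    if has_utf8_signature(data):
        return data[len(UTF8_SIGNATURE):]
    return data


signed = UTF8_SIGNATURE + "Dürst".encode("utf-8")
print(has_utf8_signature(signed))         # True
print(is_well_formed_utf8(b"D\xfcrst"))   # False: Latin-1 0xFC is not valid UTF-8
```

A pure-ASCII file also passes the full scan, so a positive result there only shows the bytes *could* be UTF-8. Python's built-in `utf-8-sig` codec strips a leading signature while decoding.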
Mike Ayers wrote:
... However, we don't want the article, we want the picture!
After lurking on this list for years, finally I can do something
vaguely useful. :-)
A piece about this appeared in The Times on Tuesday 15 May.
There was a picture of the seal spread over three columns but this
On Thu, 17 May 2001 15:39:02 -0500, Peter Constable wrote:
Can anyone clarify for me how big a byte has ever been? (If you could
identify the particular hardware, that would be helpful.)
The TR440, a German brand of computer (designed and built here
at Konstanz), in use circa 1975..1990 (I
I was hoping someone with a more detailed memory would mention this, but
since no one has, and since it is a contender for having one of the largest
minimal addressable units (other than microcode storage):
I wrote a couple of programs for a Control Data Corporation (CDC) 6600 back
in the early '70s. I
Thanks for all the interesting feedback.
Now let me ask a slightly different question: Prior to Unicode and ISO
10646, what were the smallest and largest size code units ever used for
representing character data? In the various responses, there was reference
to 6- and 9-bit character
On 05/18/2001 09:39:18 AM Michael (michka) Kaplan wrote:
Well, most of the various CJK encodings clearly would have a lot more than 9
bits to them. Kind of required for any system dealing with thousands of
characters.
But do any of them encode using code units larger than 8 bits? Certainly
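The arithmetic behind "thousands of characters need more than 9 bits" is a ceiling of a base-2 logarithm. The sketch below computes it exactly with integers. The function name is hypothetical. The GB2312 figures used (7,445 characters laid out in a 94 x 94 grid) are the standard's published sizes.

```python
def min_code_unit_bits(repertoire_size: int) -> int:
    """Smallest fixed width, in bits, that gives each character its own code."""
    if repertoire_size < 1:
        raise ValueError("repertoire must contain at least one character")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, with no float rounding.
    return max(1, (repertoire_size - 1).bit_length())


for label, size in [("32 codes", 32), ("ASCII", 128),
                    ("GB2312 characters", 7445), ("GB2312 94 x 94 grid", 94 * 94)]:
    print(f"{label}: {min_code_unit_bits(size)} bits")
```

This gives 5, 7, 13 and 14 bits respectively. Whether an encoding actually uses one wide code unit or a sequence of narrower ones is a separate design choice.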
Well, most of the various CJK encodings clearly would have a lot more than 9
bits to them. Kind of required for any system dealing with thousands of
characters.
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
- Original Message -
From: [EMAIL PROTECTED]
To:
Now let me ask a slightly different question: Prior to Unicode and ISO
10646, what were the smallest and largest size code units ever used for
representing character data?
Any characters bigger than 9 bits or smaller than 6?
Of course, Baudot was a 5-bit code used widely in Teletype networks,
From: [EMAIL PROTECTED]
But do any of them encode using code units larger than 8 bits? Certainly if
something like GB2312 were encoded in a flat (linear?) encoding that never
used code-unit sequences, the code units would have to be larger than 9
bits. But I've only ever heard of them being
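The contrast between a flat encoding and code-unit sequences can be made concrete with GB2312's row/cell (qu/wei) grid. EUC-CN, the usual byte form, stores each position as two 8-bit code units, each offset by 0xA0. The `flat_code` function is a hypothetical single-unit scheme written only for comparison. As far as I know, no deployed encoding uses it. Both function names are illustrative.

```python
GRID = 94  # GB2312 rows and cells are each numbered 1..94


def _check(row: int, cell: int) -> None:
    if not (1 <= row <= GRID and 1 <= cell <= GRID):
        raise ValueError("row and cell must be in 1..94")


def euc_cn_bytes(row: int, cell: int) -> bytes:
    """Sequence form: two 8-bit code units, row + 0xA0 then cell + 0xA0."""
    _check(row, cell)
    return bytes([row + 0xA0, cell + 0xA0])


def flat_code(row: int, cell: int) -> int:
    """Hypothetical flat form: one linear index per grid position."""
    _check(row, cell)
    return (row - 1) * GRID + (cell - 1)


# Row 16, cell 1 holds U+554A; Python's gb2312 codec yields the same two bytes.
print(euc_cn_bytes(16, 1).hex(), "啊".encode("gb2312").hex())
largest = flat_code(GRID, GRID)
print(largest, largest.bit_length())  # highest flat index and the bits it needs
```

The flat index tops out at 8835, which needs a 14-bit code unit. The two-byte form stays within 8-bit units at the cost of variable-length sequences when mixed with ASCII.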
[EMAIL PROTECTED] wrote:
the smallest and largest size code units ever used for representing character data?
Teletype machines commonly use a 5-bit code (Baudot, International Telegraph Alphabet
No. 2). It has Shift-In/Shift-Out codes to switch between an alphabetic default level
and a level with
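The shift mechanism means a 5-bit code can name more than 32 characters: the decoder keeps a current level, and the shift codes change that level instead of producing output. Below is a minimal sketch of such a stateful decoder. All code values and both partial tables are illustrative placeholders that I have not checked against the real ITA2 table, and the function name is hypothetical.

```python
# Illustrative placeholder codes, not verified ITA2 assignments.
SHIFT_LETTERS = 0b11111  # switch to the alphabetic (letters) level
SHIFT_FIGURES = 0b11011  # switch to the figures level

# The same 5-bit code means different things depending on the current level.
LETTERS = {0b00001: "E", 0b00011: "A", 0b00100: " "}
FIGURES = {0b00001: "3", 0b00011: "-", 0b00100: " "}


def decode_shifted(codes: list[int]) -> str:
    level = LETTERS  # decoding starts in the alphabetic default level
    out = []
    for code in codes:
        if not 0 <= code < 32:
            raise ValueError(f"{code} is not a 5-bit code")
        if code == SHIFT_LETTERS:
            level = LETTERS      # shift codes change state but emit nothing
        elif code == SHIFT_FIGURES:
            level = FIGURES
        else:
            out.append(level.get(code, "?"))  # "?" marks codes missing from this sketch
    return "".join(out)


print(decode_shifted([0b00011, 0b00001, SHIFT_FIGURES, 0b00001, 0b00011,
                      SHIFT_LETTERS, 0b00001]))
```

Because the meaning of each code depends on earlier shifts, a single corrupted shift code garbles everything until the next shift.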
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote:
The UTF-8 signature discussion appears every few months on this list,
usually as a religious debate between those who believe in it and those who
do not. Be forewarned, my religion may not match yours. :-)
My religion suggests that we find
Morse code uses a one-bit scheme, if you will, or a small number of codes
(short/long sound and some 3 or 4 standard lengths of pauses), depending on
how you look at it.
Well, either you say that Morse code has a character set of three
characters: SPACE, DOT, DASH, meaning a two-bit encoding is
michka
the only book on internationalization in VB at
http://www.i18nWithVB.com/
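Treating Morse as a three-symbol alphabet (DOT, DASH, SPACE) and packing each symbol into 2 bits can be sketched as follows. The letter patterns for E, T, S and O are standard International Morse. The particular 2-bit assignments and the function names are assumptions made for illustration. This simplification has a single gap symbol, so it does not model the separate letter and word pauses mentioned above.

```python
# Tiny subset of International Morse; other letters raise KeyError in this sketch.
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}

# Assumed 2-bit code for each of the three symbols (0b11 is left unused).
SYMBOL_BITS = {" ": 0b00, ".": 0b01, "-": 0b10}


def to_symbols(text: str) -> str:
    """Spell each letter in dots and dashes, separated by one SPACE symbol."""
    return " ".join(MORSE[ch] for ch in text.upper())


def pack_2bit(symbols: str) -> tuple[int, int]:
    """Pack symbols into one integer, 2 bits each; return (value, bit count)."""
    value = 0
    for s in symbols:
        value = (value << 2) | SYMBOL_BITS[s]
    return value, 2 * len(symbols)


sos = to_symbols("sos")
print(sos, pack_2bit(sos))
```

Three symbols need 2 bits each, which wastes one of the four possible values. That unused value is one reason the "how many bits is Morse" question has no single answer.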
- Original Message -
From: Edward Cherlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 18, 2001 1:08 PM
Subject: Re: UTF-8 signature in web and email
At 10:58 PM -0400 5/17/01, [EMAIL
From: Edward Cherlin [EMAIL PROTECTED]
A text file with a BOM is, if not rich text, at least above the poverty
line.
(modified from Ed's prior msg -- this one is a keeper!)
michka