I have an application that generates the following message when it tries to read a line containing a byte with a value greater than 0x7f (in this particular case, the (C) symbol in a text file):
Unhandled Exception: System.ArgumentException: Arg_InvalidUTF8
Parameter name: bytes
in <0x003a2> 00 System.Text.UTF8Encoding:InternalGetChars (byte[],int,int,char[],int,uint&,uint&,bool,bool)
in <0x00039> 00 .UTF8Decoder:GetChars (byte[],int,int,char[],int)
in <0x00374> 00 System.IO.StreamReader:ReadBuffer ()
in <0x00088> 00 System.IO.StreamReader:Read ()
in <0x00049> 00 System.IO.StreamReader:ReadLine ()
in <0x002c3> 00 lc.Class1:ConvertFile (System.IO.FileInfo)
in <0x001cc> 00 lc.Class1:Main (string[])
I have tried the same code on MS .NET, which just discards the
character, whereas Mono throws the above exception.
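The difference can be reproduced without a file. The following is a minimal sketch, not the original ConvertFile code: the names Utf8Repro and ReadFirstLine and the sample bytes are assumptions made for illustration. It reads the bytes 'A', 0xA9, 'B' through a StreamReader, using a UTF8Encoding that either throws on invalid bytes or does not.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Repro
{
    // Hypothetical sample: 'A', a lone 0xA9 byte (the (C) symbol in Latin-1), 'B'.
    public static readonly byte[] Sample = { 0x41, 0xA9, 0x42 };

    // Reads the first line of the data with a UTF-8 decoder.
    // throwOnInvalidBytes selects strict or lenient handling of malformed input.
    public static string ReadFirstLine(byte[] data, bool throwOnInvalidBytes)
    {
        var encoding = new UTF8Encoding(false, throwOnInvalidBytes);
        using (var reader = new StreamReader(new MemoryStream(data), encoding))
        {
            return reader.ReadLine();
        }
    }
}
```

Calling ReadFirstLine (Utf8Repro.Sample, true) should fail with an ArgumentException, while passing false lets the runtime decide what to do with the bad byte. Current .NET versions substitute U+FFFD; older runtimes may drop the byte instead, which matches the behaviour described above.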
I followed the code down to UTF8Encoding.cs. It seems that in
InternalGetChars, when leftSize == 0 and the current byte is neither a
value under 0x80 nor a UTF-8 start byte, an exception is thrown.
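To make the "start value" test concrete, here is a small sketch of how a byte can be classified as a possible UTF-8 lead byte. It follows the ranges in RFC 3629 and is not the code from Mono's InternalGetChars, which may accept a different set of bytes; Utf8LeadByte and SequenceLength are hypothetical names.

```csharp
using System;

public static class Utf8LeadByte
{
    // Returns the total sequence length announced by a lead byte (1 to 4),
    // or 0 if the byte cannot start a UTF-8 sequence.
    public static int SequenceLength(byte b)
    {
        if (b < 0x80) return 1;               // ASCII, encoded as a single byte
        if (b >= 0xC2 && b <= 0xDF) return 2; // 110xxxxx (0xC0 and 0xC1 would be overlong)
        if (b >= 0xE0 && b <= 0xEF) return 3; // 1110xxxx
        if (b >= 0xF0 && b <= 0xF4) return 4; // 11110xxx, limited to U+10FFFF
        return 0;                             // 0x80..0xBF are continuation bytes; the rest are never valid
    }
}
```

Under this classification 0xA9 is a continuation byte, so it cannot begin a sequence. That is the case the exception path above rejects.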
I replaced the following code:
if (leftSize == 0) {
...
...
} else {
// Invalid UTF-8 start character.
if (throwOnInvalid) {
throw new ArgumentException (_("Arg_InvalidUTF8"), "bytes1");
}
with:
if (leftSize == 0) {
...
...
} else {
if (posn >= length) {
throw new ArgumentException (_("Arg_InsufficientSpace"), "chars");
}
chars[posn++] = (char)ch;
and my program is now happy.
What I would like to know is: wouldn't it be better if the character were
just added to the buffer without any fuss (or at least discarded without an
exception being thrown, as seems to happen under .NET)?
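If the input files are actually ISO-8859-1 (Latin-1) rather than UTF-8, which is an assumption about this case and not something the trace proves, another option is to leave the decoder alone and tell StreamReader which encoding to use. A minimal sketch, with the hypothetical names Latin1Reader and ReadFirstLine:

```csharp
using System.IO;
using System.Text;

public static class Latin1Reader
{
    // Decodes the first line of the data as ISO-8859-1, where every byte
    // maps directly to the Unicode code point with the same value.
    public static string ReadFirstLine(byte[] data)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (var reader = new StreamReader(new MemoryStream(data), latin1))
        {
            return reader.ReadLine();
        }
    }
}
```

For a file on disk, the same encoding object can be passed to the StreamReader (string, Encoding) constructor. For a single stray byte such as 0xA9 this appears to give the same character as the (char)ch patch, without changing how genuine UTF-8 input is validated.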
--
btouchet <[EMAIL PROTECTED]>
