On Thu, Apr 26, 2012 at 11:53 PM, David Korn <[email protected]> wrote:
> cc:  [email protected]
> Subject: Re: Re: [ast-users] Re: read -d command not supporting 
> non-ASCII/Unicode chars
> --------
>
>> What exactly is the difficult part? ksh93 already supports one byte
>> delimiters. Non-ASCII characters can both be represented by a wchar_t
>> or a multibyte sequence. The multibyte sequence could be used as C
>> string and this C string could be used as delimiter, i.e. you search
>> for a C string as delimiter instead of a single byte.
>
> The problem is that ksh93 uses the sfio library function sfgetr() to read
> a record and this has to be modified to handle multi-byte.
>
> Secondly, when reading from a terminal, you can't just set the VEOL character
> to be the delimiter since it is now multi-byte, so it may require
> a raw mode read.

3rd issue: Not all multibyte encodings allow "recovering" (sometimes
called "self-synchronizing") when the file offset points into the
middle of a multibyte character's byte sequence. In that case the
remainder of the text may read as gibberish (exception: all encodings
I know about allow recovering at the '\n' (=newline) character).
UTF-8 was explicitly designed so that consumers _can_ recover (credits
go to Ken Thompson for that idea and implementation), but IMO the
sfio code shouldn't be UTF-8 specific. There are other modern
encodings/character sets/standards like GB18030 which are not UTF-8
encoded but are still very important (in GB18030's case the Chinese
government makes GB18030 support mandatory, i.e. you can't sell your
software there without GB18030 support).
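
To illustrate what "recovering" means for UTF-8, here is a minimal
sketch. The helper name utf8_resync() is made up for this mail and is
not part of sfio. It relies only on the fact that UTF-8 continuation
bytes always have the bit pattern 10xxxxxx, so a consumer that lands
in the middle of a character can skip forward to the next byte that
can start one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of sfio): return the first offset >= off
 * that does not point at a UTF-8 continuation byte (bit pattern 10xxxxxx),
 * i.e. the next byte that can start a character. Returns len if there is
 * no such byte in the buffer. */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t off)
{
    assert(off <= len);
    while (off < len && (buf[off] & 0xC0) == 0x80)
        off++;
    return off;
}
```

The same test does not carry over to GB18030: as far as I know the
second byte of a two-byte sequence there can fall into the range used
by ASCII or by lead bytes, so a single byte does not tell you whether
you are at a character boundary.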

> We plan to make these changes after the next update, which we hope to release
> at the end of this week.

Uhm... how are you and kpv intending to fix it? IMO it would be
sufficient to assume that the file position is always at the start
of the next valid multibyte character (and any caller who does a
|seek()| is responsible for guaranteeing this (or gets eaten by the
resulting mess)).
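
A rough sketch of what I mean, not a proposal for the actual sfio
change: find_mb_delim() is a made-up name, and the only library call
it relies on is the standard mbrlen(). It assumes the buffer starts at
a character boundary, walks it one character at a time in the current
LC_CTYPE encoding, and compares the delimiter bytes only at those
boundaries, so nothing in it is UTF-8 specific:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not the sfio API): find delim in buf, testing only
 * at character boundaries of the current LC_CTYPE locale. Assumes buf[0]
 * starts a character. Returns the byte offset of the match, or -1. */
long find_mb_delim(const char *buf, size_t len, const char *delim, size_t dlen)
{
    mbstate_t st;
    size_t off = 0;

    assert(dlen > 0);
    memset(&st, 0, sizeof st);
    while (off + dlen <= len) {
        if (memcmp(buf + off, delim, dlen) == 0)
            return (long)off;
        size_t n = mbrlen(buf + off, len - off, &st);
        if (n == (size_t)-2)
            break;                  /* truncated character at end of buffer */
        if (n == (size_t)-1 || n == 0) {
            /* invalid byte or NUL: step one byte, reset the shift state */
            memset(&st, 0, sizeof st);
            n = 1;
        }
        off += n;
    }
    return -1;
}
```

A caller would do setlocale(LC_CTYPE, "") first and then something
like find_mb_delim(buf, len, delim, strlen(delim)). If the caller
hands in a buffer that starts in the middle of a character, the walk
is off from the first step, which is exactly the "resulting mess"
above.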

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users
