Quoth Tzafrir Cohen on Mon, Sep 01, 2003:
> A small test (I hope you won't mind the Hebrew):

[snip -- can't do Hebrew ATM]

> It should have given the same output. Indeed the range between the Yud
> and the Tav worked, so the regex worked on multibyte Hebrew chars.

No, it didn't.  It replaced vav, which is not between yud and tav.
I tried replacing the range yud-to-lamed, and it happily gave me
the same output (i.e., it replaced shin as well).  Something is
wrong here; and if you think for a second how sed works and how
UTF-8 is encoded, you will immediately see what it is.
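
To spell that out, here is a minimal sketch in Python (not sed itself)
of what a byte-oriented tool would make of such a range. It assumes
the sed in question treats its input as raw bytes, which I have not
verified for that particular build. The helper names are mine and
purely illustrative. Every Hebrew letter in UTF-8 is the byte 0xD7
followed by a second byte, so "[yud-tav]" becomes the byte 0xD7, the
byte range 0x99-0xD7, and the byte 0xAA, and "[yud-lamed]" collapses
to the same set of bytes.

```python
import re

def bytewise_class(first: str, last: str) -> "re.Pattern[bytes]":
    """Build [first-last] as a byte-oriented tool would see it in UTF-8."""
    return re.compile(
        b"[" + first.encode("utf-8") + b"-" + last.encode("utf-8") + b"]"
    )

def matched_bytes(pattern) -> set:
    """Return the set of single byte values that the class accepts."""
    return {b for b in range(256) if pattern.fullmatch(bytes([b]))}

YUD, LAMED, TAV = "\u05d9", "\u05dc", "\u05ea"
VAV, SHIN = "\u05d5", "\u05e9"

yud_tav = bytewise_class(YUD, TAV)      # bytes: [ D7 99 - D7 AA ]
yud_lamed = bytewise_class(YUD, LAMED)  # bytes: [ D7 99 - D7 9C ]

# Both classes accept exactly the bytes 0x99..0xD7.
same_set = matched_bytes(yud_tav) == matched_bytes(yud_lamed)

# Vav (D7 95) loses its lead byte; shin (D7 A9) loses both bytes.
vav_result = yud_tav.sub(b"X", VAV.encode("utf-8"))
shin_result = yud_tav.sub(b"X", SHIN.encode("utf-8"))

print(same_set, vav_result, shin_result)
```

Under that assumption the two ranges are indistinguishable, and vav
gets damaged even though it lies outside yud-to-tav as a character.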

Try to do "| sed s/....../foo/" and see what happens -- you will
get "fooM", where M is mem sofit.
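
The same effect can be modelled in Python as a rough sketch. The
four-letter input below (shin, lamed, vav, mem sofit) is my own
illustrative choice, since the original test string was snipped above.
When each dot matches one byte, six dots swallow three two-byte
letters and leave the fourth; when each dot matches one character,
six dots cannot match a four-letter word at all.

```python
import re

# Illustrative input only: shin, lamed, vav, mem sofit.
word = "\u05e9\u05dc\u05d5\u05dd"
raw = word.encode("utf-8")  # 4 characters, 8 bytes

# Byte-oriented matching: each "." consumes one byte.
bytewise = re.sub(rb"......", b"foo", raw, count=1)

# Character-oriented matching: each "." consumes one code point.
charwise = re.sub(r"......", "foo", word, count=1)

print(len(word), len(raw))
print(bytewise)
print(charwise == word)
```

Here `bytewise` comes out as "foo" followed by the two bytes of mem
sofit, while `charwise` is left unchanged.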

> But still one character was messed-up after the regex.

That too.

> And I had hell of a time editing this: I practically couldn't insert
> text, because bash calculated internally Hebrew chars as taking two
> places (assumed here char==byte).

I used mlterm to test it, and my zsh had problems as well.
(mlterm 2.7.0, zsh 4.0.6, FreeBSD 4.8-STABLE)

> But this is RedHat 7.3, and the version of bash doesn't support UTF-8
> well enough. In RH9 it seems much better. 

That's exactly what I'm talking about.  That thing supports this
encoding, this thing doesn't, and what you have *in the end* is a
system which, in some rare situations, can take Unicode text and
deal with it, but mostly it can't.  The assumption of single-byte
characters shines through, and if you're not careful it bites
you.

> > Good to know, thanks.  Will mutt re-code text from anything to
> > Unicode?
> 
> Yes. (Thus it is generally more "sensitive" than most GUI clients to bad
> encoding, as overriding bad encoding tends to be a less than trivial
> operation)

You lost me here.  What do you mean by overriding bad encoding,
and what do other apps do?

Vadik.

-- 
Prof:    So the American government went to IBM to come up with a data
         encryption standard and they came up with ...
Student: EBCDIC!
