Markus Kuhn writes:
> I'm afraid that cat will for the foreseeable future not be one of the
> tools suitable for reading Hebrew text unless it has been stored
> visually or we implement the ECMA/ISO implicit mode in xterm.
>
> Bidicat essentially exists already in some forms:
>
> http://czyborra.com/arabjoin/
>
> Arabjoin is Roman Czyborra's little Perl tool that takes Arabic UTF-8
> text (encoded in the U+06xx Arabic block in logical order) as input,
> performs Arabic glyph joining, and outputs a UTF-8 octet stream that
> is arranged in visual order. This gives readable results when formatted
> with a simple Unicode renderer like xterm or yudit that does not
> handle Arabic differently but simply outputs all glyphs in
> left-to-right order.
Don't go that way; you would be reinventing the entire mess with the
three ISO-8859-8 variants (implicit, explicit, visual encoding).
In Unicode and UTF-8, unlike ISO-8859-8, the ordering is always
logical ("implicit" is the old term), not visual. Tools like arabjoin
are hacks outside the standards. Their proper place is inside the
display engine (here: xterm); otherwise applications and xterm must
communicate using malformed, anti-Unicode text.
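To make the logical-vs-visual distinction concrete, here is a rough
sketch of the kind of reordering a tool like arabjoin does before
output. It is an illustration under stated assumptions, not arabjoin's
actual code (arabjoin is a Perl script, and it also performs glyph
joining, which is omitted here). The names is_rtl and
naive_visual_order are hypothetical. The sketch only reverses runs of
strong right-to-left characters (bidi classes R and AL) within one
left-to-right line, keeping spaces between RTL words inside the run;
it ignores most of the Unicode Bidirectional Algorithm (numbers,
punctuation, embedding levels).

```python
import unicodedata


def is_rtl(ch: str) -> bool:
    # Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
    return unicodedata.bidirectional(ch) in ("R", "AL")


def naive_visual_order(logical: str) -> str:
    """Hypothetical, simplified logical-to-visual reordering of one line.

    Assumes a left-to-right base direction. Each maximal run of RTL
    characters (including whitespace between RTL characters) is reversed;
    everything else keeps its position. No glyph joining is done.
    """
    out: list[str] = []
    run: list[str] = []      # current RTL run, still in logical order
    pending: list[str] = []  # whitespace seen after an RTL character
    for ch in logical:
        if is_rtl(ch):
            # Whitespace between two RTL characters belongs to the run.
            run.extend(pending)
            pending = []
            run.append(ch)
        elif run and unicodedata.bidirectional(ch) == "WS":
            # Not yet known whether more RTL text follows; hold it back.
            pending.append(ch)
        else:
            # Run ended: emit it reversed, then the held-back whitespace.
            out.extend(reversed(run))
            out.extend(pending)
            run, pending = [], []
            out.append(ch)
    out.extend(reversed(run))
    out.extend(pending)
    return "".join(out)
```

For example, the logical string "abc \u05d0\u05d1 \u05d2 def" comes
out as "abc \u05d2 \u05d1\u05d0 def". The result is exactly the
problem described above: the octets written out no longer follow
Unicode's logical order, so grep, sort, or any bidi-aware renderer
downstream sees the text backwards. A bidi-capable xterm would instead
apply such reordering only when drawing, leaving the stored and
transmitted text logical.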
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/