On Tue, 16 Jan 2001, Bruno Haible wrote:
> It makes a difference when a piece of text has right-to-left
> embeddings that span a newline. For example,
>
> english text bla bla ... RLE hebrew text first part <newline>
> hebrew text second part PDF back to english text <newline>
> mixed english and hebrew without explicit direction <newline>
>
> The Unicode Bidi algorithm will treat the [RLE ... PDF] part
> specially. If now, through "sed" or "grep", the second line is
> removed, and the remaining two lines are rendered, the last line will
> be rendered differently, because it is under the effect of the RLE
> marker.
I think that any environment with no notion of a paragraph treats each
line as a paragraph. So xterm, and any similar display tool, will
misrender even the original three lines. This agrees with the Unicode
bidirectional algorithm, which treats CR and LF as paragraph (block)
separators, and an explicit embedding never extends past the end of
its paragraph.
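As a rough illustration of the point about CR and LF, here is a small
Python sketch. It only uses the standard unicodedata module; the helper
name bidi_paragraphs and the sample string are made up for this example,
and it is not a bidi implementation, just a count of marks per paragraph.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def bidi_paragraphs(text):
    """Split text at every character whose bidi class is B.

    Hypothetical helper: the separators themselves are dropped, and
    "\r\n" yields an empty paragraph between the two characters.
    """
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        paragraphs.append("".join(current))
    return paragraphs

# CR, LF and PARAGRAPH SEPARATOR all carry bidi class B.
for ch in ("\r", "\n", "\u2029"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))

# Stand-in for the first two lines of the quoted example.
sample = f"english {RLE}hebrew part one\nhebrew part two{PDF} english\n"

# Opening marks minus closing marks, per paragraph.
print([p.count(RLE) - p.count(PDF) for p in bidi_paragraphs(sample)])
```

The first paragraph ends with an RLE still open and the second contains
a PDF with nothing to close, so neither line sees the embedding the
author of the text intended.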
You can override this behaviour with a higher-level protocol, as is done
in HTML. But the W3C also forbids the explicit marks in a recent
recommendation, and recommends markup tags as replacements.
I really don't know what we should do about this, because the explicit
bidi marks that create these problems are both bad and necessary. My
scenario is a bidi-unaware tool that uses the glibc locales to display a
localized date. That date needs to be enclosed in embedding marks to be
rendered correctly in both LTR and RTL enclosing directions. But if the
line breaks somewhere inside the date, the user will be very sad....
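To make the date scenario concrete, here is a minimal Python sketch. The
function names embed and lines_with_broken_embedding, and the sample date
string, are assumptions made up for illustration; nothing here is glibc
API. It assumes lines end in "\n" and only looks at LRE, RLE and PDF,
ignoring the override marks.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def embed(text, rtl=True):
    """Enclose text in an explicit embedding of the given direction."""
    return (RLE if rtl else LRE) + text + PDF

def lines_with_broken_embedding(text):
    """Return the 1-based numbers of lines whose marks do not pair up.

    Each line is checked on its own, as a line-as-paragraph display
    such as a terminal would see it.
    """
    broken = []
    for number, line in enumerate(text.split("\n"), start=1):
        depth = 0
        balanced = True
        for ch in line:
            if ch in (LRE, RLE):
                depth += 1
            elif ch == PDF:
                if depth == 0:
                    balanced = False  # PDF with nothing open on this line
                else:
                    depth -= 1
        if depth != 0 or not balanced:
            broken.append(number)
    return broken

# Illustrative Hebrew date ("16 January 2001"), written with escapes.
date = embed("16 \u05d9\u05e0\u05d5\u05d0\u05e8 2001")
intact = f"Sent on {date} by mail\n"

# Simulate a tool that wraps the line in the middle of the date.
wrapped = intact.replace(" 2001", "\n2001")

print(lines_with_broken_embedding(intact))
print(lines_with_broken_embedding(wrapped))
```

With the date on one line the marks pair up; once the line is wrapped
inside the date, both halves are reported, which is the case where the
user ends up sad.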
--roozbeh
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/