Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Eli Zaretskii via Unicode
> From: "Doug Ewell" > Cc: > Date: Sat, 21 Mar 2020 13:33:18 -0600 > > > Emacs uses some of that for supporting charsets that cannot be mapped > > into Unicode. GB18030 is one example of such charsets. The internal > > representation of characters in Emacs is UTF-8, so it uses 5-byte > >

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Eli Zaretskii via Unicode
> Date: Sat, 21 Mar 2020 11:13:40 -0600 > From: Doug Ewell via Unicode > > Adam Borowski wrote: > > > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF > > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has > > its uses but is not well-formed Unicode. >
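
The point about surrogates can be checked with a small sketch. It assumes Python's standard codecs, which are not mentioned in the thread; the `surrogatepass` error handler is Python's own escape hatch and is used here only to show that the UTF-8 bit pattern can carry U+D800 even though well-formed UTF-8 forbids it.

```python
# A lone surrogate is a code point but not a Unicode scalar value.
lone_surrogate = "\ud800"


def encode_strict(text):
    """Return the strict UTF-8 encoding, or None if the codec refuses."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# Strict UTF-8 rejects the surrogate ...
strict_result = encode_strict(lone_surrogate)

# ... while the relaxed handler emits the generalized three-byte form.
relaxed_result = lone_surrogate.encode("utf-8", "surrogatepass")

print(strict_result, relaxed_result)
```

The relaxed byte sequence is not well-formed UTF-8, which matches the quoted remark that such data "has its uses but is not well-formed Unicode".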

Re: Annoyances from Implementation of Canonical Equivalence

2019-10-18 Thread Eli Zaretskii via Unicode
> Date: Thu, 17 Oct 2019 21:58:50 +0100 > From: Richard Wordingham via Unicode > > > Sounds arbitrary to me. How do we know that all the users will want > > that? > > If the change from codepoint by codepoint matching is just canonical > equivalence, then there is no way that the ‘n’ of ‘na’

Re: Annoyances from Implementation of Canonical Equivalence

2019-10-17 Thread Eli Zaretskii via Unicode
> Date: Thu, 17 Oct 2019 02:26:35 +0100 > From: Richard Wordingham > Cc: Eli Zaretskii > > (c) A search for 'n' finding 'ñ'. > > When it comes to canonical equivalence, one answer to (c) is that as > soon as one adds the next letter, e.g. 'na', the search will no > longer match 'ñ'.
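
Case (c) can be reproduced with a deliberately naive sketch: normalize both strings to NFD and do a plain substring search. The helper name `canonical_find` is hypothetical, and this is not how Emacs or any regex engine implements canonical equivalence; it only shows why 'n' matches inside 'ñ' while 'na' does not match 'ña'.

```python
import unicodedata


def canonical_find(needle, haystack):
    """Naive search: compare NFD forms code point by code point.

    Returns the index in the NFD form of haystack, or -1.
    """
    nfd_needle = unicodedata.normalize("NFD", needle)
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    return nfd_haystack.find(nfd_needle)


# 'ñ' decomposes to 'n' + U+0303, so a bare 'n' is found at index 0.
hit_single = canonical_find("n", "\u00f1")

# In 'ña' the combining tilde sits between 'n' and 'a', so 'na' is not found.
hit_pair = canonical_find("na", "\u00f1a")

print(hit_single, hit_pair)
```

A matcher that treats a base letter plus its combining marks as one unit would reject the first match as well; the sketch does not attempt that.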

Re: Annoyances from Implementation of Canonical Equivalence (was: Pure Regular Expression Engines and Literal Clusters)

2019-10-16 Thread Eli Zaretskii via Unicode
> Date: Tue, 15 Oct 2019 20:52:15 +0100 > From: Richard Wordingham via Unicode > > > > > I'm well aware of the official position. However, when we > > > > attempted to implement it unconditionally in Emacs, some people > > > > objected, and brought up good reasons. You can, of course, elect >

Re: Annoyances from Implementation of Canonical Equivalence (was: Pure Regular Expression Engines and Literal Clusters)

2019-10-15 Thread Eli Zaretskii via Unicode
> Date: Tue, 15 Oct 2019 00:23:59 +0100 > From: Richard Wordingham via Unicode > > > I'm well aware of the official position. However, when we attempted > > to implement it unconditionally in Emacs, some people objected, and > > brought up good reasons. You can, of course, elect to disregard

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-14 Thread Eli Zaretskii via Unicode
> Date: Mon, 14 Oct 2019 19:29:39 +0100 > From: Richard Wordingham via Unicode > > On Mon, 14 Oct 2019 10:05:49 +0300 > Eli Zaretskii via Unicode wrote: > > > I think these are two separate issues: whether search should normalize > > (a.k.a. performs character fo

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-14 Thread Eli Zaretskii via Unicode
> Date: Mon, 14 Oct 2019 01:10:45 +0100 > From: Richard Wordingham via Unicode > > >> Besides invalidating complexity metrics, the issue was what \p{Lu} > >> should match. For example, with PCRE syntax, GNU grep Version 2.25 > >> \p{Lu} matches U+0100 but not . When I'm respecting > >>
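
The snippet above is cut off, so the following is only a general illustration of the issue it raises, using Python's `unicodedata` module (an assumption; the thread discusses grep and PCRE). A precomposed capital letter has General_Category Lu, while its canonical decomposition is a sequence whose second code point is a nonspacing mark, so a per-code-point `\p{Lu}` test sees the two forms differently.

```python
import unicodedata

precomposed = "\u0100"  # LATIN CAPITAL LETTER A WITH MACRON
decomposed = unicodedata.normalize("NFD", precomposed)

# General_Category of each code point in both forms.
precomposed_categories = [unicodedata.category(c) for c in precomposed]
decomposed_categories = [unicodedata.category(c) for c in decomposed]

print(precomposed_categories, decomposed_categories)
```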

Re: The native name of Tai Viet script and language(s)

2019-09-13 Thread Eli Zaretskii via Unicode
> Cc: unicode@unicode.org > From: r12a > Date: Thu, 12 Sep 2019 14:34:11 +0100 > > On 27/08/2019 07:33, Eli Zaretskii via Unicode wrote: > > Yes, it's an old and outdated text (Emacs is around since 1985, and > > supports multilingual text editing since 1997). Easy

Re: The native name of Tai Viet script and language(s)

2019-08-27 Thread Eli Zaretskii via Unicode
> Date: Tue, 27 Aug 2019 08:33:21 +0100 > From: Richard Wordingham via Unicode > > @Eli: Ideally, you need to check that default font and language are > consistent. There are some regional differences which make it > necessary to calibrate the writing system, and one word may not suffice.

Re: The native name of Tai Viet script and language(s)

2019-08-27 Thread Eli Zaretskii via Unicode
> From: Peter Constable > Date: Tue, 27 Aug 2019 04:56:35 + > > " As the proposal for TaiViet script to the Unicode is still on > the progress, we use the Private Use Area for TaiViet > characters (U+F000..U+F07E). " > > Er... The script has been in Unicode for about 10 years, since Unicode

The native name of Tai Viet script and language(s)

2019-08-22 Thread Eli Zaretskii via Unicode
Could someone "in the know" please help me make the Tai Viet script documentation in Emacs accurate? The current short description we have is in the file lisp/language/tai-viet.el in the Emacs source tree. You can see it here:

Re: Numeric group separators and Bidi

2019-07-09 Thread Eli Zaretskii via Unicode
> Date: Tue, 9 Jul 2019 20:59:15 +0200 > From: Philippe Verdy via Unicode > > I can't find a way to use narrow spaces instead of punctuation signs (dot or > comma) for example in > Arabic/Hebrew, for example to present tabular numeric data in a really > language-neutral way. In Arabic/Hebrew >

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Sat, 9 Feb 2019 20:36:50 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii wrote: > > > That's the application's problem, not the terminal's. An application > > that wants its column to line

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Sat, 9 Feb 2019 20:03:21 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > Let's suppose a utility outputs these two lines of text: > abcdefg| > complex| > > whereas "abcdefg" are these English letters themselves, but "complex" > is a

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Sat, 9 Feb 2019 19:25:08 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > > You need to use what HarfBuzz tells you _instead_ of wcswidth. It is > > in general wrong to use wcswidth or anything similar when you use a > > shaping engine

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> Date: Sat, 9 Feb 2019 18:42:52 +0100 > Cc: unicode Unicode Discussion > From: Egmont Koblinger via Unicode > > What if harfbuzz tells us that the overall width for rendering a > particular grapheme cluster is significantly different from its > designated area (the number of character cells

Re: Bidi paragraph direction in terminal emulators

2019-02-09 Thread Eli Zaretskii via Unicode
> From: Elias Mårtenson > Date: Sat, 9 Feb 2019 13:33:49 +0800 > Cc: Egmont Koblinger , unicode > > Moreover, emitting the control sequences that set the mode is in > itself a complication, because if the terminal doesn't support them, > the result could be corrupted display. You will need

Re: Bidi paragraph direction in terminal emulators

2019-02-08 Thread Eli Zaretskii via Unicode
> Date: Sat, 9 Feb 2019 00:18:14 + > From: Richard Wordingham via Unicode > > > For character composition, you must have a shaping engine to talk to, > > and the shaper should tell you the width of each grapheme cluster it > > returns. > > (a) What defines the grapheme clusters? The

Re: Bidi paragraph direction in terminal emulators

2019-02-08 Thread Eli Zaretskii via Unicode
> Date: Fri, 8 Feb 2019 21:55:58 + > From: Richard Wordingham via Unicode > > > > What's the sledgehammer for Windows? > > > Not sure what you meant. "M-x term" doesn't work on Windows. > > So my question is, 'What do I use on Windows?' The application may be > disproportionate to the

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 17:44:53 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > For certain apps, one of the modes is required (e.g. for cat it's the > implicit mode). For other tasks it's the other mode (e.g. for emacs > the explicit mode).

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 15:42:51 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > On Fri, Feb 8, 2019 at 3:28 PM Eli Zaretskii wrote: > > > You can have what you call the "explicit mode" if you set the variable > > bidi-display-reordering to

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 14:57:56 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > According to the description you give, Emacs's terminal always applies > the BiDi algorithm, therefore by its design only implements what I > call "implicit mode",

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 8 Feb 2019 13:30:42 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > Hi Eli, > > > Not sure why. There are terminal emulators out there which support > > proportional fonts. > > Well, of course, a terminal emulator can load any

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-08 Thread Eli Zaretskii via Unicode
> Date: Fri, 8 Feb 2019 06:40:44 + > From: Richard Wordingham via Unicode > > > I, for one, am not to the slightest bit interested in abandoning the > > character grid and allowing for proportional fonts. This would just > > break a gazillion of things. > > The message I take from that and

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Thu, 7 Feb 2019 22:35:23 + > From: Richard Wordingham via Unicode > > > > Do you mean you aim to maintain a regex that matches everyone's > > > prompt in the world, without a significant amount of false positive > > > matches on non-prompt lines? > > > Yes. > > Wow! You'll do

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 7 Feb 2019 19:01:33 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii wrote: > > > No, it needs no interaction. Unless the regexp doesn't work for you, > > which you should then report

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 7 Feb 2019 18:20:02 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > > It uses a regular expression, see term-prompt-regexp. > > So, it's not automatic, needs user interaction No, it needs no interaction. Unless the regexp doesn't

Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 7 Feb 2019 18:12:37 +0100 > Cc: Richard Wordingham , > unicode Unicode Discussion > > I believe it's not my mental model that's weird, but your use of > terminology that doesn't match UBA's that confused me. Well, let's just say that Emacs uses the

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Thu, 7 Feb 2019 00:45:55 +0100 > Cc: unicode Unicode Discussion > From: Egmont Koblinger via Unicode > > > Not necessarily. One could allow the first strong character in the > > prompt to determine the paragraph directions > > How does Emacs know what's a prompt? How can it tell it
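
The "first strong character" idea quoted above can be sketched in a few lines. This is a simplified reading of the paragraph-level rule in UAX #9: it ignores isolates and embedding controls, and the function name and the fallback default are assumptions made for the illustration, not anything Emacs or a terminal emulator exposes.

```python
import unicodedata


def guess_paragraph_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' from the first strong character, else default."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character at all (e.g. only digits and punctuation).
    return default


latin_prompt = guess_paragraph_direction("$ ls -l")
hebrew_prompt = guess_paragraph_direction("> \u05e9\u05dc\u05d5\u05dd")
neutral_only = guess_paragraph_direction("123 ... ", default="ltr")

print(latin_prompt, hebrew_prompt, neutral_only)
```

Punctuation and digits in the prompt are skipped because they are not strong characters, which is why a prompt such as "> " followed by Hebrew text resolves to right-to-left.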

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Eli Zaretskii via Unicode
> Date: Wed, 6 Feb 2019 23:32:43 + > From: Richard Wordingham via Unicode > > > You define paragraphs as emptyline-separated blocks on which you > > perform autodetection of the paragraph direction. This is great! As > > I've mentioned, I'd love to have such a mode in terminals, but it's > >

Re: Bidi paragraph direction in terminal emulators

2019-02-07 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Wed, 6 Feb 2019 22:01:59 +0100 > Cc: Richard Wordingham , unicode@unicode.org > > - Emacs running in a terminal shows an underscore wherever there's a > BiDi control in the source file – while the graphical one doesn't. > This looks like a simple bug to me,

Re: Bidi paragraph direction in terminal emulators

2019-02-05 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Tue, 5 Feb 2019 02:28:50 +0100 > Cc: unicode@unicode.org > > I have to admit, I'm not an Emacs user, I only have some vague ideas > how powerful a tool it is. But in its very core I still believe it's a > text editor – is it fair to say this? It could be used for

Re: Bidi paragraph direction in terminal emulators

2019-02-05 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Tue, 5 Feb 2019 01:32:34 +0100 > Cc: unicode@unicode.org > > On the other hand, it's not unreasonable for higher level stuff (e.g. > shell scripts, or tools like "zip") to use such control characters. Yes, but most of them won't ever do that. > > No, this

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-05 Thread Eli Zaretskii via Unicode
> Date: Tue, 5 Feb 2019 00:05:47 + > From: Richard Wordingham via Unicode > > > > Actually, UAX#9 defines "paragraph" as the chunk of text delimited > > > by paragraph separator characters. This means characters whose bidi > > > category is B, which includes Newline, the CR-LF pair on

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-05 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Tue, 5 Feb 2019 00:08:10 +0100 > Cc: unicode@unicode.org > > every single newline character starts a new paragraph. The result of > printf "Hello\nWorld\n" > world.txt > is a text file consisting of two paragraphs, with 5 characters in each. > Correct? Yes. >
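
The "two paragraphs of five characters" answer can be checked with a small sketch that splits text at characters whose Bidi_Class is B, treating CR LF as a single separator. The helper `split_paragraphs` is hypothetical, and it drops the separators for counting purposes, whereas UAX #9 keeps the separator with its paragraph.

```python
import unicodedata


def split_paragraphs(text):
    """Split text at Bidi_Class B characters; CR LF counts as one separator."""
    paragraphs, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
            # Swallow the LF of a CR LF pair so it does not open an empty paragraph.
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs


world = split_paragraphs("Hello\nWorld\n")
lengths = [len(p) for p in world]

print(world, lengths)
```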

Re: Proposal for BiDi in terminal emulators

2019-02-04 Thread Eli Zaretskii via Unicode
> Date: Mon, 4 Feb 2019 21:00:55 + > From: Richard Wordingham via Unicode > > > The definition is trivial: the order of characters on > > display, from left to right. The only possible reason to split hairs > > here could be when some characters don't appear on display, like > > control

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Eli Zaretskii via Unicode
> Date: Mon, 4 Feb 2019 19:45:13 + > From: Richard Wordingham via Unicode > > Yes. If one has a text composed of LTR and RTL paragraphs, one has to > choose how far apart their starting margins are. I think that could > get complicated for plain text if the terminal has unbounded width.

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Eli Zaretskii via Unicode
> Date: Mon, 4 Feb 2019 01:19:21 + > From: Richard Wordingham via Unicode > > On Sun, 03 Feb 2019 19:50:50 +0200 > Eli Zaretskii via Unicode wrote: > > > Do you see how this is carefully formatted to avoid overflowing an > > 80-column line of a

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-04 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Mon, 4 Feb 2019 00:36:23 +0100 > Cc: unicode@unicode.org > > The Unicode BiDi algorithm states that it operates on paragraphs of > text, and leaves it up to a higher protocol to define what a paragraph > exactly is. > > What's the definition of "paragraph" in

Re: Proposal for BiDi in terminal emulators

2019-02-04 Thread Eli Zaretskii via Unicode
> Date: Mon, 04 Feb 2019 05:25:43 +0200 > Cc: unicode@unicode.org > From: Eli Zaretskii via Unicode > > Try customizing scroll-conservatively, it sounds like you want that. Ignore me: I misunderstood what you were looking for. You are right: Emacs doesn't support such scrolling method.

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 3 Feb 2019 20:35:18 + > From: Richard Wordingham via Unicode > > > What is "screen overwriting" in this context? > > When instead of adding lines to the bottom, new lines are added on top > of and replace existing lines. I prefer the scrollable terminal > behaviour to the

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 3 Feb 2019 17:45:06 + > From: Richard Wordingham via Unicode > > > > So, what do you recommend I run grep from for Hebrew or Tai Lue? > > > > Inside Emacs, of course: "M-x grep RET" etc. > > That assumes you like using bindings for all the commands; I don't. What bindings?

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-03 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Sun, 3 Feb 2019 17:54:25 +0100 > Cc: unicode@unicode.org > > I'm arguing, although my reasons are not rock solid, that IMHO the > default should be the strict direction as set by SCP, without > autodetection. I think it's unreasonable and impractical to expect

Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 03 Feb 2019 18:10:15 +0200 > Cc: richard.wording...@ntlworld.com, unicode@unicode.org > From: Eli Zaretskii via Unicode > > I think there are hard problems even for such "simple" utilities, and > I will start a separate thread about this. I think we

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 3 Feb 2019 02:43:06 + > Cc: Kent Karlsson > From: Richard Wordingham via Unicode > > So, what do you recommend I run grep from for Hebrew or Tai Lue? Inside Emacs, of course: "M-x grep RET" etc.

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 3 Feb 2019 01:30:26 + > From: Richard Wordingham via Unicode > > Shaping for RTL scripts happens on strings stored in logical order. > These are then laid out right to left, though the dominant usage of > the term 'advance width' for right-to-left glyph sequences feels >

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sat, 2 Feb 2019 23:02:10 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > On top of this, I make the clarification that combining marks need to > be reordered to be sent out to the terminal emulator _after_ their > base letter That is true in general regarding

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sat, 2 Feb 2019 21:49:40 + > From: Richard Wordingham via Unicode > > Eli will probably tell me I'm behind the times, but there are a few > places where a Gnome-terminal is better than an Emacs GUI window. One > is colour highlighting of text found by grep. ??? The Emacs 'grep'

Re: Proposal for BiDi in terminal emulators

2019-02-03 Thread Eli Zaretskii via Unicode
> Date: Sun, 3 Feb 2019 03:02:13 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > > All I am saying is that your proposal should define what it means by > > visual order. > > Are you nitpicking on me not giving a precise definition on the > otherwise IMO freaking obvious

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 14:35:35 +0100 > Cc: Frédéric Grosshans , > unicode@unicode.org > > > You could do that, but it will require a lot of non-trivial processing > > from the applications. Text-mode applications don't want any complex > > tinkering, they want

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 14:16:03 +0100 > Cc: Adam Borowski , unicode@unicode.org > > There's absolutely no way we could reorder first, and then handle > TAB's cursor movement. TAB's cursor movement happens in the lower > layer, reordering happens in the upper one. But

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 13:54:02 +0100 > Cc: Adam Borowski , unicode@unicode.org > > For this behavior, the only feature you need from a terminal emulator > is to have a mode where it doesn't shuffle the characters. Currently > every emulator I'm aware of has such a

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Fri, 1 Feb 2019 13:40:48 +0100 > Cc: unicode@unicode.org > > I now understand that presentation forms isn't an ideal possible > approach, and the recommendation should be improved here. > > Until it happens, I'm uncertain whether using presentation form >

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> Date: Thu, 31 Jan 2019 23:17:19 + > From: Richard Wordingham via Unicode > > Emacs needs a lot of help - I can't write a generic Tai Tham > OpenType .flt file :-( Which is why Emacs is migrating towards HarfBuzz.

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> Date: Thu, 31 Jan 2019 10:58:54 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > Yes, I do argue that emacs will need to print a new escape sequence. > Which is much-much-much-much-much better than having to tell users to > go into the settings of their macOS Terminal /

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:41:02 +0100 > Cc: Frédéric Grosshans , > unicode@unicode.org > > > Personally, I think we should simply assume that complex script > > shaping is left to the terminal, and if the terminal cannot do that, > > then that's a restriction of

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:28:27 +0100 > Cc: Adam Borowski , unicode@unicode.org > > On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii wrote: > > > I think the application could use TAB characters to get to the next > > cell, then simplistic reordering would also work. >

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:21:52 +0100 > Cc: Adam Borowski , unicode@unicode.org > > > Does anyone know of a terminal emulator which supports isolates? > > GNOME Terminal's (VTE's) current work-in-progress implementation does > remember BiDi control characters just

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Thu, 31 Jan 2019 10:11:22 +0100 > Cc: unicode@unicode.org > > > It doesn't do _any_ shaping. Complex script shaping is left to the > > terminal, because it's impossible to do shaping in any reasonable way > > [...] > > Partially, you are right. On the other

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:49:34 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > I outline in the document problems that arise from the terminal > emulator performing shaping on its contents in "explicit" mode, which > is to be used by Emacs and others. The terminal

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:25:32 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > > ╒═══╤══╕ > > │ filename1 │ 123 │ > > │ FILENAME2 │ 17 │ > > └───┴──┘ > > > > I'm afraid there's no good way to do BiDi without support from individual > >

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> Date: Wed, 30 Jan 2019 15:07:22 +0100 > Cc: unicode@unicode.org > From: Egmont Koblinger via Unicode > > Another possible approach is to leave the terminal doing BiDi, but > embed all the text fragments in FSI...PDI blocks. Does anyone know of a terminal emulator which supports isolates?

Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Eli Zaretskii via Unicode
> From: Egmont Koblinger > Date: Wed, 30 Jan 2019 14:36:42 +0100 > Cc: unicode@unicode.org > > - GNU Emacs reshuffles the characters according to the BiDi algorithm, > expecting that the terminal emulator doesn't do any BiDi. Yes, users are told to disable bidi reordering of the terminal, if

Re: Proposal for BiDi in terminal emulators

2019-01-29 Thread Eli Zaretskii via Unicode
> Date: Tue, 29 Jan 2019 13:50:31 +0100 > From: Egmont Koblinger via Unicode > > [1] https://terminal-wg.pages.freedesktop.org/bidi/ Interesting document, thanks for writing it. My personal experience with bringing BiDi to Emacs led me to a firm conclusion that BiDi support by terminal

Re: Proposal for BiDi in terminal emulators

2019-01-29 Thread Eli Zaretskii via Unicode
> Date: Tue, 29 Jan 2019 13:50:31 +0100 > From: Egmont Koblinger via Unicode > > In turn, vim, emacs and friends stand there clueless, not knowing > how to do BiDi in terminals. This is inaccurate: Emacs (at least the brand known as "GNU Emacs") supports bidirectional editing in text terminals

Re: Unicode String Models

2018-09-11 Thread Eli Zaretskii via Unicode
> Date: Wed, 12 Sep 2018 00:13:52 +0200 > Cc: unicode@unicode.org > From: Hans Åberg via Unicode > > It might be useful to represent non-UTF-8 bytes as Unicode code points. One > way might be to use a codepoint to indicate high bit set followed by the byte > value with its high bit set to 0,

Re: Unicode String Models

2018-09-11 Thread Eli Zaretskii via Unicode
> From: Hans Åberg > Date: Tue, 11 Sep 2018 20:14:30 +0200 > Cc: hsivo...@hsivonen.fi, > unicode@unicode.org > > If one encounters a file with mixed encodings, it is good to be able to view > its contents and then convert it, as I see one can do in Emacs. Yes. And mixed encodings is not the

Re: Unicode String Models

2018-09-11 Thread Eli Zaretskii via Unicode
> From: Hans Åberg > Date: Tue, 11 Sep 2018 19:13:28 +0200 > Cc: Henri Sivonen , > unicode@unicode.org > > > In Emacs, each raw byte belonging > > to a byte sequence which is invalid under UTF-8 is represented as a > > special multibyte sequence. IOW, Emacs's internal representation > >
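
The idea of keeping invalid bytes as distinguishable "raw byte" characters has a rough analogue in Python's `surrogateescape` error handler. The sketch below uses that handler as a stand-in; it is not Emacs's internal representation, only a way to see the same round-trip property with standard-library calls.

```python
# 0xE9 is 'é' in Latin-1 but an incomplete sequence in UTF-8 at this position.
raw = b"caf\xe9 ok"

# Each undecodable byte becomes one lone surrogate in U+DC80..U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", "surrogateescape")

print(repr(text), round_tripped == raw)
```

The valid parts of the input stay ordinary text, so the content can be viewed and edited, and the stray byte survives a save unchanged.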

Re: Unicode String Models

2018-09-11 Thread Eli Zaretskii via Unicode
> Date: Tue, 11 Sep 2018 13:12:40 +0300 > From: Henri Sivonen via Unicode > > * I suggest splitting the "UTF-8 model" into three substantially > different models: > > 1) The UTF-8 Garbage In, Garbage Out model (the model of Go): No > UTF-8-related operations are performed when ingesting

Re: Unicode String Models

2018-09-09 Thread Eli Zaretskii via Unicode
> From: Philippe Verdy > Date: Sun, 9 Sep 2018 19:35:47 +0200 > Cc: Richard Wordingham , > unicode Unicode Discussion > > In Emacs, buffer text is a character string with a gap, actually. > > A text buffer with gaps is a complex structure, not just a plain string. The difference is
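
For readers unfamiliar with the term, a "string with a gap" can be sketched as follows. This is a textbook gap buffer with hypothetical names, written for illustration under the assumption that positions passed in are within the text; it does not reproduce Emacs's buffer code.

```python
class GapBuffer:
    """Text stored in one array with a movable block of free slots (the gap)."""

    def __init__(self, text="", gap_size=16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)
        self._gap_end = len(self._buf)

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, extra):
        # Rebuild the array with a gap of `extra` free slots at the same place.
        tail = self._buf[self._gap_end:]
        self._buf = self._buf[:self._gap_start] + [None] * extra + tail
        self._gap_end = self._gap_start + extra

    def insert(self, pos, text):
        self._move_gap(pos)
        if self._gap_end - self._gap_start < len(text):
            self._grow(len(text) + 16)
        for ch in text:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count):
        # Deleting forward just widens the gap over the removed characters.
        self._move_gap(pos)
        self._gap_end = min(self._gap_end + count, len(self._buf))

    def text(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])


gb = GapBuffer("Hello world", gap_size=2)
gb.insert(5, ",")
gb.insert(12, "!!!")
after_inserts = gb.text()
gb.delete(0, 7)
after_delete = gb.text()
print(after_inserts, after_delete)
```

Edits near the gap are cheap because only the gap boundaries move; the cost is paid when the gap has to travel to a distant position.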

Re: Unicode String Models

2018-09-09 Thread Eli Zaretskii via Unicode
> Date: Sun, 9 Sep 2018 16:10:26 +0200 > Cc: unicode Unicode Discussion > From: Philippe Verdy via Unicode > > In practice, we use a memory by preparing the "small memory" while > instantiating a new iterator that will > process the whole string (which may not be fully loaded in memory, in

Re: EOL conventions (was: Re: UCD in XML or in CSV? (is: UCD in YAML))

2018-09-08 Thread Eli Zaretskii via Unicode
> Date: Sat, 8 Sep 2018 02:29:12 +0200 (CEST) > From: Marcel Schneider > Cc: RebeccaBettencourt , verd...@wanadoo.fr, > d3c...@gmail.com, d...@ewellic.org, unicode@unicode.org > > > > And it only took them 33 years. :) > > > > That's OK, because Unix tools cannot handle Windows

Re: UCD in XML or in CSV? (is: UCD in YAML)

2018-09-07 Thread Eli Zaretskii via Unicode
> Date: Fri, 7 Sep 2018 12:47:44 -0700 > Cc: d3c...@gmail.com, Doug Ewell , > unicode > From: Rebecca Bettencourt via Unicode > > On Fri, Sep 7, 2018 at 11:20 AM Philippe Verdy via Unicode > wrote: > > That version has been announced in the Windows 10 Hub several weeks ago. > > And

Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> From: Cosmin Apreutesei > Date: Tue, 28 Aug 2018 21:28:58 +0300 > Cc: unicode@unicode.org > > > That is not so if the line ends after the whitespace: in that case the > > whitespace is trailing, and will appear at the visual end of the > > line. > > So only if it's a soft break I should

Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> Date: Tue, 28 Aug 2018 13:44:58 +0300 > From: Cosmin Apreutesei via Unicode > > There is this sentence in UAX#9 which provides a clue: "[...] trailing > whitespace will appear at the visual end of the line (in the paragraph > direction).". I'm not sure what that means, but by doing some tests

Re: Emacs Verbose Character Entry (was Private Use Areas)

2018-08-24 Thread Eli Zaretskii via Unicode
> Date: Thu, 23 Aug 2018 22:15:10 +0100 > From: Richard Wordingham via Unicode > > On Thu, 23 Aug 2018 21:47:03 +0200 > "Janusz S. Bień via Unicode" wrote: > > > My needs are very simple, for example C-x 8 Return LATIN CAPITAL > > LETTER A WITH MACRON AND BREVE [MUFI] should yield the

Re: Private Use areas

2018-08-24 Thread Eli Zaretskii via Unicode
> From: jsb...@mimuw.edu.pl (Janusz S. Bień) > Cc: unicode@unicode.org, richard.wording...@ntlworld.com > Date: Thu, 23 Aug 2018 21:47:03 +0200 > > I'm very glad you join the discussion. I'm sorry for not joining sooner. In my defense, I missed the reference to Emacs, and the rest of the

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-19 Thread Eli Zaretskii via Unicode
> Date: Thu, 19 Jul 2018 10:38:18 +0300 > Cc: Asmus Freytag > From: Shai Berger via Unicode > > And again -- the point is interoperability. If I cannot trust that > people I communicate with make the same choices I make, plain text > cannot be used. This conclusion is too extreme. In Real

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-14 Thread Eli Zaretskii via Unicode
> Date: Sat, 14 Jul 2018 13:09:11 +0300 > From: Shai Berger > Cc: Eli Zaretskii > > I have no argument with this, but I do think that in such cases it is > wrong for the app to pretend that it is still treating the text as > plain. What is "plain text" in this context? Does, for example, text

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-13 Thread Eli Zaretskii via Unicode
> Date: Fri, 13 Jul 2018 08:57:25 +0100 > From: Richard Wordingham via Unicode > > Even just for horizontal text, one problem is the shape of the canvas. > If it has a left and a right-hand margin, then having an undetermined > direction by default can work, given enough memory. The rendering >

Re: UAX #9: applicability of higher-level protocols to bidi plaintext

2018-07-10 Thread Eli Zaretskii via Unicode
> Date: Tue, 10 Jul 2018 13:37:56 +0200 > Cc: unicode Unicode Discussion > From: Philippe Verdy via Unicode > > Your "standard compliant" plain text editor just forces a LTR default for the > whole document, and does not > tolerate that individual paragraphs may start with an undetermined

Re: metric for block coverage

2018-02-18 Thread Eli Zaretskii via Unicode
> Cc: unicode-requ...@unicode.org > Date: Sun, 18 Feb 2018 14:35:00 +0100 > From: "Janusz S. Bień via Unicode" > > As a Debian user using some rare characters for old Polish > transliteration I would be happy with a tool which scans > available/installed fonts for a specific

Re: Invisible characters must be specified to be visible in security-sensitive situations

2018-02-15 Thread Eli Zaretskii via Unicode
> Date: Thu, 15 Feb 2018 17:33:12 -0500 > From: Oren Watson via Unicode > > https://securelist.com/zero-day-vulnerability-in-telegram/83800/ > > You could disallow these characters in filenames, but when filename handling > is charset-agnostic due to the > extended-ascii

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-22 Thread Eli Zaretskii via Unicode
> Date: Fri, 22 Dec 2017 15:36:35 + > From: Richard Wordingham via Unicode > > Emacs is civilised in that it allows one to delete character by > character from either end. That may, however, require some > intelligence on the part of the user so that they don't get

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Eli Zaretskii via Unicode
> Date: Thu, 21 Dec 2017 22:04:37 -0800 > Cc: Unicode Public > From: Manish Goregaokar via Unicode > > However, Firefox deletes by code point. As does Emacs, btw.

Re: Normalise Tai Tham or not?

2017-10-12 Thread Eli Zaretskii via Unicode
> Date: Wed, 11 Oct 2017 22:01:32 +0100 > From: Richard Wordingham via Unicode > > The description I had found undersold the noble intention. If you mean that the documentation doesn't describe the feature well enough, I'd welcome a documentation bug report. >

Re: Normalise Tai Tham or not?

2017-10-11 Thread Eli Zaretskii via Unicode
> Date: Tue, 10 Oct 2017 21:51:55 +0100 > From: Richard Wordingham via Unicode > > > Emacs lately introduced character-folding in searches, but it's turned > > off by default, as many users objected. > > I don't see how that helps with this problem. If I search for the >

Re: Unicode education in Schools

2017-08-26 Thread Eli Zaretskii via Unicode
> Date: Sat, 26 Aug 2017 22:07:57 +0100 > From: Richard Wordingham via Unicode > > > We are miscommunicating. My point was that programming for MS-Windows > > needs a good understanding of what the UTF-16 surrogates are, and in > > what MS-Windows APIs/library functions

Re: Unicode education in Schools

2017-08-26 Thread Eli Zaretskii via Unicode
> Date: Sat, 26 Aug 2017 18:52:03 +0100 > From: Richard Wordingham via Unicode > > > > It shouldn't. UTF-16 works just like UTF-8, except that the code > > > units are bigger. > > > Not really, since UTF-8 doesn't have surrogates. > > It has 115 surrogates, thoroughly

Re: Unicode education in Schools

2017-08-26 Thread Eli Zaretskii via Unicode
> Date: Sat, 26 Aug 2017 16:09:33 +0100 > From: Richard Wordingham via Unicode > > > > Just steer them away from UTF-16! > > > > Which will leave them entirely unprepared for the MS-Windows Unicode > > programming, something they of course will never need in their > >

Re: Unicode education in Schools

2017-08-25 Thread Eli Zaretskii via Unicode
> Date: Fri, 25 Aug 2017 00:23:40 +0100 > From: Richard Wordingham via Unicode > > On Thu, 24 Aug 2017 17:17:10 + > Andre Schappo via Unicode wrote: > > > So, I consider it important to familiarise students with SMP > > characters as well as BMP

Re: Problems with BidiCharTest.txt

2017-07-16 Thread Eli Zaretskii via Unicode
> Date: Sun, 16 Jul 2017 07:13:02 +0300 > From: Dov Grobgeld via Unicode > > While implementing UAX#9 for Unicode 6.3 (and beyond) in FriBidi, I'm trying > to pass all the tests of > BidiCharacterTest.txt , and I'm having problem understanding a few of the > tests that to

Re: Emacs' implementation of the bidirectional algorithm

2017-07-01 Thread Eli Zaretskii via Unicode
> Date: Sat, 1 Jul 2017 16:36:52 +0300 > From: Itai Berli via Unicode > > Emacs claims to fully conform to the Unicode Bidirectional Algorithm > 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26 > 'Bidirectional Display' of the Emacs manual) This is somewhat

Re: Emacs' implementation of the bidirectional algorithm

2017-07-01 Thread Eli Zaretskii via Unicode
> Date: Sat, 1 Jul 2017 16:36:52 +0300 > From: Itai Berli via Unicode > > Emacs claims to fully conform to the Unicode Bidirectional Algorithm > 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26 > 'Bidirectional Display' of the Emacs manual), yet I have noticed

Re: Counting Devanagari Aksharas

2017-04-26 Thread Eli Zaretskii via Unicode
> Date: Wed, 26 Apr 2017 07:45:07 +0100 > From: Richard Wordingham via Unicode <unicode@unicode.org> > > On Wed, 26 Apr 2017 08:48:13 +0300 > Eli Zaretskii via Unicode <unicode@unicode.org> wrote: > > > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > >

Re: Counting Devanagari Aksharas

2017-04-25 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 22:59:49 +0100 > From: Richard Wordingham > Cc: Eli Zaretskii > > If I search for CGJ, highlighting it is frequently supremely useless. > I want to know where it is; highlighting is merely a tool to find it on > the screen.

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 00:51:59 +0100 > Cc: Julian Bradfield <jcb+unic...@inf.ed.ac.uk> > From: Richard Wordingham via Unicode <unicode@unicode.org> > > On Sat, 22 Apr 2017 21:39:42 +0100 (BST) > Julian Bradfield via Unicode <unicode@unicode.org> wrote: >

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 17:13:36 +0100 > From: Richard Wordingham via Unicode > > > Movement by grapheme > > cluster is AFAIK the most natural way of moving in complex scripts. > > Evidence? Personal experience? > It's easiest for displaying the cursor. It's the _only_

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 11:13:16 +0100 > From: Richard Wordingham via Unicode > > At present these are split into two and three grapheme clusters > respectively, and LibreOffice cursor movement responds accordingly. > (SIGN AA starts a grapheme cluster in several scripts of
