> From: "Doug Ewell"
> Cc:
> Date: Sat, 21 Mar 2020 13:33:18 -0600
>
> > Emacs uses some of that for supporting charsets that cannot be mapped
> > into Unicode. GB18030 is one example of such charsets. The internal
> > representation of characters in Emacs is UTF-8, so it uses 5-byte
> >
> Date: Sat, 21 Mar 2020 11:13:40 -0600
> From: Doug Ewell via Unicode
>
> Adam Borowski wrote:
>
> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
> > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has
> > its uses but is not well-formed Unicode.
>
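To make the point about UTF-8 carrying more than Unicode concrete, here is a minimal sketch of the original, pre-restriction UTF-8 bit layout (sequences of up to 6 bytes, 31-bit values). The function name `encode_extended_utf8` is hypothetical and not taken from the thread; the output for surrogates or values above U+10FFFF is deliberately not well-formed Unicode.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode cp with the original 1-to-6-byte UTF-8 layout (illustrative only).

    Unlike a conforming encoder, this accepts surrogates (U+D800..U+DFFF)
    and values above U+10FFFF, so its output may be ill-formed UTF-8.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range of the original scheme")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte
    # (exclusive upper limit, sequence length, lead-byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    out = bytearray()
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))


# A strict decoder such as Python's rejects the out-of-range sequences:
# encode_extended_utf8(0x110000).decode("utf-8") raises UnicodeDecodeError.
```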
> Date: Thu, 17 Oct 2019 21:58:50 +0100
> From: Richard Wordingham via Unicode
>
> > Sounds arbitrary to me. How do we know that all the users will want
> > that?
>
> If the change from codepoint by codepoint matching is just canonical
> equivalence, then there is no way that the ‘n’ of ‘na’
> Date: Thu, 17 Oct 2019 02:26:35 +0100
> From: Richard Wordingham
> Cc: Eli Zaretskii
>
> (c) A search for 'n' finding 'ñ'.
>
> When it comes to canonical equivalence, one answer to (c) is that as
> soon as one adds the next letter, e.g. 'na', the search will no
> longer match 'ñ'.
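The effect described in (c) can be reproduced with a deliberately naive sketch: normalize both strings to NFD and do a plain substring search. This is an assumption about how a simple canonical-equivalence search might be built, not a description of what Emacs or any other tool does; `canonical_find` is a hypothetical helper.

```python
import unicodedata


def canonical_find(haystack: str, needle: str) -> int:
    """Return the index of NFD(needle) within NFD(haystack), or -1.

    Naive on purpose: it matches code point by code point after
    decomposition, so it can match a base letter inside a decomposed
    character.
    """
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    nfd_needle = unicodedata.normalize("NFD", needle)
    return nfd_haystack.find(nfd_needle)


word = "ca\u00f1a"  # "caña"; its NFD form is c, a, n, U+0303, a

# 'n' alone hits the base letter of the decomposed 'ñ'.
hit_n = canonical_find(word, "n")

# 'na' does not match, because U+0303 sits between 'n' and 'a'.
hit_na = canonical_find(word, "na")
```

A stricter matcher would also refuse the first hit by requiring that a match not end in the middle of a combining sequence.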
> Date: Tue, 15 Oct 2019 20:52:15 +0100
> From: Richard Wordingham via Unicode
>
> > > > I'm well aware of the official position. However, when we
> > > > attempted to implement it unconditionally in Emacs, some people
> > > > objected, and brought up good reasons. You can, of course, elect
>
> Date: Tue, 15 Oct 2019 00:23:59 +0100
> From: Richard Wordingham via Unicode
>
> > I'm well aware of the official position. However, when we attempted
> > to implement it unconditionally in Emacs, some people objected, and
> > brought up good reasons. You can, of course, elect to disregard
> Date: Mon, 14 Oct 2019 19:29:39 +0100
> From: Richard Wordingham via Unicode
>
> On Mon, 14 Oct 2019 10:05:49 +0300
> Eli Zaretskii via Unicode wrote:
>
> > I think these are two separate issues: whether search should normalize
> > (a.k.a. performs character fo
> Date: Mon, 14 Oct 2019 01:10:45 +0100
> From: Richard Wordingham via Unicode
>
> >> Besides invalidating complexity metrics, the issue was what \p{Lu}
> >> should match. For example, with PCRE syntax, GNU grep Version 2.25
> >> \p{Lu} matches U+0100 but not . When I'm respecting
> >>
> Cc: unicode@unicode.org
> From: r12a
> Date: Thu, 12 Sep 2019 14:34:11 +0100
>
> On 27/08/2019 07:33, Eli Zaretskii via Unicode wrote:
> > Yes, it's an old and outdated text (Emacs is around since 1985, and
> > supports multilingual text editing since 1997). Easy
> Date: Tue, 27 Aug 2019 08:33:21 +0100
> From: Richard Wordingham via Unicode
>
> @Eli: Ideally, you need to check that default font and language are
> consistent. There are some regional differences which make it
> necessary to calibrate the writing system, and one word may not suffice.
> From: Peter Constable
> Date: Tue, 27 Aug 2019 04:56:35 +
>
> " As the proposal for TaiViet script to the Unicode is still on
> the progress, we use the Private Use Area for TaiViet
> characters (U+F000..U+F07E). "
>
> Er... The script has been in Unicode for about 10 years, since Unicode
Could someone "in the know" please help me make the Tai Viet script
documentation in Emacs accurate?
The current short description we have is in the file
lisp/language/tai-viet.el in the Emacs source tree. You can see it
here:
> Date: Tue, 9 Jul 2019 20:59:15 +0200
> From: Philippe Verdy via Unicode
>
> I can't find a way to use narrow spaces instead of punctuation signs (dot or
> comma) for example in Arabic/Hebrew, for example to present tabular numeric
> data in a really language-neutral way. In Arabic/Hebrew
>
> From: Egmont Koblinger
> Date: Sat, 9 Feb 2019 20:36:50 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> On Sat, Feb 9, 2019 at 8:13 PM Eli Zaretskii wrote:
>
> > That's the application's problem, not the terminal's. An application
> > that wants its column to line
> From: Egmont Koblinger
> Date: Sat, 9 Feb 2019 20:03:21 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> Let's suppose a utility outputs these two lines of text:
> abcdefg|
> complex|
>
> whereas "abcdefg" are these English letters themselves, but "complex"
> is a
> From: Egmont Koblinger
> Date: Sat, 9 Feb 2019 19:25:08 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> > You need to use what HarfBuzz tells you _instead_ of wcswidth. It is
> > in general wrong to use wcswidth or anything similar when you use a
> > shaping engine
> Date: Sat, 9 Feb 2019 18:42:52 +0100
> Cc: unicode Unicode Discussion
> From: Egmont Koblinger via Unicode
>
> What if harfbuzz tells us that the overall width for rendering a
> particular grapheme cluster is significantly different from its
> designated area (the number of character cells
> From: Elias Mårtenson
> Date: Sat, 9 Feb 2019 13:33:49 +0800
> Cc: Egmont Koblinger , unicode
>
> Moreover, emitting the control sequences that set the mode is in
> itself a complication, because if the terminal doesn't support them,
> the result could be corrupted display. You will need
> Date: Sat, 9 Feb 2019 00:18:14 +
> From: Richard Wordingham via Unicode
>
> > For character composition, you must have a shaping engine to talk to,
> > and the shaper should tell you the width of each grapheme cluster it
> > returns.
>
> (a) What defines the grapheme clusters? The
> Date: Fri, 8 Feb 2019 21:55:58 +
> From: Richard Wordingham via Unicode
>
> > > What's the sledgehammer for Windows?
>
> > Not sure what you meant. "M-x term" doesn't work on Windows.
>
> So my question is, 'What do I use on Windows?' The application may be
> disproportionate to the
> From: Egmont Koblinger
> Date: Fri, 8 Feb 2019 17:44:53 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> For certain apps, one of the modes is required (e.g. for cat it's the
> implicit mode). For other tasks it's the other mode (e.g. for emacs
> the explicit mode).
> From: Egmont Koblinger
> Date: Fri, 8 Feb 2019 15:42:51 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> On Fri, Feb 8, 2019 at 3:28 PM Eli Zaretskii wrote:
>
> > You can have what you call the "explicit mode" if you set the variable
> > bidi-display-reordering to
> From: Egmont Koblinger
> Date: Fri, 8 Feb 2019 14:57:56 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> According to the description you give, Emacs's terminal always applies
> the BiDi algorithm, therefore by its design only implements what I
> call "implicit mode",
> From: Egmont Koblinger
> Date: Fri, 8 Feb 2019 13:30:42 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> Hi Eli,
>
> > Not sure why. There are terminal emulators out there which support
> > proportional fonts.
>
> Well, of course, a terminal emulator can load any
> Date: Fri, 8 Feb 2019 06:40:44 +
> From: Richard Wordingham via Unicode
>
> > I, for one, am not to the slightest bit interested in abandoning the
> > character grid and allowing for proportional fonts. This would just
> > break a gazillion of things.
>
> The message I take from that and
> Date: Thu, 7 Feb 2019 22:35:23 +
> From: Richard Wordingham via Unicode
>
> > > Do you mean you aim to maintain a regex that matches everyone's
> > > prompt in the world, without a significant amount of false positive
> > > matches on non-prompt lines?
>
> > Yes.
>
> Wow! You'll do
> From: Egmont Koblinger
> Date: Thu, 7 Feb 2019 19:01:33 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> On Thu, Feb 7, 2019 at 6:53 PM Eli Zaretskii wrote:
>
> > No, it needs no interaction. Unless the regexp doesn't work for you,
> > which you should then report
> From: Egmont Koblinger
> Date: Thu, 7 Feb 2019 18:20:02 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> > It uses a regular expression, see term-prompt-regexp.
>
> So, it's not automatic, needs user interaction
No, it needs no interaction. Unless the regexp doesn't
> From: Egmont Koblinger
> Date: Thu, 7 Feb 2019 18:12:37 +0100
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> I believe it's not my mental model that's weird, but your use of
> terminology that doesn't match UBA's that confused me.
Well, let's just say that Emacs uses the
> Date: Thu, 7 Feb 2019 00:45:55 +0100
> Cc: unicode Unicode Discussion
> From: Egmont Koblinger via Unicode
>
> > Not necessarily. One could allow the first strong character in the
> > prompt to determine the paragraph directions
>
> How does Emacs know what's a prompt? How can it tell it
> Date: Wed, 6 Feb 2019 23:32:43 +
> From: Richard Wordingham via Unicode
>
> > You define paragraphs as emptyline-separated blocks on which you
> > perform autodetection of the paragraph direction. This is great! As
> > I've mentioned, I'd love to have such a mode in terminals, but it's
> >
> From: Egmont Koblinger
> Date: Wed, 6 Feb 2019 22:01:59 +0100
> Cc: Richard Wordingham , unicode@unicode.org
>
> - Emacs running in a terminal shows an underscore wherever there's a
> BiDi control in the source file – while the graphical one doesn't.
> This looks like a simple bug to me,
> From: Egmont Koblinger
> Date: Tue, 5 Feb 2019 02:28:50 +0100
> Cc: unicode@unicode.org
>
> I have to admit, I'm not an Emacs user, I only have some vague ideas
> how powerful a tool it is. But in its very core I still believe it's a
> text editor – is it fair to say this? It could be used for
> From: Egmont Koblinger
> Date: Tue, 5 Feb 2019 01:32:34 +0100
> Cc: unicode@unicode.org
>
> On the other hand, it's not unreasonable for higher level stuff (e.g.
> shell scripts, or tools like "zip") to use such control characters.
Yes, but most of them won't ever do that.
> > No, this
> Date: Tue, 5 Feb 2019 00:05:47 +
> From: Richard Wordingham via Unicode
>
> > > Actually, UAX#9 defines "paragraph" as the chunk of text delimited
> > > by paragraph separator characters. This means characters whose bidi
> > > category is B, which includes Newline, the CR-LF pair on
> From: Egmont Koblinger
> Date: Tue, 5 Feb 2019 00:08:10 +0100
> Cc: unicode@unicode.org
>
> every single newline character starts a new paragraph. The result of
> printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in each.
> Correct?
Yes.
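A small sketch of that reading of UAX#9, assuming paragraphs are simply the runs of text between characters whose Bidi_Class is B. The helper name `split_bidi_paragraphs` is hypothetical, and the sketch does not treat a CR-LF pair as a single separator.

```python
import unicodedata


def split_bidi_paragraphs(text: str) -> list:
    """Split text at characters whose bidi class is B (paragraph separator).

    Simplification: every class-B character ends a paragraph on its own,
    so a CR-LF pair would yield an extra empty paragraph.
    """
    paragraphs = []
    current = []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:  # text that does not end with a separator
        paragraphs.append("".join(current))
    return paragraphs


# Same content as the file written by: printf "Hello\nWorld\n" > world.txt
paragraphs = split_bidi_paragraphs("Hello\nWorld\n")
lengths = [len(p) for p in paragraphs]  # [5, 5]
```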
>
> Date: Mon, 4 Feb 2019 21:00:55 +
> From: Richard Wordingham via Unicode
>
> > The definition is trivial: the order of characters on
> > display, from left to right. The only possible reason to split hairs
> > here could be when some characters don't appear on display, like
> > control
> Date: Mon, 4 Feb 2019 19:45:13 +
> From: Richard Wordingham via Unicode
>
> Yes. If one has a text composed of LTR and RTL paragraphs, one has to
> choose how far apart their starting margins are. I think that could
> get complicated for plain text if the terminal has unbounded width.
> Date: Mon, 4 Feb 2019 01:19:21 +
> From: Richard Wordingham via Unicode
>
> On Sun, 03 Feb 2019 19:50:50 +0200
> Eli Zaretskii via Unicode wrote:
>
> > Do you see how this is carefully formatted to avoid overflowing an
> > 80-column line of a
> From: Egmont Koblinger
> Date: Mon, 4 Feb 2019 00:36:23 +0100
> Cc: unicode@unicode.org
>
> The Unicode BiDi algorithm states that it operates on paragraphs of
> text, and leaves it up to a higher protocol to define what a paragraph
> exactly is.
>
> What's the definition of "paragraph" in
> Date: Mon, 04 Feb 2019 05:25:43 +0200
> Cc: unicode@unicode.org
> From: Eli Zaretskii via Unicode
>
> Try customizing scroll-conservatively, it sounds like you want that.
Ignore me: I misunderstood what you were looking for. You are right:
Emacs doesn't support such scrolling method.
> Date: Sun, 3 Feb 2019 20:35:18 +
> From: Richard Wordingham via Unicode
>
> > What is "screen overwriting" in this context?
>
> When instead of adding lines to the bottom, new lines are added on top
> of and replace existing lines. I prefer the scrollable terminal
> behaviour to the
> Date: Sun, 3 Feb 2019 17:45:06 +
> From: Richard Wordingham via Unicode
>
> > > So, what do you recommend I run grep from for Hebrew or Tai Lue?
> >
> > Inside Emacs, of course: "M-x grep RET" etc.
>
> That assumes you like using bindings for all the commands; I don't.
What bindings?
> From: Egmont Koblinger
> Date: Sun, 3 Feb 2019 17:54:25 +0100
> Cc: unicode@unicode.org
>
> I'm arguing, although my reasons are not rock solid, that IMHO the
> default should be the strict direction as set by SCP, without
> autodetection.
I think it's unreasonable and impractical to expect
> Date: Sun, 03 Feb 2019 18:10:15 +0200
> Cc: richard.wording...@ntlworld.com, unicode@unicode.org
> From: Eli Zaretskii via Unicode
>
> I think there are hard problems even for such "simple" utilities, and
> I will start a separate thread about this.
I think we
> Date: Sun, 3 Feb 2019 02:43:06 +
> Cc: Kent Karlsson
> From: Richard Wordingham via Unicode
>
> So, what do you recommend I run grep from for Hebrew or Tai Lue?
Inside Emacs, of course: "M-x grep RET" etc.
> Date: Sun, 3 Feb 2019 01:30:26 +
> From: Richard Wordingham via Unicode
>
> Shaping for RTL scripts happens on strings stored in logical order.
> These are then laid out right to left, though the dominant usage of
> the term 'advance width' for right-to-left glyph sequences feels
>
> Date: Sat, 2 Feb 2019 23:02:10 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> On top of this, I make the clarification that combining marks need to
> be reordered to be sent out to the terminal emulator _after_ their
> base letter
That is true in general regarding
> Date: Sat, 2 Feb 2019 21:49:40 +
> From: Richard Wordingham via Unicode
>
> Eli will probably tell me I'm behind the times, but there are a few
> places where a Gnome-terminal is better than an Emacs GUI window. One
> is colour highlighting of text found by grep.
??? The Emacs 'grep'
> Date: Sun, 3 Feb 2019 03:02:13 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> > All I am saying is that your proposal should define what it means by
> > visual order.
>
> Are you nitpicking on me not giving a precise definition on the
> otherwise IMO freaking obvious
> From: Egmont Koblinger
> Date: Fri, 1 Feb 2019 14:35:35 +0100
> Cc: Frédéric Grosshans ,
> unicode@unicode.org
>
> > You could do that, but it will require a lot of non-trivial processing
> > from the applications. Text-mode applications don't want any complex
> > tinkering, they want
> From: Egmont Koblinger
> Date: Fri, 1 Feb 2019 14:16:03 +0100
> Cc: Adam Borowski , unicode@unicode.org
>
> There's absolutely no way we could reorder first, and then handle
> TAB's cursor movement. TAB's cursor movement happens in the lower
> layer, reordering happens in the upper one.
But
> From: Egmont Koblinger
> Date: Fri, 1 Feb 2019 13:54:02 +0100
> Cc: Adam Borowski , unicode@unicode.org
>
> For this behavior, the only feature you need from a terminal emulator
> is to have a mode where it doesn't shuffle the characters. Currently
> every emulator I'm aware of has such a
> From: Egmont Koblinger
> Date: Fri, 1 Feb 2019 13:40:48 +0100
> Cc: unicode@unicode.org
>
> I now understand that presentation forms isn't an ideal possible
> approach, and the recommendation should be improved here.
>
> Until it happens, I'm uncertain whether using presentation form
>
> Date: Thu, 31 Jan 2019 23:17:19 +
> From: Richard Wordingham via Unicode
>
> Emacs needs a lot of help - I can't write a generic Tai Tham
> OpenType .flt file :-(
Which is why Emacs is migrating towards HarfBuzz.
> Date: Thu, 31 Jan 2019 10:58:54 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> Yes, I do argue that emacs will need to print a new escape sequence.
> Which is much-much-much-much-much better than having to tell users to
> go into the settings of their macOS Terminal /
> From: Egmont Koblinger
> Date: Thu, 31 Jan 2019 10:41:02 +0100
> Cc: Frédéric Grosshans ,
> unicode@unicode.org
>
> > Personally, I think we should simply assume that complex script
> > shaping is left to the terminal, and if the terminal cannot do that,
> > then that's a restriction of
> From: Egmont Koblinger
> Date: Thu, 31 Jan 2019 10:28:27 +0100
> Cc: Adam Borowski , unicode@unicode.org
>
> On Wed, Jan 30, 2019 at 5:10 PM Eli Zaretskii wrote:
>
> > I think the application could use TAB characters to get to the next
> > cell, then simplistic reordering would also work.
>
> From: Egmont Koblinger
> Date: Thu, 31 Jan 2019 10:21:52 +0100
> Cc: Adam Borowski , unicode@unicode.org
>
> > Does anyone know of a terminal emulator which supports isolates?
>
> GNOME Terminal's (VTE's) current work-in-progress implementation does
> remember BiDi control characters just
> From: Egmont Koblinger
> Date: Thu, 31 Jan 2019 10:11:22 +0100
> Cc: unicode@unicode.org
>
> > It doesn't do _any_ shaping. Complex script shaping is left to the
> > terminal, because it's impossible to do shaping in any reasonable way
> > [...]
>
> Partially, you are right. On the other
> Date: Wed, 30 Jan 2019 15:49:34 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> I outline in the document problems that arise from the terminal
> emulator performing shaping on its contents in "explicit" mode, which
> is to be used by Emacs and others. The terminal
> Date: Wed, 30 Jan 2019 15:25:32 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> > ╒═══════════╤═════╕
> > │ filename1 │ 123 │
> > │ FILENAME2 │  17 │
> > └───────────┴─────┘
> >
> > I'm afraid there's no good way to do BiDi without support from individual
> >
> Date: Wed, 30 Jan 2019 15:07:22 +0100
> Cc: unicode@unicode.org
> From: Egmont Koblinger via Unicode
>
> Another possible approach is to leave the terminal doing BiDi, but
> embed all the text fragments in FSI...PDI blocks.
Does anyone know of a terminal emulator which supports isolates?
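For illustration, wrapping each fragment in FSI...PDI could look like the sketch below. It only builds the strings; whether a given terminal emulator honours the isolates is exactly the open question here. `isolate` and `render_row` are hypothetical names.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment: str) -> str:
    """Wrap one text fragment so its direction is resolved in isolation."""
    return f"{FSI}{fragment}{PDI}"


def render_row(cells) -> str:
    """Join table cells, isolating each one from its neighbours."""
    return " | ".join(isolate(cell) for cell in cells)


# Each cell gets its own FSI...PDI pair; the separators stay outside.
row = render_row(["filename1", "123"])
```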
> From: Egmont Koblinger
> Date: Wed, 30 Jan 2019 14:36:42 +0100
> Cc: unicode@unicode.org
>
> - GNU Emacs reshuffles the characters according to the BiDi algorithm,
> expecting that the terminal emulator doesn't do any BiDi.
Yes, users are told to disable bidi reordering of the terminal, if
> Date: Tue, 29 Jan 2019 13:50:31 +0100
> From: Egmont Koblinger via Unicode
>
> [1] https://terminal-wg.pages.freedesktop.org/bidi/
Interesting document, thanks for writing it.
My personal experience with bringing BiDi to Emacs led me to a firm
conclusion that BiDi support by terminal
> Date: Tue, 29 Jan 2019 13:50:31 +0100
> From: Egmont Koblinger via Unicode
>
> In turn, vim, emacs and friends stand there clueless, not knowing
> how to do BiDi in terminals.
This is inaccurate: Emacs (at least the brand known as "GNU Emacs")
supports bidirectional editing in text terminals
> Date: Wed, 12 Sep 2018 00:13:52 +0200
> Cc: unicode@unicode.org
> From: Hans Åberg via Unicode
>
> It might be useful to represent non-UTF-8 bytes as Unicode code points. One
> way might be to use a codepoint to indicate high bit set followed by the byte
> value with its high bit set to 0,
> From: Hans Åberg
> Date: Tue, 11 Sep 2018 20:14:30 +0200
> Cc: hsivo...@hsivonen.fi,
> unicode@unicode.org
>
> If one encounters a file with mixed encodings, it is good to be able to view
> its contents and then convert it, as I see one can do in Emacs.
Yes. And mixed encodings is not the
> From: Hans Åberg
> Date: Tue, 11 Sep 2018 19:13:28 +0200
> Cc: Henri Sivonen ,
> unicode@unicode.org
>
> > In Emacs, each raw byte belonging
> > to a byte sequence which is invalid under UTF-8 is represented as a
> > special multibyte sequence. IOW, Emacs's internal representation
> >
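Emacs's own internal encoding is not shown in this snippet. As a comparable technique in another environment, Python's `surrogateescape` error handler maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF so that the original bytes can be restored on output. The sketch below demonstrates that Python mechanism, not Emacs's representation.

```python
# Mixed input: a Latin-1 byte (0xE9) followed by valid UTF-8 for U+00E9.
raw = b"caf\xe9 \xc3\xa9"

# Undecodable bytes become lone surrogates instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# The stray byte 0xE9 is now U+DCE9; the valid sequence decoded normally.
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")
```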
> Date: Tue, 11 Sep 2018 13:12:40 +0300
> From: Henri Sivonen via Unicode
>
> * I suggest splitting the "UTF-8 model" into three substantially
> different models:
>
> 1) The UTF-8 Garbage In, Garbage Out model (the model of Go): No
> UTF-8-related operations are performed when ingesting
> From: Philippe Verdy
> Date: Sun, 9 Sep 2018 19:35:47 +0200
> Cc: Richard Wordingham ,
> unicode Unicode Discussion
>
> In Emacs, buffer text is a character string with a gap, actually.
>
> A text buffer with gaps is a complex structure, not just a plain string.
The difference is
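For readers unfamiliar with the term, a gap buffer keeps one contiguous array with an unused region (the gap) at the editing position, so insertions near that position are cheap. The following is a generic textbook-style sketch, not Emacs's implementation; the class and method names are hypothetical.

```python
class GapBuffer:
    """Minimal gap buffer: one list with a movable unused region."""

    def __init__(self, text: str = "", gap_size: int = 16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)      # first unused slot
        self._gap_end = len(self._buf)   # one past the last unused slot

    def __len__(self) -> int:
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos: int) -> None:
        # Shift characters across the gap until it starts at pos.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, extra: int = 16) -> None:
        # Open fresh unused slots at the (currently empty) gap.
        self._buf[self._gap_end:self._gap_end] = [None] * extra
        self._gap_end += extra

    def insert(self, pos: int, s: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        for ch in s:
            if self._gap_start == self._gap_end:
                self._grow()
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def text(self) -> str:
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])


buf = GapBuffer("hello world", gap_size=2)
buf.insert(5, ",")
buf.insert(0, ">> ")
```

Seen from outside, `text()` still returns an ordinary string; the gap is only an internal storage detail.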
> Date: Sun, 9 Sep 2018 16:10:26 +0200
> Cc: unicode Unicode Discussion
> From: Philippe Verdy via Unicode
>
> In practice, we use a memory by preparing the "small memory" while
> instantiating a new iterator that will
> process the whole string (which may not be fully loaded in memory, in
> Date: Sat, 8 Sep 2018 02:29:12 +0200 (CEST)
> From: Marcel Schneider
> Cc: RebeccaBettencourt , verd...@wanadoo.fr,
> d3c...@gmail.com, d...@ewellic.org, unicode@unicode.org
>
> > > And it only took them 33 years. :)
> >
> > That's OK, because Unix tools cannot handle Windows
> Date: Fri, 7 Sep 2018 12:47:44 -0700
> Cc: d3c...@gmail.com, Doug Ewell ,
> unicode
> From: Rebecca Bettencourt via Unicode
>
> On Fri, Sep 7, 2018 at 11:20 AM Philippe Verdy via Unicode
> wrote:
>
> That version has been announced in the Windows 10 Hub several weeks ago.
>
> And
> From: Cosmin Apreutesei
> Date: Tue, 28 Aug 2018 21:28:58 +0300
> Cc: unicode@unicode.org
>
> > That is not so if the line ends after the whitespace: in that case the
> > whitespace is trailing, and will appear at the visual end of the
> > line.
>
> So only if it's a soft break I should
> Date: Tue, 28 Aug 2018 13:44:58 +0300
> From: Cosmin Apreutesei via Unicode
>
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> Date: Thu, 23 Aug 2018 22:15:10 +0100
> From: Richard Wordingham via Unicode
>
> On Thu, 23 Aug 2018 21:47:03 +0200
> "Janusz S. Bień via Unicode" wrote:
>
> > My needs are very simple, for example C-x 8 Return LATIN CAPITAL
> > LETTER A WITH MACRON AND BREVE [MUFI] should yield the
> From: jsb...@mimuw.edu.pl (Janusz S. Bień)
> Cc: unicode@unicode.org, richard.wording...@ntlworld.com
> Date: Thu, 23 Aug 2018 21:47:03 +0200
>
> I'm very glad you join the discussion.
I'm sorry for not joining sooner. In my defense, I missed the
reference to Emacs, and the rest of the
> Date: Thu, 19 Jul 2018 10:38:18 +0300
> Cc: Asmus Freytag
> From: Shai Berger via Unicode
>
> And again -- the point is interoperability. If I cannot trust that
> people I communicate with make the same choices I make, plain text
> cannot be used.
This conclusion is too extreme. In Real
> Date: Sat, 14 Jul 2018 13:09:11 +0300
> From: Shai Berger
> Cc: Eli Zaretskii
>
> I have no argument with this, but I do think that in such cases it is
> wrong for the app to pretend that it is still treating the text as
> plain.
What is "plain text" in this context? Does, for example, text
> Date: Fri, 13 Jul 2018 08:57:25 +0100
> From: Richard Wordingham via Unicode
>
> Even just for horizontal text, one problem is the shape of the canvas.
> If it has a left and a right-hand margin, then having an undetermined
> direction by default can work, given enough memory. The rendering
>
> Date: Tue, 10 Jul 2018 13:37:56 +0200
> Cc: unicode Unicode Discussion
> From: Philippe Verdy via Unicode
>
> Your "standard compliant" plain text editor just forces a LTR default for the
> whole document, and does not tolerate that individual paragraphs may start
> with an undetermined
> Cc: unicode-requ...@unicode.org
> Date: Sun, 18 Feb 2018 14:35:00 +0100
> From: "Janusz S. Bień via Unicode"
>
> As a Debian user using some rare characters for old Polish
> transliteration I would be happy with a tool which scans
> available/installed fonts for a specific
> Date: Thu, 15 Feb 2018 17:33:12 -0500
> From: Oren Watson via Unicode
>
> https://securelist.com/zero-day-vulnerability-in-telegram/83800/
>
> You could disallow these characters in filenames, but when filename handling
> is charset-agnostic due to the extended-ascii
> Date: Fri, 22 Dec 2017 15:36:35 +
> From: Richard Wordingham via Unicode
>
> Emacs is civilised in that it allows one to delete character by
> character from either end. That may, however, require some
> intelligence on the part of the user so that they don't get
> Date: Thu, 21 Dec 2017 22:04:37 -0800
> Cc: Unicode Public
> From: Manish Goregaokar via Unicode
>
> However, Firefox deletes by code point.
As does Emacs, btw.
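What deleting by code point means in practice can be shown with a short sketch. Python strings are sequences of code points, so dropping the last element removes exactly one code point; the helper name is hypothetical and this is not the code of Firefox or Emacs.

```python
def delete_last_code_point(text: str) -> str:
    """Remove the final code point, as a backspace-by-code-point would."""
    return text[:-1]


decomposed = "n\u0303"   # 'n' followed by COMBINING TILDE
precomposed = "\u00f1"   # the single code point 'ñ'

# Decomposed form: only the combining tilde goes, leaving a bare 'n'.
after_decomposed = delete_last_code_point(decomposed)

# Precomposed form: the whole letter goes in one step.
after_precomposed = delete_last_code_point(precomposed)
```

The two canonically equivalent spellings therefore behave differently under the same keystroke, which is the trade-off relative to deleting by grapheme cluster.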
> Date: Wed, 11 Oct 2017 22:01:32 +0100
> From: Richard Wordingham via Unicode
>
> The description I had found undersold the noble intention.
If you mean that the documentation doesn't describe the feature well
enough, I'd welcome a documentation bug report.
>
> Date: Tue, 10 Oct 2017 21:51:55 +0100
> From: Richard Wordingham via Unicode
>
> > Emacs lately introduced character-folding in searches, but it's turned
> > off by default, as many users objected.
>
> I don't see how that helps with this problem. If I search for the
>
> Date: Sat, 26 Aug 2017 22:07:57 +0100
> From: Richard Wordingham via Unicode
>
> > We are miscommunicating. My point was that programming for MS-Windows
> > needs a good understanding of what the UTF-16 surrogates are, and in
> > what MS-Windows APIs/library functions
> Date: Sat, 26 Aug 2017 18:52:03 +0100
> From: Richard Wordingham via Unicode
>
> > > It shouldn't. UTF-16 works just like UTF-8, except that the code
> > > units are bigger.
>
> > Not really, since UTF-8 doesn't have surrogates.
>
> It has 115 surrogates, thoroughly
> Date: Sat, 26 Aug 2017 16:09:33 +0100
> From: Richard Wordingham via Unicode
>
> > > Just steer them away from UTF-16!
> >
> > Which will leave them entirely unprepared for the MS-Windows Unicode
> > programming, something they of course will never need in their
> >
> Date: Fri, 25 Aug 2017 00:23:40 +0100
> From: Richard Wordingham via Unicode
>
> On Thu, 24 Aug 2017 17:17:10 +
> Andre Schappo via Unicode wrote:
>
> > So, I consider it important to familiarise students with SMP
> > characters as well as BMP
> Date: Sun, 16 Jul 2017 07:13:02 +0300
> From: Dov Grobgeld via Unicode
>
> While implementing UAX#9 for Unicode 6.3 (and beyond) in FriBidi, I'm trying
> to pass all the tests of BidiCharacterTest.txt, and I'm having problem
> understanding a few of the tests that to
> Date: Sat, 1 Jul 2017 16:36:52 +0300
> From: Itai Berli via Unicode
>
> Emacs claims to fully conform to the Unicode Bidirectional Algorithm
> 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26
> 'Bidirectional Display' of the Emacs manual)
This is somewhat
> Date: Sat, 1 Jul 2017 16:36:52 +0300
> From: Itai Berli via Unicode
>
> Emacs claims to fully conform to the Unicode Bidirectional Algorithm
> 8.0.0 (see sections 22.19 'Bidirectional Editing' and 37.26
> 'Bidirectional Display' of the Emacs manual), yet I have noticed
> Date: Wed, 26 Apr 2017 07:45:07 +0100
> From: Richard Wordingham via Unicode <unicode@unicode.org>
>
> On Wed, 26 Apr 2017 08:48:13 +0300
> Eli Zaretskii via Unicode <unicode@unicode.org> wrote:
>
> > > Date: Sun, 23 Apr 2017 22:59:49 +0100
> >
> Date: Sun, 23 Apr 2017 22:59:49 +0100
> From: Richard Wordingham
> Cc: Eli Zaretskii
>
> If I search for CGJ, highlighting it is frequently supremely useless.
> I want to know where it is; highlighting is merely a tool to find it on
> the screen.
> Date: Sun, 23 Apr 2017 00:51:59 +0100
> Cc: Julian Bradfield <jcb+unic...@inf.ed.ac.uk>
> From: Richard Wordingham via Unicode <unicode@unicode.org>
>
> On Sat, 22 Apr 2017 21:39:42 +0100 (BST)
> Julian Bradfield via Unicode <unicode@unicode.org> wrote:
>
> Date: Sat, 22 Apr 2017 17:13:36 +0100
> From: Richard Wordingham via Unicode
>
> > Movement by grapheme
> > cluster is AFAIK the most natural way of moving in complex scripts.
>
> Evidence?
Personal experience?
> It's easiest for displaying the cursor.
It's the _only_
> Date: Sat, 22 Apr 2017 11:13:16 +0100
> From: Richard Wordingham via Unicode
>
> At present these are split into two and three grapheme clusters
> respectively, and LibreOffice cursor movement responds accordingly.
> (SIGN AA starts a grapheme cluster in several scripts of