2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode :
>
> Den 2017-04-10 12:19, skrev "Michael Everson" :
>
> > I believe the box drawing characters are for drawing boxes
>
> Which is exactly what you are doing.
>
> > and grids on
> > computer
2017-04-12 6:12 GMT+02:00 Garth Wallace <gwa...@gmail.com>:
> On Tue, Apr 11, 2017 at 8:44 AM, Philippe Verdy via Unicode <
> unicode@unicode.org> wrote:
>
>>
>>
>> 2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode <unicode@unicode.org
>> >
2017-04-12 8:35 GMT+02:00 Martin J. Dürst <due...@it.aoyama.ac.jp>:
> On 2017/04/12 00:44, Philippe Verdy via Unicode wrote:
>
> Some Asian chess boards include also diagonal lines or dots on top of their
>> crossing (notably 9x9 boards are subdivided into nine 3x3 subgrou
2017-04-12 15:48 GMT+02:00 Julian Bradfield via Unicode <unicode@unicode.org
>:
> On 2017-04-12, Philippe Verdy via Unicode <unicode@unicode.org> wrote:
> > 2017-04-12 8:35 GMT+02:00 Martin J. Dürst <due...@it.aoyama.ac.jp>:
> >> On Go boards, the grid cells ar
2017-04-11 0:10 GMT+02:00 Aleksey Tulinov :
> It's probably this link: http://unicode.org/standard/Un
> icodeTranscriptions.html
This page is hard to find, I didn't know where it was linked from until I
saw it (referenced by "What is Unicode?")
I do agree, only CJK fonts used in CJK contexts will render them as "W"
(i.e. the fixed-width standard ideographic composition square). If they are
used in Latin, they will adopt the metrics of the Latin font including
them, they will be square but not necessarily aligned with the ideographic
2017-07-07 19:02 GMT+02:00 Doug Ewell via Unicode :
> Oracle FAQ:
> While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes.
>
> Unicode and UTF-8 have been around a long time by now. The fact that
> there is still fake news like this out there, steering our
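For reference, the byte counts that standard UTF-8 actually produces are easy to check. A minimal Python sketch follows; the helper name `utf8_length` and the sample characters are chosen here purely for illustration and say nothing about how any particular database stores its data.

```python
# Number of bytes standard UTF-8 needs for a single character.
def utf8_length(ch: str) -> int:
    return len(ch.encode("utf-8"))

# One sample per UTF-8 length class (illustrative choices).
samples = [
    "A",            # U+0041, ASCII range
    "\u00e9",       # U+00E9, Latin-1 Supplement
    "\u20ac",       # U+20AC, elsewhere in the BMP
    "\U0001F600",   # U+1F600, supplementary plane
]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
```

So well-formed UTF-8 ranges from 1 to 4 bytes per code point, not "only 2" and not "2 or 4".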
2017-07-17 14:25 GMT+02:00 Christoph Päper via Unicode
:
>
> Finally, should smart fonts make U+0020 exactly as wide as an em when
> between two emojis?
>
Really I don't think so, Emojis are not specific to East-Asian use even if
a significant part of them come from there.
Likewise, the feminine form of the common adjective "ambigu" has been
"regularized" to place the diaeresis ("tréma" in French) on the pronounced
u rather than on the mute e added for the regular feminine "ambigüe": it
also correctly forces the pronunciation of this u, which would otherwise be
That's another argument to deprecate the use of RLE/PDF (or embedding mode)
in favor of the more recent isolating mode (which causes the text just
after the isolated text to not inherit the direction context of the last
inner content, as it occurs here with parentheses that cannot match the
same
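To make the two mechanisms concrete, here is a small Python sketch of the control characters involved. The helper names `embed_rtl` and `isolate_rtl` are hypothetical; the code only wraps text in the respective formatting characters and does not run the bidirectional algorithm itself.

```python
import unicodedata

# Embedding controls (older mechanism).
RLE, PDF = "\u202b", "\u202c"   # RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING
# Isolate controls (more recent mechanism).
RLI, PDI = "\u2067", "\u2069"   # RIGHT-TO-LEFT ISOLATE, POP DIRECTIONAL ISOLATE

def embed_rtl(text: str) -> str:
    """Wrap text in an RTL embedding (RLE ... PDF)."""
    return f"{RLE}{text}{PDF}"

def isolate_rtl(text: str) -> str:
    """Wrap text in an RTL isolate (RLI ... PDI)."""
    return f"{RLI}{text}{PDI}"

# Show the bidirectional class the character database assigns to each control.
for ch in (RLE, PDF, RLI, PDI):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")
```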
Also note that the maximum line-length in that RFC is a SHOULD and not a
MUST. This is intended to give a reasonable hint for the limit used in
implementations that process data in the given format: The RFC suggests a
maximum line length of 75 "characters", excluding the CRLF+SPACE
continuation
But at the same time that RFC makes a direct reference to UTF-8 as being
the default charset, so an implementation of the RFC cannot be agnostic to
what UTF-8 is and will not break in the middle of a conforming UTF-8
sequence.
When the limit is reached, the implementation knows that it cannot
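A folding routine that respects UTF-8 sequence boundaries could look like the following Python sketch. It is only an illustration under stated assumptions: the function name `fold_line` is hypothetical, the limit is counted in octets, and the leading SPACE of a continuation line is counted toward the limit (a conservative reading).

```python
def fold_line(line: str, limit: int = 75) -> bytes:
    """Fold one logical line into physical lines joined by CRLF+SPACE,
    never splitting a UTF-8 sequence."""
    if limit < 5:
        # A 4-byte UTF-8 sequence plus the leading space must fit on a line.
        raise ValueError("limit too small")
    data = line.encode("utf-8")
    chunks = []
    start = 0
    first = True
    while start < len(data):
        # Assumption: the leading space of a continuation line counts.
        room = limit if first else limit - 1
        end = min(start + room, len(data))
        # If the cut would fall before a continuation byte (10xxxxxx),
        # back up to the start of that UTF-8 sequence.
        while end < len(data) and end > start and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
        first = False
    return b"\r\n ".join(chunks) + b"\r\n"

folded = fold_line("\u00e9" * 100)  # 200 octets of two-byte sequences
for physical in folded.split(b"\r\n")[:-1]:
    print(len(physical), "octets")
```

Each physical line stays decodable on its own, and removing every CRLF+SPACE pair restores the original octets.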
2017-07-24 21:12 GMT+02:00 J Decker via Unicode :
>
>
> On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode <
> unicode@unicode.org> wrote:
>
>> Hi Folks,
>>
>> 2. (Bug) The sending application performs the folding process - inserts
>> CRLF plus white space
2017-07-24 22:50 GMT+02:00 Philippe Verdy :
> 2017-07-24 21:12 GMT+02:00 J Decker via Unicode :
>
>>
>>
>> On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode <
>> unicode@unicode.org> wrote:
>>
>>> Hi Folks,
>>>
>>> 2. (Bug) The sending
2017-07-25 0:35 GMT+02:00 Doug Ewell via Unicode :
> J Decker wrote:
>
> > I generally accepted any utf-8 encoding up to 31 bits though ( since
> > I was going from the original spec, and not what was effective limit
> > based on unicode codepoint space)
>
> Hey, everybody:
True, but this only applies to "simple case mappings" (those in the main
database), not to extended mappings (which are locale dependent, such as
mappings for dotted and undotted i in Turkish).
So the extended mappings can perfectly be changed for German: they are not
part of the stability policy
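As a rough illustration of mappings that go beyond one-to-one, here is a short Python sketch. It relies only on the built-in `str.upper` and `str.lower`, which are not locale-aware; the variable names are chosen here for illustration.

```python
# Sharp s uppercases to two letters with the built-in (non-simple) mapping.
sharp_s_upper = "\u00df".upper()        # U+00DF LATIN SMALL LETTER SHARP S

# Capital sharp s (U+1E9E) lowercases to the ordinary sharp s.
capital_sharp_s_lower = "\u1e9e".lower()

# No Turkish tailoring is applied: "i" uppercases to a dotless "I".
plain_i_upper = "i".upper()

print(sharp_s_upper, capital_sharp_s_lower, plain_i_upper)
```

A Turkish-aware conversion would need a locale-sensitive library on top of this.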
g Arabic ligatures).
2017-08-18 14:21 GMT+02:00 Andre Schappo <a.scha...@lboro.ac.uk>:
>
> On 18 Aug 2017, at 00:50, Philippe Verdy via Unicode <unicode@unicode.org>
> wrote:
>
>
> 2017-08-17 18:46 GMT+02:00 Asmus Freytag (c) via Unicode <
> unicode@unicode.o
2017-08-17 18:46 GMT+02:00 Asmus Freytag (c) via Unicode <
unicode@unicode.org>:
> On 8/17/2017 7:47 AM, Philippe Verdy wrote:
>
> 2017-08-17 16:24 GMT+02:00 Mike FABIAN via Unicode :
>
>> Asmus Freytag via Unicode さんはかきました:
>> Most emoji now have "W",
lity, it is recommended that
> this practice be continued with current and future emoji. They will
> typically have about the same vertical placement and advance width as CJK
> ideographs.'
>
> - Peter E
>
> On Aug 18, 2017, at 1:48 PM, Philippe Verdy via Unicode <
> unicode@unicode.org>
2017-08-17 16:24 GMT+02:00 Mike FABIAN via Unicode :
> Asmus Freytag via Unicode さんはかきました:
> Most emoji now have "W", for example:
>
> 1F600..1F64F;W # So[80] GRINNING FACE..PERSON WITH FOLDED HANDS
>
> That seems correct because emoji behave more
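The East Asian Width property quoted above can be queried from Python's `unicodedata` module. This is a small sketch assuming the interpreter ships Unicode 9.0 data or later (Python 3.6 and up); the helper name `width_class` is hypothetical.

```python
import unicodedata

def width_class(cp: int) -> str:
    """Return the East_Asian_Width property value for a code point."""
    return unicodedata.east_asian_width(chr(cp))

# Both ends of the quoted emoji range, a Latin letter and a CJK ideograph.
for cp in (0x1F600, 0x1F64F, 0x0041, 0x4E2D):
    print(f"U+{cp:04X}: {width_class(cp)}")
```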
Consider also that the BMP is almost full, the remaining few holes are kept
for isolated characters that may be added to existing scripts, or
permanently reserved to avoid clashes with legacy softwares using simple
code remappings between distinct blocks, or to perform simple case
conversions
2017-05-03 9:49 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Tue, 2 May 2017 05:08:27 +0200
> Philippe Verdy via Unicode <unicode@unicode.org> wrote:
>
> > Consider also that the BMP is almost full, the remaining few holes
> > are kept
I find it intriguing that the update intends to enforce the decoding of the
**shortest** sequences, but now wants to treat **maximal sequences** as a
single unit with arbitrary length. UTF-8 was designed to work only with
some state machines that would NEVER need to parse more than 4 bytes.
For
2017-05-16 15:23 GMT+02:00 Hans Åberg :
> All current filesystems, as far as experts could recall, use octet
> sequences at the lowest level; whatever encoding is used is built in a
> layer above
>
Not NTFS (on Windows) which uses sequences of 16-bit units. Same about
2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode :
>
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to have Unicode
2017-05-16 19:30 GMT+02:00 Shawn Steele via Unicode :
> C) The data was corrupted by some other means. Perhaps bad
> concatenations, lost blocks during read/transmission, etc. If we lost 2
> 512 byte blocks, then maybe we should have a thousand FFFDs (but how would
> we
On Windows NTFS (and LFN extension of FAT32 and exFAT) at least, random
sequences of 16-bit code units are not permitted. There's visibly a
validation step that returns an error if you attempt to create files with
invalid sequences (including other restrictions such as forbidding U+
and some
2017-05-16 12:40 GMT+02:00 Henri Sivonen via Unicode :
> > One additional note: the standard codifies this behaviour as a
> *recommendation*, not a requirement.
>
> This is an odd argument in favor of changing it. If the argument is
> that it's just a recommendation that you
2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode :
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of property and algorithms.
>
> Those involved in the ICU project
Software designed with only UCS-2 and not real UTF-16 support is still
used today.
For example, MySQL with its broken "UTF-8" encoding which in fact encodes
supplementary characters as two separate 16-bit code-units for surrogates,
each one blindly encoded as 3-byte sequences which would be
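The encoding described here (each surrogate code unit written out as its own 3-byte sequence, a scheme known as CESU-8) can be reproduced with a short Python sketch. The function names are hypothetical, and the code only illustrates the byte layout; it makes no claim about what any particular MySQL version does.

```python
def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into its UTF-16 surrogates."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def encode_surrogates_as_utf8(ch: str) -> bytes:
    """Encode each UTF-16 code unit separately as a UTF-8-style sequence."""
    cp = ord(ch)
    if cp < 0x10000:
        return ch.encode("utf-8")
    # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
    return b"".join(
        chr(unit).encode("utf-8", "surrogatepass") for unit in surrogate_pair(cp)
    )

face = "\U0001F600"
print(face.encode("utf-8").hex())              # f09f9880 (4 bytes, well-formed)
print(encode_surrogates_as_utf8(face).hex())   # eda0bdedb880 (6 bytes)
```

A strict UTF-8 decoder rejects the 6-byte form, because surrogate code points are not valid in UTF-8.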
>
> The proposal actually does cover things that aren’t structurally valid,
> like your e0 e0 e0 example, which it suggests should be a single U+FFFD
> because the initial e0 denotes a three byte sequence, and your 80 80 80
> example, which it proposes should constitute three illegal subsequences
2017-05-23 8:43 GMT+02:00 Asmus Freytag via Unicode :
> On 5/22/2017 3:49 PM, Richard Wordingham via Unicode wrote:
>
>> One of the objectives is to use a current version of the UCD to
>> determine, for example, which characters were in Version x.y. One
>> needs that for a
>
> Citing directly from the PRI:
>
>
> The term "maximal subpart of the ill-formed subsequence" refers to the
> longest potentially valid initial subsequence or, if none, then to the next
> single code unit.
>
>
The way I understand it is that C0 80 will have TWO maximal subparts,
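One way to experiment with these cases is to count the U+FFFD characters a real decoder emits. The Python sketch below assumes that CPython's built-in UTF-8 decoder (3.3 or later) with `errors="replace"` follows the "maximal subpart" practice quoted above; the helper name `count_replacements` is hypothetical.

```python
def count_replacements(raw: bytes) -> int:
    """Decode as UTF-8 with replacement and count the U+FFFD produced."""
    return raw.decode("utf-8", errors="replace").count("\ufffd")

cases = {
    "C0 80": bytes([0xC0, 0x80]),           # C0 can never start a valid sequence
    "80 80 80": bytes([0x80, 0x80, 0x80]),  # three stray continuation bytes
    "E0 E0 E0": bytes([0xE0, 0xE0, 0xE0]),  # three lead bytes, none continued
    "E0 A0 41": bytes([0xE0, 0xA0, 0x41]),  # valid two-byte prefix, then "A"
}

for label, raw in cases.items():
    print(f"{label}: {count_replacements(raw)} x U+FFFD")
```

With this decoder, C0 80 yields two replacement characters, 80 80 80 and E0 E0 E0 yield three each, and the truncated prefix E0 A0 yields one.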
Another alternative for your API is to not return simple integer values, but
return (read-only) instances of a Char32 class whose "scalar" property
would normally be a valid codepoint with scalar value, or whose "string"
property will be the actual character; but with another static property
2017-05-16 20:50 GMT+02:00 Shawn Steele :
> But why change a recommendation just because it “feels like”. As you
> said, it’s just a recommendation, so if that really annoyed someone, they
> could do something else (eg: they could use a single FFFD).
>
>
>
> If the
But will there really be a new era name with the new emperor? All that
could be done is a reservation in principle, but this does not mean that
it will really be encoded. The lack of a "representative glyph" is a
blocker.
Maybe we could add instead a generic character for "New Japanese Era"
Anyway, since emperor Akihito (明仁), the era starting in 1989 is no longer
named after the emperor, but is Heisei (平成) "Peace everywhere". This
already occurred in the past in the nengō system. There's no absolute
requirement to change the era name even if there's a new Emperor named.
Anyway it is
This is still very unlikely to occur. There are lots of discussions about
emojis, but they still don't count for a lot in the total.
The major updates were expected for CJK sinograms, but even the rate of
updates has slowed down and we will eventually have another
sinographic plane, but it will not come soon
These old platforms still have their fans, who are easily found on social
networks. There's even an active market of designs and extensions with new
products being made by them, and sold online. Some Fablabs are using them
because of the ease with which they can be modified/tweaked. The Commodore 64
2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode :
> Just about the name paluta:
> In Sanskrit, the length of vowels are measured in maaþra (a cognate of the
> word 'meter'). It is the spoken length of a short vowel. In Latin it is
> termed mora. Usually, you have only
rules for selecting the most appropriate fonts. And
then it's much easier to update only one of these fonts when there are
improvements, without breaking all the rest.
2017-05-04 9:26 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Thu, 4 May 2017 05:01:17 +0200
> Ph
n that page would
> best be done with a combining macron, I think.
>
> --Ken
>
> On 9/26/2017 6:34 AM, Philippe Verdy via Unicode wrote:
>
> But what is interesting is the use of negative digits (-1 to -9, with the
> minus sign above the digit; I've not seen a case of minus 0
2017-09-26 17:45 GMT+02:00 Ken Whistler via Unicode :
> Leo,
>
> Yeah, I know. My point was that by examining the physical typewriter keys
> (the striking head on the typebar, not the images on the keypads), one
> could see what could be generated *by* overstriking. I think
This is what is printed in the manual by its editor, who probably used
metal type; however, I doubt the actual typewriter had this symbol on the
wheel of hammers, and it was probably just overstriking the two letters X
and I.
2017-09-26 15:03 GMT+02:00 John W Kennedy via Unicode
But what is interesting is the use of negative digits (-1 to -9, with the
minus sign above the digit; I've not seen a case of minus 0, not needed
apparently by the described operations)
How do you encode these negative decimal digits in Unicode? With a macron
diacritic?
2017-09-26 15:20
But it is not the case for this early computer, whose typewriter terminal
is clearly using not interchangeable font balls but old metal type on a
"wheel of hammers".
It's also clear that this is not the typewriter (described in the
manual) that was used to typeset the manual using more
2017-08-24 19:17 GMT+02:00 Andre Schappo via Unicode :
>
> Because there are many systems that can now handle BMP characters but
> cannot handle SMP characters.
>
> One example being systems that use mysql utf8 (3 byte encoding) and have
> not yet updated to utf8mb4 (4
2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> Thus, at the level of undisputable text, in Indic scripts there appears
> to be no provision for the ordering of multiple left matras that are
> to be stored in logical order (i.e. backing order) after the onset
>
2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
>
> I'm wondering if there are any cases where a SHY _should_ go between a
> Latin letter and diacritic. I can't think of any.
>
In standard Latin orthography you would not expect it, normally, but there
will be
2017-08-27 6:06 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Sat, 26 Aug 2017 21:52:19 +0200
> Philippe Verdy via Unicode <unicode@unicode.org> wrote:
>
> > 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode <
> > unico
ve encoding of many emojis (now with very long
sequences for groups of people which also include their own complex
placement rules)
2017-08-28 4:40 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Sun, 27 Aug 2017 19:55:31 +0200
> Philippe Verdy via Unicode <unicode
other interesting combinations:
- = parasol
- = parapluie
- = sun glasses
- = parafoudre
Note that a "combining" shadow is not absolutely necessary, but I don't see how
a shadow can exist without the object creating it and giving its form to the
shadow.
Why this distinction with the left or right side on which you'll place the
"half moon" (which "half moon" when eclipses actually occur either on full
moons or new moons?) and the Sun ???
Note that Solar eclipses occur normally during the day at places where they
are observable, but not necessarily
It could appear as a supplementary chart for the ISCII standard, but when
converting to Unicode, it should have no impact except possibly encoding
some of their letters in the new chart as pairs of Unicode characters even
if one of them would not be necessary in all contexts (it could be a
variant
Strings in Java and JavaScript are basically the same as they are arbitrary
sequences of 16-bit code units, and not restricted to text with valid
UTF-16 encoding. The differences are in the set of access methods, but they
are both normally immutable, and both allow (but do not enforce) substrings to
continuous builds may just check the status of the short shasums files to
know when one has changed; this would not use a lot of bandwidth. Anyway, if
your website supports HTTP mime requests for conditional downloads, or if
clients are using HEAD rather than GET requests to get metadata, this
saves
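A client-side check of this kind could be prepared as in the following Python sketch. Everything specific in it is an assumption made for illustration: the URL `https://example.org/SHA256SUMS`, the ETag value, and the helper name `build_conditional_head`. The request is only constructed here, not sent.

```python
import urllib.request

def build_conditional_head(url: str, etag: str = None) -> urllib.request.Request:
    """Build a HEAD request, conditional on a previously seen ETag if given."""
    headers = {}
    if etag:
        # Ask the server to answer 304 Not Modified if the file is unchanged.
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, method="HEAD", headers=headers)

# Hypothetical checksum file and cached ETag.
req = build_conditional_head("https://example.org/SHA256SUMS", etag='"abc123"')
print(req.get_method(), req.full_url, req.header_items())
```

Passing `req` to `urllib.request.urlopen` would transfer headers only; note that `urlopen` reports a 304 response by raising `urllib.error.HTTPError`.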
Any font would likely map the space (and probably for any CJK font the
ideographic space). As well, the newline doesn't need any font, it is
synthesized by renderers. This could be used to compose some Japanese-like
haiku with some meaning...
2017-11-13 23:54 GMT+01:00 James Kass via Unicode
Maybe this test page?
http://www.i18nguy.com/unicode/supplementary-test.html
2017-11-13 20:38 GMT+01:00 James Kass via Unicode :
> A font's sample text can be used in place of the default "The quick
> brown fox..." text which is used to illustrate the typeface in
>
2017-11-13 21:48 GMT+01:00 James Kass :
> Peter Constable wrote,
>
> >> May be this test page ?
> >>
> >> http://www.i18nguy.com/unicode/supplementary-test.html
> >
> > Thanks. I’d need to know _at least something_ about what the characters
> > signify, though, to have a
So this is effectively (custom HTML-like markup)
"Bäck-ker"
2017-11-10 4:11 GMT+01:00 Asmus Freytag via Unicode :
> On 11/9/2017 6:40 PM, Elias Mårtenson via Unicode wrote:
>
> On 9 November 2017 at 18:12, Walter Tross wrote:
>
>> Long story short:
2017-11-10 3:40 GMT+01:00 Elias Mårtenson via Unicode :
> On 9 November 2017 at 18:12, Walter Tross wrote:
>
>> Long story short: it's Abschlusssatz now (and Rollladen, etc.) One of the
>> criteria of the reform was to normalise hyphenation. This has
4:05+0100, Philippe Verdy via Unicode wrote:
> > The Armenian script has its own distinctive punctuation (vertsaket) for
> the
> > standard full stop at end of sentence (whose glyph looks very much like
> the
> > Basic Latin/ASCII colon, however slightly more bold and
reviation dots). The new encoded
mikajet may include a note suggesting the use of the MIDDLE DOT as a
preferable fallback.
2017-12-05 21:35 GMT+01:00 Asmus Freytag via Unicode <unicode@unicode.org>:
> On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote:
>
> U+2024 is not suppo
n
> the Armenian block) as it also has to be distinguished from leader dots in
> Armenian TOC, exactly like the vertsaket was distinguished at U+0589.
>
>
> 2017-12-05 19:59 GMT+01:00 S. Gilles <sgil...@math.umd.edu>:
>
>> On 2017-12-05T18:44:05+0100, Philippe Verdy via Uni
The Armenian script has its own distinctive punctuation (vertsaket) for the
standard full stop at end of sentence (whose glyph looks very much like the
Basic Latin/ASCII colon, however slightly more bold and slanted and whose
dots are rectangular). It is encoded at U+0589. And used in traditional
2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> implies that it might be considered desirable to have a word boundary
> in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
>
It could be argued that "modern" languages could use unique identifiers for
their syntax or API independently of the name being rendered. The problem
is that translated names may collide in non-obvious ways and become
ambiguous.
We've already seen the problems it caused in Excel with its translated
I just see the WG2 as a subcommittee where governments may just check their
practices and make minimum recommendations. Most governments are in fact
very late to adopt the industry standards that evolve fast, and they just
want to reduce the frequency of necessary changes just to ratify what
2018-06-09 17:22 GMT+02:00 Marcel Schneider via Unicode :
> On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote:
> >
> > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST)
> > Marcel Schneider via Unicode wrote:
> >
> > > > Where there is opportunity for productive sync and merging
If you intend to allow all the standard orthography of common languages,
you would also need to support apostrophes and regular hyphens in
identifiers, including those from ASCII !
The Catalan middle dot is just a compact variant of the hyphen, it should
have better been a diacritic, but the
2018-06-08 19:41 GMT+02:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Fri, 8 Jun 2018 13:40:21 +0200
> Mark Davis ☕️ wrote:
>
> > Mark
> >
> > On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode <
> > unicode@unicode.org> wrote:
> >
> > > On Fri, 8 Jun 2018 05:32:51
2018-06-07 21:13 GMT+02:00 Marcel Schneider via Unicode :
> On Thu, 17 May 2018 22:26:15 +, Peter Constable via Unicode wrote:
> […]
> > Hence, from an ISO perspective, ISO 10646 is the only standard for which
> on-going
> > synchronization with Unicode is needed or relevant.
>
> This point
CJK-specific letter forms for these abbreviations/units should be left as
is. They are kept for compatibility reasons and I don't see a reason to
change them to upright which would contradict their legacy usage. The SI
brochure does not apply to these legacy square presentations (which would
be
Even flat notes or rhythmic and pause symbols in Western musical notations
have different contextual meanings depending on musical keys at start of
scores, and other notations or symbols added above the score. So their
interpretations are also variable according to context, just like tuning in
a
In my opinion the usual constant is most often shown as "𝜋" (curly
serifs, slightly slanted) in mathematical articles and books (and in TeX),
but rarely as "π" (sans-serif).
There's a tradition of using handwriting for this symbol on blackboards (not
always with serifs, but still often slanted).
Isn't it a rounded variant of Latin letter n ? Then it could exist also in
uppercase form (like "n" and "N")
It could also be used as a spacing version of the combining tilde
diacritic, to be written after the letter instead of being combined above
it (so "el Niño" would be written with it as "el
Well, it's unfortunate that Microsoft's own response (by its MSVP) is
completely wrong, suggesting to use Narrow non-breaking space to get
justification, which is exactly the reverse: these NNBSP should NOT be
justified and should keep their width.
Microsoft's developers have absolutely
These are ISO 15924 script codes for script variants or groups of related
scripts, not used in Unicode classification of characters due to their
unification (even if there are registered variants for them)
2017-12-22 1:18 GMT+01:00 Karl Williamson via Unicode :
> In
If you don't know what to do with your books (any kind), go to your local
public library to give them there, or give them to a school, where they may
interest students. Such books are rarely found in primary schools but this may
interest them to get some supports and the earlier versions are simpler to
2018-01-11 6:35 GMT+01:00 Pierpaolo Bernardi via Unicode <
unicode@unicode.org>:
> On Thu, Jan 11, 2018 at 4:44 AM, jillian mestel via Unicode
> wrote:
> > To whom it may concern,
> > I was very disappointed to learn that there are no emojis of portraying
> a dominant left
Well I can think of a popular pseudo-planet, the "Death Star" or "Black
Star" (for the "Star Wars" series), which is easily recognized by its color
and shape (with the deep built crater, and optionally its destroyed half
part) which also looks like a real planet, the Saturnian moon Mimas with
its
Hmmm, that character exists already at U+0315 (a combining comma above
right). It would work for the new Kazakh orthographic system, including for
collation purpose. I don't think IDN rejects this combining version.
2018-01-19 14:37 GMT+01:00 Philippe Verdy :
> May be the
Also U+0315 is not part of any decomposition for canonical normalization
purposes, so it would remain encoded separately (only subject to possible
reordering if there are other diacritics)
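These properties can be inspected with Python's `unicodedata` module. The sketch below is illustrative only: it checks U+0315 itself and two sample sequences chosen here ("o" as the base, U+0323 COMBINING DOT BELOW as the other diacritic), not every decomposition in the database.

```python
import unicodedata

MARK = "\u0315"  # COMBINING COMMA ABOVE RIGHT

# Canonical combining class and decomposition mapping of the mark itself.
mark_ccc = unicodedata.combining(MARK)
mark_decomposition = unicodedata.decomposition(MARK)  # empty string if none

# Composition leaves the pair untouched: no precomposed character absorbs it.
composed = unicodedata.normalize("NFC", "o" + MARK)

# Canonical reordering moves a lower-class mark (dot below) in front of it.
reordered = unicodedata.normalize("NFD", "o" + MARK + "\u0323")

print(mark_ccc, repr(mark_decomposition))
print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in reordered])
```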
2018-01-19 14:37 GMT+01:00 Philippe Verdy :
> May be the IDN could accept a new
2018-01-19 14:47 GMT+01:00 Michael Everson via Unicode
:
> There’s no redeeming this orthography.
This is not a redemption; the Kazakh government currently has not made any
assessment of how to encode their proposed system.
Who said that what was proposed by them was an
Maybe the IDN could accept a new combining diacritic (sort of right-side
acute accent). After all the Kazakh intent is not to define a new separate
character but a modification of base letter to create a single letter in
their alphabet.
So a proposal for COMBINING APOSTROPHE (whose spacing
For the root zone maybe, but it is not formally rejected by IDN, and the Kazakh
zone could accept it without problem. It also has the advantage of allowing
cleaner collation and contextual text extraction, and it also allows better
placement of the combining character with its base in some dedicated
punctuation sign for
quotation...)
2018-01-20 21:04 GMT+01:00 Simon Montagu via Unicode <unicode@unicode.org>:
> On 19/01/18 15:37, Philippe Verdy via Unicode wrote:
> > May be the IDN could accept a new combining diacritic (sort of
> > right-side acute accent). After
So there will be a new administrative jargon in Kazakhstan that people
won't like, and outside the government, they'll continue using their
existing keyboards, and will only transliterate to Latin using a simple
1-to-1 mapping without the ugly apostrophes (most probably acute accents
on vowels,
Great, but then why stick to a pure western subset (ASCII is mostly for
US only)? If he wants to be eastern, then choose ISO 8859-2.
As a bonus, banning the apostrophe from the alphabet will be a security
improvement (think about the many cases where ASCII apostrophes are used as
string
Such an example shows that ignoring umlauts makes the document
counterintuitive. Nobody is able to infer that "Paper" refers to a person
here or if he actually meant a paper sheet/article...
At least he should have written "Paeper" which would be more correct (if
"Christoph Päper" is German, the
Just a remark for fun:
- You'll also note that this talk is all about the apostrophe, and if
Kazakhstan wants to introduce it in 2019, that year will match exactly the
code point U+2019 [ ’ ]...
- This year 2018 is also the year to discuss and reverse the apostrophe
decision, and it matches the
I agree, and still you won't necessarily have to press a dead key to have
these characters, if you map one key where the Cyrillic letter was
producing directly the character with its accent.
No surprise for user, fast to type, easy to learn, typographically correct,
preserves the etymologies and
2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Sun, 28 Jan 2018 20:29:28 +0100
> Philippe Verdy via Unicode <unicode@unicode.org> wrote:
>
> > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode <
> > unicode@unicode
ass, and matches only "ab", "ba", "ac", or
"ca", it is equivalent to "{{2,2}a|b|c}" or "{{2}a|b|c}".
With that extension you can build the necessary regexps to match canonical
equivalent strings with a finite regexp.
2018-01-29 7:16 GM
2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Sat, 27 Jan 2018 14:13:40 -0800The theory
> of regular expressions (though you may not think that mathematical
> regular expressions matter) extends to trace monoids, with the
> disturbing exception that the
Typo, the full regexp has undesired asterisks:
[[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] *
( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]
*
| [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]]
* < COMBINING CIRCUMFLEX>
2018-01-28 20:29 GMT+01:00 Philippe Verdy
Note that for finding occurrences of simpler combining sequences such
as finding the regexp is simpler:
[[ [^[[:cc=0:]]] - [[:cc=above:]] ]] *
The central character class allows 53 distinct combining classes, and the
maximum match length is 2+53=55 characters.
If Unicode assigns new combining
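The simpler pattern above can also be emulated procedurally. The following Python sketch is an illustration under stated assumptions rather than the regexp engine being discussed: the function name `has_combining_mark` is hypothetical, the input is first normalized to NFD, and the scan skips any intervening characters whose canonical combining class is neither 0 nor that of the target mark.

```python
import unicodedata

def has_combining_mark(text: str, base: str, mark: str) -> bool:
    """True if `mark` follows `base` with only non-blocking marks between."""
    mark_ccc = unicodedata.combining(mark)
    chars = unicodedata.normalize("NFD", text)
    for i, ch in enumerate(chars):
        if ch != base:
            continue
        for nxt in chars[i + 1:]:
            if nxt == mark:
                return True
            ccc = unicodedata.combining(nxt)
            # A starter (class 0) or a mark of the same class blocks the match.
            if ccc == 0 or ccc == mark_ccc:
                break
    return False

CIRCUMFLEX = "\u0302"
print(has_combining_mark("a\u0323\u0302", "a", CIRCUMFLEX))  # dot below between
print(has_combining_mark("a\u0301\u0302", "a", CIRCUMFLEX))  # acute blocks it
print(has_combining_mark("\u1ead", "a", CIRCUMFLEX))         # precomposed form
```

The first and third calls match, since a dot below (class 220) does not block a mark of class 230; the second does not, because the acute accent has the same class as the circumflex.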
bc]" and matches only "a", "b", or "c"
> And "{{0}[abc]}" is quantified to match zero and only zero item (the
> items are not relevant) and will never match anything, just like
> "{{0}a|b|c}" or "{{0}}".
> And "{{2}[ab
1-29 9:57 GMT+01:00 Richard Wordingham via Unicode <
unicode@unicode.org>:
> On Mon, 29 Jan 2018 07:16:04 +0100
> Philippe Verdy via Unicode <unicode@unicode.org> wrote:
>
> > 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode <
> > unicode@unicode.org>:
>
>
> Note the French "touch" keyboard layout is complete for French (provided
> you select the one of the 3 new layouts with Emoji: it has the extra "key"
> for selecting the input language in all 4 layouts)
>
> But the "full" (dockable) touch layout in French which emulates a physical
> keyboard