Re: Proposal to add standardized variation sequences for chess notation

2017-04-11 Thread Philippe Verdy via Unicode
2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode : > > Den 2017-04-10 12:19, skrev "Michael Everson" : > > > I believe the box drawing characters are for drawing boxes > > Which is exactly what you are doing. > > > and grids on > > computer

Re: Proposal to add standardized variation sequences for chess notation

2017-04-11 Thread Philippe Verdy via Unicode
2017-04-12 6:12 GMT+02:00 Garth Wallace <gwa...@gmail.com>: > On Tue, Apr 11, 2017 at 8:44 AM, Philippe Verdy via Unicode < > unicode@unicode.org> wrote: > >> >> >> 2017-04-11 15:04 GMT+02:00 Kent Karlsson via Unicode <unicode@unicode.org >>

Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Philippe Verdy via Unicode
2017-04-12 8:35 GMT+02:00 Martin J. Dürst <due...@it.aoyama.ac.jp>: > On 2017/04/12 00:44, Philippe Verdy via Unicode wrote: > > Some Asian chess boards include also diagonal lines or dots on top of their >> crossing (notably 9x9 boards are subdivided into nine 3x3 subgrou

Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Philippe Verdy via Unicode
2017-04-12 15:48 GMT+02:00 Julian Bradfield via Unicode <unicode@unicode.org >: > On 2017-04-12, Philippe Verdy via Unicode <unicode@unicode.org> wrote: > > 2017-04-12 8:35 GMT+02:00 Martin J. Dürst <due...@it.aoyama.ac.jp>: > >> On Go boards, the grid cells ar

Re: Unicode vs. Unikod

2017-04-11 Thread Philippe Verdy via Unicode
2017-04-11 0:10 GMT+02:00 Aleksey Tulinov : > It's probably this link: http://unicode.org/standard/UnicodeTranscriptions.html This page is hard to find, I didn't know where it was linked from until I saw it (referenced by "What is Unicode?")

Re: Should U+3248 ... U+324F be wide characters?

2017-08-16 Thread Philippe Verdy via Unicode
I do agree, only CJK fonts used in CJK contexts will render them as "W" (i.e. the fixed-width standard ideographic composition square). If they are used in Latin, they will adopt the metrics of the Latin font including them, they will be square but not necessarily aligned with the ideographic

Re: Unicode education in the professional world

2017-07-07 Thread Philippe Verdy via Unicode
2017-07-07 19:02 GMT+02:00 Doug Ewell via Unicode : > Oracle FAQ: > While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes. > > Unicode and UTF-8 have been around a long time by now. The fact that > there is still fake news like this out there, steering our

Re: Emoji Space

2017-07-18 Thread Philippe Verdy via Unicode
2017-07-17 14:25 GMT+02:00 Christoph Päper via Unicode : > > Finally, should smart fonts make U+0020 exactly as wide as an em when > between two emojis? > Really I don't think so, Emojis are not specific to East-Asian use even if a significant part of them come from there.

Re: Unicode education in UK Schools

2017-07-15 Thread Philippe Verdy via Unicode
As well the feminine form of the common adjective "ambigu" has been "regularized" to place the diaeresis ("tréma" in French) on the pronounced u rather than on the mute e added for the regular feminine "ambigüe": it also correctly forces the pronunciation of this u, which would otherwise be

Re: Problems with BidiCharTest.txt

2017-07-16 Thread Philippe Verdy via Unicode
That's another argument to deprecate the use of RLE/PDF (or embedding mode) in favor of the more recent isolating mode (which causes the text just after the isolated text to not inherit the direction context of the last inner content, as it occurs here with parentheses that cannot match the same
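
As a hedged illustration of the embedding versus isolating modes mentioned in this snippet, the following Python sketch wraps a string in either pair of directional formatting characters from the Unicode Bidirectional Algorithm. The function names `embed_rtl` and `isolate` are hypothetical helpers, not part of any library, and the sketch only inserts the control characters; it does not perform bidi reordering itself.

```python
# Directional formatting characters (code points as assigned in Unicode).
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING (closes an embedding)
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE (closes an isolate)


def embed_rtl(text):
    """Older style: right-to-left embedding (RLE ... PDF)."""
    return RLE + text + PDF


def isolate(text, direction=None):
    """Newer style: isolate the text from its surroundings.

    direction may be "ltr", "rtl", or None (let the first strong
    character of the text decide, via FSI).
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI


if __name__ == "__main__":
    sample = "\u05D0\u05D1\u05D2 (abc)"  # Hebrew letters followed by parentheses
    print([hex(ord(c)) for c in embed_rtl(sample)][:2])
    print([hex(ord(c)) for c in isolate(sample)][:2])
```

With isolates, the text following the closing PDI is resolved as if the isolated run were a single neutral object, which is the property the snippet argues for.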

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
Also note that the maximum line-length in that RFC is a SHOULD and not a MUST. This is intended to give a reasonable hint for the limit used in implementations that process data in the given format: The RFC suggests a maximum line length of 75 "characters", excluding the CRLF+SPACE continuation

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
But at the same time that RFC makes a direct reference to UTF-8 as being the default charset, so an implementation of the RFC cannot be agnostic about what UTF-8 is and will not break in the middle of a conforming UTF-8 sequence. When the limit is reached, that implementation knows that it cannot
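
The point about not breaking inside a UTF-8 sequence can be sketched as follows. This is a minimal Python example under stated assumptions: the function name `fold_utf8` is hypothetical, the limit of 75 is taken from the discussion above and is counted in octets per chunk, and the leading space of continuation lines is not counted against the limit.

```python
def fold_utf8(line, limit=75):
    """Split a line into chunks of at most `limit` UTF-8 octets,
    never cutting inside a multi-octet sequence."""
    if limit < 4:
        # A single UTF-8 sequence can be 4 octets long.
        raise ValueError("limit must be at least 4 octets")
    data = line.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # If the octet just after the cut is a continuation octet
        # (10xxxxxx), the cut is inside a sequence: back up to its start.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


if __name__ == "__main__":
    text = "\u00E9" * 50            # 50 characters, 100 octets in UTF-8
    folded = "\r\n ".join(fold_utf8(text))  # CRLF plus one space between chunks
    print(len(fold_utf8(text)), "chunks")
```

A receiver that removes each CRLF plus the following space gets the original text back, because every chunk is itself well-formed UTF-8.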

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-24 21:12 GMT+02:00 J Decker via Unicode : > > > On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < > unicode@unicode.org> wrote: > >> Hi Folks, >> >> 2. (Bug) The sending application performs the folding process - inserts >> CRLF plus white space

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-24 22:50 GMT+02:00 Philippe Verdy : > 2017-07-24 21:12 GMT+02:00 J Decker via Unicode : > >> >> >> On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < >> unicode@unicode.org> wrote: >> >>> Hi Folks, >>> >>> 2. (Bug) The sending

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Philippe Verdy via Unicode
2017-07-25 0:35 GMT+02:00 Doug Ewell via Unicode : > J Decker wrote: > > > I generally accepted any utf-8 encoding up to 31 bits though ( since > > I was going from the original spec, and not what was effective limit > > based on unicode codepoint space) > > Hey, everybody:

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-06-30 Thread Philippe Verdy via Unicode
True, but this only applies to "simple case mappings" (those in the main database), not to extended mappings (which are locale-dependent, such as mappings for dotted and undotted i in Turkish). So the extended mappings can perfectly well be changed for German: they are not part of the stability policy
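
To make the difference between one-to-one and extended mappings concrete, here is a small Python sketch. It assumes only that Python's `str.upper()` and `str.lower()` apply the unconditional full case mappings and no locale tailoring, so it shows the German sharp s case but cannot show the Turkish one. The helper name `case_round_trip` is hypothetical.

```python
def case_round_trip(s):
    """Return (uppercased, lowercased-again) forms of s using
    Python's built-in, locale-independent case mappings."""
    upper = s.upper()
    return upper, upper.lower()


if __name__ == "__main__":
    # LATIN SMALL LETTER SHARP S uppercases to two letters here.
    print(case_round_trip("stra\u00DFe"))
    # LATIN CAPITAL LETTER SHARP S (U+1E9E) lowercases to U+00DF.
    print("\u1E9E".lower() == "\u00DF")
```

The round trip is lossy for "straße" because the uppercase form no longer records that a sharp s was present.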

Re: Should U+3248 ... U+324F be wide characters?

2017-08-18 Thread Philippe Verdy via Unicode
g Arabic ligatures). 2017-08-18 14:21 GMT+02:00 Andre Schappo <a.scha...@lboro.ac.uk>: > > On 18 Aug 2017, at 00:50, Philippe Verdy via Unicode <unicode@unicode.org> > wrote: > > > 2017-08-17 18:46 GMT+02:00 Asmus Freytag (c) via Unicode < > unicode@unicode.o

Re: Should U+3248 ... U+324F be wide characters?

2017-08-17 Thread Philippe Verdy via Unicode
2017-08-17 18:46 GMT+02:00 Asmus Freytag (c) via Unicode < unicode@unicode.org>: > On 8/17/2017 7:47 AM, Philippe Verdy wrote: > > 2017-08-17 16:24 GMT+02:00 Mike FABIAN via Unicode : > >> Asmus Freytag via Unicode さんはかきました: >> Most emoji now have "W",

Re: Should U+3248 ... U+324F be wide characters?

2017-08-19 Thread Philippe Verdy via Unicode
lity, it is recommended that > this practice be continued with current and future emoji. They will > typically have about the same vertical placement and advance width as CJK > ideographs.' > > - Peter E > > On Aug 18, 2017, at 1:48 PM, Philippe Verdy via Unicode < > unicode@unicode.org>

Re: Should U+3248 ... U+324F be wide characters?

2017-08-17 Thread Philippe Verdy via Unicode
2017-08-17 16:24 GMT+02:00 Mike FABIAN via Unicode : > Asmus Freytag via Unicode さんはかきました: > Most emoji now have "W", for example: > > 1F600..1F64F;W # So[80] GRINNING FACE..PERSON WITH FOLDED HANDS > > That seems correct because emoji behave more
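
The East_Asian_Width values quoted in this thread can be inspected with Python's standard `unicodedata` module. This is only a sketch: the values returned depend on the Unicode version bundled with the interpreter, and the helper name `width_classes` is hypothetical.

```python
import unicodedata


def width_classes(text):
    """Map each character of text to its East_Asian_Width value
    ("W", "F", "Na", "H", "A" or "N") as known to this Python build."""
    return [(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch)) for ch in text]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # GRINNING FACE, a CJK ideograph, a Latin letter, and U+3248.
    for code, eaw in width_classes("\U0001F600\u4E00A\u3248"):
        print(code, eaw)
```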

Re: How to Add Beams to Notes

2017-05-01 Thread Philippe Verdy via Unicode
Consider also that the BMP is almost full, the remaining few holes are kept for isolated characters that may be added to existing scripts, or permanently reserved to avoid clashes with legacy software using simple code remappings between distinct blocks, or to perform simple case conversions

Re: How to Add Beams to Notes

2017-05-03 Thread Philippe Verdy via Unicode
2017-05-03 9:49 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Tue, 2 May 2017 05:08:27 +0200 > Philippe Verdy via Unicode <unicode@unicode.org> wrote: > > > Consider also that the BMP is almost full, the remaining few holes > > are kept

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Philippe Verdy via Unicode
I find it intriguing that the update intends to enforce the decoding of the **shortest** sequences, but now wants to treat **maximal sequences** as a single unit with arbitrary length. UTF-8 was designed to work only with some state machines that would NEVER need to parse more than 4 bytes. For

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
2017-05-16 15:23 GMT+02:00 Hans Åberg : > All current filesystems, as far as experts could recall, use octet > sequences at the lowest level; whatever encoding is used is built in a > layer above > Not NTFS (on Windows) which uses sequences of 16-bit units. Same about

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode : > > > On 15 May 2017, at 12:21, Henri Sivonen via Unicode > wrote: > ... > > I think Unicode should not adopt the proposed change. > > It would be useful, for use with filesystems, to have Unicode

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
2017-05-16 19:30 GMT+02:00 Shawn Steele via Unicode : > C) The data was corrupted by some other means. Perhaps bad > concatenations, lost blocks during read/transmission, etc. If we lost 2 > 512 byte blocks, then maybe we should have a thousand FFFDs (but how would > we

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
On Windows NTFS (and LFN extension of FAT32 and exFAT) at least, random sequences of 16-bit code units are not permitted. There's visibly a validation step that returns an error if you attempt to create files with invalid sequences (including other restrictions such as forbidding U+ and some

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
2017-05-16 12:40 GMT+02:00 Henri Sivonen via Unicode : > > One additional note: the standard codifies this behaviour as a > *recommendation*, not a requirement. > > This is an odd argument in favor of changing it. If the argument is > that it's just a recommendation that you

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Philippe Verdy via Unicode
2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode : > I think this political reason should be taken very seriously. There are > already too many instances where ICU can be seen "driving" the development > of property and algorithms. > > Those involved in the ICU project

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Philippe Verdy via Unicode
Software designed with only UCS-2 and not real UTF-16 support is still used today. For example, MySQL with its broken "UTF-8" encoding, which in fact encodes supplementary characters as two separate 16-bit code units for surrogates, each one blindly encoded as a 3-byte sequence which would be
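
The kind of encoding described here, where each surrogate code unit is separately turned into a 3-byte sequence, can be illustrated in Python. This is a sketch of that byte pattern only (it is the shape usually called CESU-8) and makes no claim about what any particular database actually stores; the function name `surrogates_as_3_bytes` is hypothetical.

```python
def surrogates_as_3_bytes(text):
    """Encode each UTF-16 code unit of text separately with the UTF-8
    bit layout, so a supplementary character becomes 2 x 3 bytes."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets lone surrogates through the UTF-8 codec.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    ch = "\U0001F600"
    print(ch.encode("utf-8").hex())          # well-formed UTF-8: 4 bytes
    print(surrogates_as_3_bytes(ch).hex())   # surrogate pair: 6 bytes
```

A conforming UTF-8 decoder must reject the 6-byte form, since the surrogate range is excluded from well-formed UTF-8.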

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
> > The proposal actually does cover things that aren’t structurally valid, > like your e0 e0 e0 example, which it suggests should be a single U+FFFD > because the initial e0 denotes a three byte sequence, and your 80 80 80 > example, which it proposes should constitute three illegal subsequences

Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Philippe Verdy via Unicode
2017-05-23 8:43 GMT+02:00 Asmus Freytag via Unicode : > On 5/22/2017 3:49 PM, Richard Wordingham via Unicode wrote: > >> One of the objectives is to use a current version of the UCD to >> determine, for example, which characters were in Version x.y. One >> needs that for a

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Philippe Verdy via Unicode
> > Citing directly from the PRI: > > > The term "maximal subpart of the ill-formed subsequence" refers to the > longest potentially valid initial subsequence or, if none, then to the next > single code unit. > > The way i understand it is that C0 80 will have TWO maximal subparts,
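
The counting of U+FFFD replacements discussed in this thread can be tried out with a short Python sketch. It relies on the assumption that CPython's built-in UTF-8 decoder with `errors="replace"` emits one U+FFFD per maximal subpart; it is a stand-in for experimentation, not a reference implementation of the PRI text. The helper name `count_replacements` is hypothetical.

```python
def count_replacements(data):
    """Decode bytes as UTF-8, replacing ill-formed parts with U+FFFD,
    and return (decoded text, number of U+FFFD emitted)."""
    text = data.decode("utf-8", errors="replace")
    return text, text.count("\uFFFD")


if __name__ == "__main__":
    for raw in (b"\xC0\x80", b"\x80\x80\x80", b"\xE0\xE0\xE0", b"\xE1\x80A"):
        text, n = count_replacements(raw)
        print(raw.hex(" "), "->", n, "replacement(s)")
```

Under this decoder, C0 can never start a valid sequence, so C0 80 yields two replacements, while the truncated but potentially valid prefix E1 80 yields one.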

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
Another alternative for your API is to not return simple integer values, but return (read-only) instances of a Char32 class whose "scalar" property would normally be a valid codepoint with scalar value, or whose "string" property will be the actual character; but with another static property

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Philippe Verdy via Unicode
2017-05-16 20:50 GMT+02:00 Shawn Steele : > But why change a recommendation just because it “feels like”. As you > said, it’s just a recommendation, so if that really annoyed someone, they > could do something else (eg: they could use a single FFFD). > > > > If the

Re: Encoding of character for new Japanese era name after Heisei

2017-06-02 Thread Philippe Verdy via Unicode
But will there really be a new era name with the new emperor? All that could be made is a reservation in principle, but this does not mean that it will be really encoded. The lack of a "representative glyph" is a blocker. Maybe we could add instead a generic character for "New Japanese Era"

Re: Encoding of character for new Japanese era name after Heisei

2017-06-02 Thread Philippe Verdy via Unicode
Anyway, since emperor Akihito (明仁), the era starting in 1989 is no longer named after the emperor, but is Heisei (平成) "Peace everywhere". This already occurred in the past in the nengō system. There's no absolute requirement to change the era name even if there's a new Emperor named. Anyway it is

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Philippe Verdy via Unicode
This is still very unlikely to occur. There is a lot of discussion about emojis, but they still don't count for much in the total. The major updates were expected for CJK sinograms, but even the rate of updates has slowed down and we will eventually have another sinographic plane, but it will not come soon

Re: Looking for 8-bit computer designers

2017-06-14 Thread Philippe Verdy via Unicode
These old platforms still have their fans, who are easily found on social networks. There's even an active market of designs and extensions with new products being made by them, and sold online. Some Fablabs are using them because of the ease with which they can be modified/tweaked. The Commodore 64

Re: Tibetan Paluta

2017-04-30 Thread Philippe Verdy via Unicode
2017-04-29 21:21 GMT+02:00 Naena Guru via Unicode : > Just about the name paluta: > In Sanskrit, the length of vowels are measured in maaþra (a cognate of the > word 'meter'). It is the spoken length of a short vowel. In Latin it is > termed mora. Usually, you have only

Re: How to Add Beams to Notes

2017-05-04 Thread Philippe Verdy via Unicode
rules for selecting the most appropriate fonts. And then it's much easier to update only one of these fonts when there are improvements, without breaking all the rest. 2017-05-04 9:26 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Thu, 4 May 2017 05:01:17 +0200 > Ph

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Philippe Verdy via Unicode
n that page would > best be done with a combining macron, I think. > > --Ken > > On 9/26/2017 6:34 AM, Philippe Verdy via Unicode wrote: > > But what is interesting is the use of negative digits (-1 to -9, with the > minus sign above the digit; I've not seen a case of minus 0

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Philippe Verdy via Unicode
2017-09-26 17:45 GMT+02:00 Ken Whistler via Unicode : > Leo, > > Yeah, I know. My point was that by examining the physical typewriter keys > (the striking head on the typebar, not the images on the keypads), one > could see what could be generated *by* overstriking. I think

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Philippe Verdy via Unicode
This is what is printed in the manual by its editor, who probably used metallic fonts; however, I doubt the actual typewriter had this symbol on the wheel of hammers, and it was probably just overstriking the two letters X and I. 2017-09-26 15:03 GMT+02:00 John W Kennedy via Unicode

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Philippe Verdy via Unicode
But what is interesting is the use of negative digits (-1 to -9, with the minus sign above the digit; I've not seen a case of minus 0, not needed apparently by the described operations) How do you encode these negative decimal digits in Unicode ? with a macron diacritic ? 2017-09-26 15:20

Re: IBM 1620 invalid character symbol

2017-09-27 Thread Philippe Verdy via Unicode
But it is not the case for this early computer, whose typewriter terminal is clearly not using interchangeable font balls but old metallic type on a "wheel of hammers". It's also clear that it is not that typewriter (described in the manual) that was used to typeset the manual using more

Re: Unicode education in Schools

2017-08-24 Thread Philippe Verdy via Unicode
2017-08-24 19:17 GMT+02:00 Andre Schappo via Unicode : > > Because there are many systems that can now handle BMP characters but not > cannot handle SMP characters. > > One example being systems that use mysql utf8 (3 byte encoding) and have > not yet updated to utf8mb4 (4

Re: Version linking?

2017-08-24 Thread Philippe Verdy via Unicode
2017-08-17 22:37 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > Thus, at the level of undisputable text, in Indic scripts there appears > to be no provision for the ordering of multiple left matras that are > to be stored in logical order (i.e. backing order) after the onset >

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-26 Thread Philippe Verdy via Unicode
2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > > I'm wondering if there are any cases where a SHY _should_ go between a > Latin letter and diacritic. I can't think of any. > In standard Latin orthography you would not expect it, normally, but there will be

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-27 Thread Philippe Verdy via Unicode
2017-08-27 6:06 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Sat, 26 Aug 2017 21:52:19 +0200 > Philippe Verdy via Unicode <unicode@unicode.org> wrote: > > > 2017-08-26 21:28 GMT+02:00 Richard Wordingham via Unicode < > > unico

Re: Character Sequences of Uncertain Rendering (was: Version linking?)

2017-08-27 Thread Philippe Verdy via Unicode
ve encoding of many emojis (now with very long sequences for groups of people which also include their own complex placement rules) 2017-08-28 4:40 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Sun, 27 Aug 2017 19:55:31 +0200 > Philippe Verdy via Unicode <unicode

Re: Ah the power of emoji! To encompass even science and mythology!

2017-08-23 Thread Philippe Verdy via Unicode
other interesting combinations: - = parasol - = parapluie - = sun glasses - = parafoudre Note that a "combining" shadow is not absolutely necessary, but I don't how a shadow can exist with the object creating it and giving its form to the shadow.

Re: Ah the power of emoji! To encompass even science and mythology!

2017-08-23 Thread Philippe Verdy via Unicode
Why this distinction with the left or right side on which you'll place the "half moon" (which "half moon" when eclipses actually occur either on full moons or new moons?) and the Sun? Note that solar eclipses occur normally during the day at places where they are observable, but not necessarily

Re: Assamese and Unicode.

2017-08-23 Thread Philippe Verdy via Unicode
It could appear as a supplementary chart for the ISCII standard, but when converting to Unicode, it should have no impact except possibly encoding some of their letters in the new chart as pairs of Unicode characters even if one of them would not be necessary in all contexts (it could be a variant

Re: Unicode education in Schools

2017-08-24 Thread Philippe Verdy via Unicode
Strings in Java and JavaScript are basically the same as they are arbitrary sequences of 16-bit code units, and not restricted to text with valid UTF-16 encoding. The differences are in the set of access methods, but they are both normally immutable, and both allow (but do not enforce) substrings to
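
The 16-bit code unit view described here can be emulated in Python for illustration. Python strings are sequences of code points rather than code units, so this sketch converts through UTF-16 explicitly; the function name `utf16_code_units` is hypothetical, and the `surrogatepass` error handler is used so that an unpaired surrogate (which Java and JavaScript strings may also contain) survives the conversion.

```python
def utf16_code_units(text):
    """Return the list of 16-bit code units that a UTF-16 based string
    type would hold for text, including unpaired surrogates."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    s = "a\U0001F600"
    units = utf16_code_units(s)
    print(len(s), "code points,", len(units), "code units")
    print([f"{u:04X}" for u in units])
    # An unpaired high surrogate is a legal string value but not valid UTF-16 text.
    print([f"{u:04X}" for u in utf16_code_units("\uD83D")])
```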

Re: HTTPS

2017-10-04 Thread Philippe Verdy via Unicode
continuous builds may just check the status of the short shasum files to know when one has changed; this would not use a lot of bandwidth. Anyway, if your website supports HTTP mime requests for conditional downloads, or if clients are using HEAD rather than GET requests to get metadata, this saves

Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
Any font would likely map the space (and probably for any CJK font the ideographic space). As well, the newline doesn't need any font; it is synthesized by renderers. This could be used to compose some Japanese-like haiku with some meaning... 2017-11-13 23:54 GMT+01:00 James Kass via Unicode

Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
May be this test page ? http://www.i18nguy.com/unicode/supplementary-test.html 2017-11-13 20:38 GMT+01:00 James Kass via Unicode : > A font's sample text can be used in place of the default "The quick > brown fox..." text which is used to illustrate the typeface in >

Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
2017-11-13 21:48 GMT+01:00 James Kass : > Peter Constable wrote, > > >> May be this test page ? > >> > >> http://www.i18nguy.com/unicode/supplementary-test.html > > > > Thanks. I’d need to know _at least something_ about what the characters > > signify, though, to have a

Re: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-11-09 Thread Philippe Verdy via Unicode
So this is effectively (custom HTML-like markup) "Bäck-ker" 2017-11-10 4:11 GMT+01:00 Asmus Freytag via Unicode : > On 11/9/2017 6:40 PM, Elias Mårtenson via Unicode wrote: > > On 9 November 2017 at 18:12, Walter Tross wrote: > >> Long story short:

Re: Aw: Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-11-09 Thread Philippe Verdy via Unicode
2017-11-10 3:40 GMT+01:00 Elias Mårtenson via Unicode : > On 9 November 2017 at 18:12, Walter Tross wrote: > >> Long story short: it's Abschlusssatz now (and Rollladen, etc.) One of the >> criteria of the reform was to normalise hyphenation. This has

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
4:05+0100, Philippe Verdy via Unicode wrote: > > The Armenian script has its own distinctive punctuation (vertsaket) for > the > > standard full stop at end of sentence (whose glyph looks very much like > the > > Basic Latin/ASCII colon, however slightly more bold and

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
reviation dots). The newly encoded mijaket may include a note suggesting the use of the MIDDLE DOT as a preferable fallback. 2017-12-05 21:35 GMT+01:00 Asmus Freytag via Unicode <unicode@unicode.org>: > On 12/5/2017 11:28 AM, Philippe Verdy via Unicode wrote: > > U+2024 is not suppo

Re: Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
n > the Armenian block) as it also has to be distinguished from leader dots in > Armenian TOC, exactly like the vertsaket was distinguished at U+0589. > > > 2017-12-05 19:59 GMT+01:00 S. Gilles <sgil...@math.umd.edu>: > >> On 2017-12-05T18:44:05+0100, Philippe Verdy via Uni

Armenian Mijaket (Armenian colon)

2017-12-05 Thread Philippe Verdy via Unicode
The Armenian script has its own distinctive punctuation (vertsaket) for the standard full stop at end of sentence (whose glyph looks very much like the Basic Latin/ASCII colon, however slightly more bold and slanted and whose dots are rectangular). It is encoded at U+0589. And used in traditional

Re: Aquaφοβία

2017-12-09 Thread Philippe Verdy via Unicode
2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode < unicode@unicode.org>: > Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0 > implies that it might be considered desirable to have a word boundary > in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C, >

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Philippe Verdy via Unicode
It could be argued that "modern" languages could use unique identifiers for their syntax or API independently of the name being rendered. The problem is that translated names may collide in non-obvious ways and become ambiguous. We've already seen the problems it caused in Excel with its translated

Re: The Unicode Standard and ISO

2018-06-09 Thread Philippe Verdy via Unicode
I just see the WG2 as a subcommittee where governments may just check their practices and make minimum recommendations. Most governments are in fact very late to adopt the industry standards that evolve fast, and they just want to reduce the frequency of necessary changes just to ratify what

Re: The Unicode Standard and ISO

2018-06-09 Thread Philippe Verdy via Unicode
2018-06-09 17:22 GMT+02:00 Marcel Schneider via Unicode : > On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote: > > > > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) > > Marcel Schneider via Unicode wrote: > > > > > > Where there is opportunity for productive sync and merging

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Philippe Verdy via Unicode
If you intend to allow all the standard orthography of common languages, you would also need to support apostrophes and regular hyphens in identifiers, including those from ASCII ! The Catalan middle dot is just a compact variant of the hyphen, it should have better been a diacritic, but the
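
As a rough illustration of which of these punctuation marks an identifier syntax may accept, the sketch below uses Python's own identifier rule (`str.isidentifier()`), which is based on the XID_Start and XID_Continue properties. This is only one concrete profile of UAX #31, chosen because it is easy to test, and the helper name `check_identifiers` is hypothetical.

```python
import unicodedata


def check_identifiers(candidates):
    """Return a dict mapping each candidate string to whether Python
    would accept it as an identifier after NFKC normalization."""
    results = {}
    for name in candidates:
        normalized = unicodedata.normalize("NFKC", name)
        results[name] = normalized.isidentifier()
    return results


if __name__ == "__main__":
    samples = [
        "col\u00B7lecci\u00F3",  # Catalan word with MIDDLE DOT (U+00B7)
        "l'avi",                 # ASCII apostrophe
        "porte-cl\u00E9",        # ASCII hyphen
    ]
    for name, ok in check_identifiers(samples).items():
        print(name, ok)
```

In this profile the middle dot is accepted inside an identifier, while the ASCII apostrophe and hyphen are not.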

Re: The Unicode Standard and ISO

2018-06-08 Thread Philippe Verdy via Unicode
2018-06-08 19:41 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Fri, 8 Jun 2018 13:40:21 +0200 > Mark Davis ☕️ wrote: > > > Mark > > > > On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < > > unicode@unicode.org> wrote: > > > > > On Fri, 8 Jun 2018 05:32:51

Re: The Unicode Standard and ISO

2018-06-07 Thread Philippe Verdy via Unicode
2018-06-07 21:13 GMT+02:00 Marcel Schneider via Unicode : > On Thu, 17 May 2018 22:26:15 +, Peter Constable via Unicode wrote: > […] > > Hence, from an ISO perspective, ISO 10646 is the only standard for which > on-going > > synchronization with Unicode is needed or relevant. > > This point

Re: Italic mu in squared Latin abbreviations?

2018-06-19 Thread Philippe Verdy via Unicode
CJK-specific letter forms for these abbreviations/units should be left as is. They are kept for compatibility reason and I don't see a reason to change them to upright which would contradict their legacy usage. The SI brochure does not apply to these legacy square presentations (which would be

Re: preliminary proposal: New Unicode characters for Arabic music half-flat and half-sharp symbols

2018-05-26 Thread Philippe Verdy via Unicode
Even flat notes or rhythmic and pause symbols in Western musical notations have different contextual meaning depending on musical keys at start of scores, and other notations or symbols added above the score. So their interpretation is also variable according to context, just like tuning in a

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Philippe Verdy via Unicode
In my opinion the usual constant is most often shown as "𝜋" (curly serifs, slightly slanted) in mathematical articles and books (and in TeX), but rarely as "π" (sans-serif). There's a tradition of using handwriting for this symbol on blackboards (not always with serifs, but still often slanted).

Re: Linearized tilde?

2017-12-29 Thread Philippe Verdy via Unicode
Isn't it a rounded variant of Latin letter n? Then it could exist also in uppercase form (like "n" and "N"). It could also be used as a spacing version of the combining tilde diacritic, to be written after the letter instead of being combined above it (so "el Niño" would be written with it as "el

Re: Popular wordprocessors treating U+00A0 as fixed-width

2017-12-31 Thread Philippe Verdy via Unicode
Well it's unfortunate that Microsoft's own response (by its MSVP) is completely wrong, suggesting to use Narrow non-breaking space to get justification, which is exactly the reverse where these NNBSP should NOT be justified and keep their width. Microsoft's developers have absolutely

Re: Inconsistency between UTS 39 and 24

2017-12-21 Thread Philippe Verdy via Unicode
These are ISO 15924 script codes for script variants or groups of related scripts, not used in Unicode classification of characters due to their unification (even if there are registered variants for them) 2017-12-22 1:18 GMT+01:00 Karl Williamson via Unicode : > In

Re: Printed versions of Unicode v1 through v4 available

2018-01-07 Thread Philippe Verdy via Unicode
If you don't know what to do with your books (any kind), go to your local public library to give them there, or give them to a school; they may interest students. Such books are rarely found in primary schools but this may interest them to get some supports and the earlier versions are simpler to

Re: Emoji’s

2018-01-11 Thread Philippe Verdy via Unicode
2018-01-11 6:35 GMT+01:00 Pierpaolo Bernardi via Unicode < unicode@unicode.org>: > On Thu, Jan 11, 2018 at 4:44 AM, jillian mestel via Unicode > wrote: > > To whom it may concern, > > I was very disappointed to learn that there are no emojis of portraying > a dominant left

Re: Emoji for major planets at least?

2018-01-18 Thread Philippe Verdy via Unicode
Well I can think of a popular pseudo-planet, the "Death Star" or "Black Star" (for the "Star Wars" series), which is easily recognized by its color and shape (with the deep built crater, and optionally its destroyed half part) which also looks like a real planet, the Saturnian moon Mimas with its

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Philippe Verdy via Unicode
Hmmm that character exists already at U+0315 (a combining comma above right). It would work for the new Kazakh orthographic system, including for collation purposes. I don't think IDN rejects this combining version. 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > May be the

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Philippe Verdy via Unicode
Also U+0315 is not part of any decomposition for canonical normalization purpose, so it would remain encoded separately (only subject to possible reordering if there are other diacritics) 2018-01-19 14:37 GMT+01:00 Philippe Verdy : > May be the IDN could accept a new
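
The normalization behaviour claimed for U+0315 can be checked with Python's `unicodedata` module. This is a sketch that depends on the Unicode data bundled with the interpreter; the helper name `describe_u0315` is hypothetical.

```python
import unicodedata

COMMA_ABOVE_RIGHT = "\u0315"  # COMBINING COMMA ABOVE RIGHT
ACUTE = "\u0301"              # COMBINING ACUTE ACCENT


def describe_u0315():
    """Collect the normalization-related facts about U+0315."""
    return {
        "name": unicodedata.name(COMMA_ABOVE_RIGHT),
        "combining_class": unicodedata.combining(COMMA_ABOVE_RIGHT),
        "own_decomposition": unicodedata.decomposition(COMMA_ABOVE_RIGHT),
        # Stays a separate character after composition.
        "nfc_after_a": unicodedata.normalize("NFC", "a" + COMMA_ABOVE_RIGHT),
        # Reordered after a diacritic with a lower combining class.
        "nfd_reordered": unicodedata.normalize("NFD", "a" + COMMA_ABOVE_RIGHT + ACUTE),
    }


if __name__ == "__main__":
    for key, value in describe_u0315().items():
        print(key, ascii(value))
```

The reordering case shows the "possible reordering if there are other diacritics" mentioned above: the acute accent (class 230) is moved in front of U+0315 (class 232).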

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Philippe Verdy via Unicode
2018-01-19 14:47 GMT+01:00 Michael Everson via Unicode : > There’s no redeeming this orthography. This is not about redeeming; the Kazakh government currently has not made any assessment of how to encode their proposed system. Who said that what was proposed by them was an

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Philippe Verdy via Unicode
Maybe the IDN could accept a new combining diacritic (a sort of right-side acute accent). After all, the Kazakh intent is not to define a new separate character but a modification of a base letter to create a single letter in their alphabet. So a proposal for COMBINING APOSTROPHE (whose spacing

Re: 0027, 02BC, 2019, or a new character?

2018-01-19 Thread Philippe Verdy via Unicode
For the root zone, maybe, but it is not formally rejected by IDN, and the Kazakh zone could accept it without problem. It also has the advantage of allowing cleaner collation and contextual text extraction, and it also allows better placement of the combining character with its base in some dedicated

Re: 0027, 02BC, 2019, or a new character?

2018-01-21 Thread Philippe Verdy via Unicode
punctuation sign for quotation...) 2018-01-20 21:04 GMT+01:00 Simon Montagu via Unicode <unicode@unicode.org>: > On 19/01/18 15:37, Philippe Verdy via Unicode wrote: > > May be the IDN could accept a new combining diacritic (sort of > > right-side acute accent). After

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Philippe Verdy via Unicode
So there will be a new administrative jargon in Kazakhstan that people won't like, and outside the government, they'll continue using their existing keyboards, and will only transliterate to Latin using a simple 1-to-1 mapping without the ugly apostrophes (most probably acute accents on vowels,

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Philippe Verdy via Unicode
Great, but then why stick to a pure western subset (ASCII is mostly for the US only)? If he wants to be eastern, he should choose ISO 8859-2. As a bonus, banning the apostrophe from the alphabet will be a security improvement (think about the many cases where ASCII apostrophes are used as string

Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Philippe Verdy via Unicode
Such an example shows that ignoring umlauts makes the document counterintuitive. Nobody is able to infer whether "Paper" refers to a person here or whether he actually meant a paper sheet/article... At least he should have written "Paeper", which would be more correct (if "Christoph Päper" is German, the

Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Philippe Verdy via Unicode
Just a remark for fun: - You'll also note that this talk is all about the apostrophe, and if Kazakhstan wants to introduce it in 2019, that year will match exactly the code point U+2019 [ ’ ]... - This year 2018 is also the year to discuss and reverse the apostrophe decision, and it matches the

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Philippe Verdy via Unicode
I agree, and still you won't necessarily have to press a dead key to have these characters, if you map one key where the Cyrillic letter was producing directly the character with its accent. No surprise for user, fast to type, easy to learn, typographically correct, preserves the etymologies and

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Sun, 28 Jan 2018 20:29:28 +0100 > Philippe Verdy via Unicode <unicode@unicode.org> wrote: > > > 2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < > > unicode@unicode

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
ass, and matches only "ab", "ba", "ac", or "ca", it is equivalent to "{{2,2}a|b|c}" or "{{2}a|b|c}". With that extension you can build the necessary regexps to match canonical equivalent strings with a finite regexp. 2018-01-29 7:16 GM

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
2018-01-28 5:12 GMT+01:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Sat, 27 Jan 2018 14:13:40 -0800The theory > of regular expressions (though you may not think that mathematical > regular expressions matter) extends to trace monoids, with the > disturbing exception that the

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
Typo, the full regexp has undesired asterisks: [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * ( [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * | [[ [^[[:cc=0:]]] - [[:cc=above:][:cc=below:]] ]] * < COMBINING CIRCUMFLEX> 2018-01-28 20:29 GMT+01:00 Philippe Verdy

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
Note that for finding occurrences of simpler combining sequences such as finding the regexp is simpler: [[ [^[[:cc=0:]]] - [[:cc=above:]] ]] * The central character class allows 53 distinct combining classes, and the maximum match length is 2+53=55 characters. If Unicode assigns new combining
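
The idea behind that regexp can be sketched without a regex engine, since Python's re module has no combining-class properties. The function below is a hypothetical helper, not anything from the thread: it assumes the target is a base letter followed by one combining mark, works on the NFD form of the text, and skips intervening marks whose combining class is neither 0 nor the class of the target mark. The base "a" and COMBINING CIRCUMFLEX (U+0302) in the usage lines are illustrative choices.

```python
import unicodedata


def find_base_plus_mark(text, base, mark):
    """Return indices (into the NFD form of text) where `base` is followed by
    `mark`, allowing intervening marks that canonically commute with `mark`.

    Hypothetical helper: `base` and `mark` are assumed to be single code
    points that are already in NFD form.
    """
    nfd = unicodedata.normalize("NFD", text)
    mark_ccc = unicodedata.combining(mark)
    hits = []
    for i, ch in enumerate(nfd):
        if ch != base:
            continue
        j = i + 1
        while j < len(nfd):
            if nfd[j] == mark:
                hits.append(i)
                break
            ccc = unicodedata.combining(nfd[j])
            # A starter (class 0) ends the combining sequence; a different
            # mark of the same class cannot be reordered past the target.
            if ccc == 0 or ccc == mark_ccc:
                break
            j += 1
    return hits


CIRCUMFLEX = "\u0302"
print(find_base_plus_mark("a\u0323\u0302", "a", CIRCUMFLEX))  # dot below, then circumflex
print(find_base_plus_mark("a\u0301\u0302", "a", CIRCUMFLEX))  # acute blocks the circumflex
```

Normalizing first means marks of lower combining classes are already sorted in front of the target, so a single left-to-right scan is enough for this simple case.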

Re: Internationalised Computer Science Exercises

2018-01-28 Thread Philippe Verdy via Unicode
bc]" and matches only "a", "b", or "c" > And "{{0}[abc]}" is quantified to match zero and only zero item (the > items are not relevant) and will never match anything, just like > "{{0}a|b|c}" or "{{0}}". > And "{{2}[ab

Re: Internationalised Computer Science Exercises

2018-01-29 Thread Philippe Verdy via Unicode
1-29 9:57 GMT+01:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Mon, 29 Jan 2018 07:16:04 +0100 > Philippe Verdy via Unicode <unicode@unicode.org> wrote: > > > 2018-01-28 23:44 GMT+01:00 Richard Wordingham via Unicode < > > unicode@unicode.org>: &

Re: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-31 Thread Philippe Verdy via Unicode
> > Note the French "touch" keyboard layout is complete for French (provided > you select the one of the 3 new layouts with Emoji: it has the extra "key" > for selecting the input language in all 4 layouts) > > But the "full" (dockable) touch layout in French which emulates a physical > keyboard
