Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Philippe Verdy via Unicode
People's names are NOT transliterated freely. It's up to each person to document their romanized name; it should not be invented by automatic processes. And frequently the official romanized name does not match the original name in another script: this is very frequent for Chinese people, as

Re: Geological symbols

2020-01-13 Thread Philippe Verdy via Unicode
It is possible with some other markup languages, including HTML, by using ruby notation and other interlinear notations for creating special vertical layouts inside a horizontal line. There are difficulties, however, caused by line wraps which may occur before the vertical layout, or even inside it

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread Philippe Verdy via Unicode
You seem to have never seen how translation packages work and are used in common projects (not just CLDR, but you could find them as well in Wikimedia projects, or translation packages for a lot of open source packages). The purpose is to allow translating the UI of these applications for user's

Re: emojis for mouse buttons?

2020-01-01 Thread Philippe Verdy via Unicode
pointing up inside) > > MOUSE SCROLL DOWN (mouse with middle button black and white triangle > > pointing down inside) > > > > These characters are pretty useful in software manuals, training > > materials and user interfaces. > > > > Happy New Year, > >

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
Playing with the filling of the middle cell to mean a double click is a bad idea; it would be better to add one or two rounded borders separated from the button, or simply to display two icons in sequence for a double click. Note that the glyphs do not necessarily have to show a mouse, it could as

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
t in three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary). On Tue, Dec 31, 2019 at 14:57, Shriramana Sharma wrote: > Why are these called "emojis" for mouse buttons rather than just > "

emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
A lot of applications need to document their keymaps and want to display keys. For now there are emojis for mice (several variants: 1, 2 or 3 buttons), independent of the button actually pressed. However, there's no simple emoji to represent the very common mouse click buttons used in a lot of

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
On Mon, Nov 11, 2019 at 17:31, Markus Scherer wrote: > We generally assign the script code when the script is in the pipeline for > a near-future version of Unicode, which demonstrates that it's "a candidate > for encoding". We also want the name of the script to be settled, so that > the

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
ion across countries or caused by wars, invasions, diplomacy, or commercial interests) On Mon, Nov 11, 2019 at 17:31, Markus Scherer wrote: > On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode < > unicode@unicode.org> wrote: > >> But first there's still no code in ISO 159

Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
Encoding the Nsibidi script (African) for writing the Efik, Ekoi, Ibibio and Igbo languages. See this site as an example of use, with links to published educational books. http://blog.nsibiri.org/ Also this online dictionary: https://fr.scribd.com/doc/281219778/Ikpokwu Other links:

Re: comma ellipses

2019-10-07 Thread Philippe Verdy via Unicode
Commas may be used instead of dots by users of French keyboards (it's easier to type the comma, when the dot/full stop requires pressing the SHIFT key). I may be wrong, but I've quite frequently seen commas or semicolons instead of dots/full stops under normal orthography. But the web and notably

Re: Acute/apostrophe diacritic in Võro for palatalized consonants

2019-08-19 Thread Philippe Verdy via Unicode
I must add that the current version of Wikipedia in Võro seems to have completely given up encoding this combining mark (no acute, no apostrophe), probably because of the lack of a proper encoding in Unicode and the difficulty of harmonizing its orthography. It may be a good argument for the addition of

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
>> >> >> Anshuman Pandey did preliminary research on this in 2011. >> >> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf >> >> It would be premature to assign an ISO 15924 script code, pending the >> research to determine whether this script should be separately encoded

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
Also we can note that "mgp" (Eastern Magari) is severely endangered according to multiple sources, including Ethnologue and the Linguist List. This is still not the case for Western Magari (mostly in Nepal, not in Sikkim, India), where evidence is probably easier to find (where the encoding of a new

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
On Mon, Jul 22, 2019 at 18:43, Ken Whistler wrote: > See the entry for "Magar Akkha" on: > > http://linguistics.berkeley.edu/sei/scripts-not-encoded.html > > Anshuman Pandey did preliminary research on this in 2011. > That's what I said: 8 years ago already. >

Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
According to Ethnologue, the Eastern Magar language (mgp) is written in two scripts: Devanagari and "Akkha". But the "Akkha" script does not seem to have any ISO 15924 code. The Ethnologue currently assigns a private use code (Qabl) for this script. Was the addition delayed due to lack of

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-20 Thread Philippe Verdy via Unicode
r wrote: > > On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote: > > then the Unicode version (age) used for Hieroglyphs should also be > assigned to Hieratic. > > It is already. > > > In fact the ligatures system for the "cursive" Egyptian Hieratic is so >

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
But my concern is in fact valid as well for Egyptian Hieratic (considered in Chapter 14 to be "unified" with the Hieroglyphs, and being a cursive variant, currently not supported in any font because of the very complex set of ligatures this would require, and that may not even work properly with

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
Sorry, I misread (with an automated tool) an old dataset where these "3.0" versions were indicated in an incorrect form. On Thu, Jul 18, 2019 at 01:07, Philippe Verdy wrote: > Note also that there are variants registered with Unicode versions (Age) > for symbols, even if they don't have any

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
Note also that there are variants registered with Unicode versions (Age) for symbols, even if they don't have any assigned Unicode alias, but this is not a problem. 994 Zinh Code for inherited script codet pour écriture héritée Inherited 2009-02-23 995 *Zmth

ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
The ISO 15924/RA reference page contains indication of support in Unicode for variants of various scripts such as Aran, Latf, Latg, Hanb, Hans, Hant: 160 *Arab* Arabic arabe Arabic 1.1 2004-05-01 161 *Aran* Arabic (Nastaliq variant) arabe (variante nastalique) 1.1 2014-11-15 ... 503 *Hanb* Han

Fwd: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
> Well my first feeling was that U+202F should work all the time, but I > found cases where this is not always the case. So this must be bugs in > those renderers. > I think we can attribute these bugs to the fact that this character is insufficiently known, and not even accessible in most input

Re: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
e if you need > the look of a narrow space. > > Another possibility is to embed the number in a LRI...PDI block, as > e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%" > fragment of its default example. > > cheers, > egmont > > On Tue, Jul 9, 2
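The LRI...PDI suggestion quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `group_digits` and `isolate_ltr` are hypothetical, and U+202F NARROW NO-BREAK SPACE is used as the group separator because that is the character discussed in this thread.

```python
# Minimal sketch: group digits with U+202F and wrap the result in a
# left-to-right isolate so the separators cannot be reordered by
# surrounding right-to-left text. Helper names are hypothetical.

LRI = "\u2066"    # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"    # POP DIRECTIONAL ISOLATE
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE


def group_digits(n: int, sep: str = NNBSP) -> str:
    """Format an integer with groups of three digits joined by `sep`."""
    return f"{n:,}".replace(",", sep)


def isolate_ltr(text: str) -> str:
    """Wrap `text` in LRI ... PDI."""
    return f"{LRI}{text}{PDI}"


formatted = isolate_ltr(group_digits(1234567))
print([f"U+{ord(c):04X}" for c in formatted])
```

Whether a given renderer honours the isolate controls is a separate question; the sketch only builds the string.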

Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
Is there a narrow space usable as a numeric group separator, and that also has the same bidi property as digits (i.e. neutral outside the span of digits and separators, but inheriting the implied directionality of the previous digit)? I can't find a way to use narrow spaces instead of

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
Also consider that C0 controls (like STX and ETX) can already be used for packetizing, but immediately comes the need for escaping (DLE has been used for that goal, just before the character to preserve in the stream content, notably before DLE itself, or STX and ETX). There's then no need at all
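The escaping scheme described in this message (STX and ETX as packet delimiters, DLE placed just before any payload byte that must be preserved) could look roughly like the Python sketch below. The function names `frame` and `unframe` are hypothetical, and the sketch works on bytes rather than on any particular Unicode encoding form.

```python
# Minimal sketch of DLE escaping around an STX ... ETX packet.
# frame()/unframe() are hypothetical names, not an existing API.

STX, ETX, DLE = 0x02, 0x03, 0x10


def frame(payload: bytes) -> bytes:
    """Wrap payload in STX ... ETX, prefixing STX/ETX/DLE bytes with DLE."""
    out = bytearray([STX])
    for b in payload:
        if b in (STX, ETX, DLE):
            out.append(DLE)  # escape the control byte that follows
        out.append(b)
    out.append(ETX)
    return bytes(out)


def unframe(packet: bytes) -> bytes:
    """Reverse frame(); raise ValueError on a malformed packet."""
    if len(packet) < 2 or packet[0] != STX or packet[-1] != ETX:
        raise ValueError("missing STX/ETX delimiters")
    out = bytearray()
    escaped = False
    for b in packet[1:-1]:
        if escaped:
            out.append(b)  # byte after DLE is taken literally
            escaped = False
        elif b == DLE:
            escaped = True
        elif b in (STX, ETX):
            raise ValueError("unescaped delimiter inside packet")
        else:
            out.append(b)
    if escaped:
        raise ValueError("dangling DLE at end of packet")
    return bytes(out)


payload = "a\x02b".encode("utf-8")
packet = frame(payload)
print(packet, unframe(packet) == payload)
```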

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
On Wed, Jul 3, 2019 at 06:09, Sławomir Osipiuk wrote: > I don’t think you understood me at all. I can packetize a string with any > character that is guaranteed not to appear in the text. > Your goal is **impossible** to reach with Unicode. Assume such a character is "added" to the UCS, then

Re: Unicode "no-op" Character?

2019-06-29 Thread Philippe Verdy via Unicode
If you want to "packetize" arbitrarily long Unicode text, you don't need any new magic character. Just prepend your packet with a base character used as a syntactic delimiter, that does not combine with what follows in any normalization. There's a fine character for that: the TAB control. Except

Symbols of colors used in Portugal for transport

2019-04-27 Thread Philippe Verdy via Unicode
A very useful thing to add to Unicode (for colorblind people)! http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind Has it been proposed for addition as new symbols?

Re: Emoji Haggadah

2019-04-19 Thread Philippe Verdy via Unicode
I cannot; it definitely requires, first, a good knowledge of English (to find possible synonyms, plus phonetic approximations, including using abbreviatable words), and of Hebrew culture (to guess names and the context). All this text looks completely random and makes no sense otherwise. On Tue, Apr 16,

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-17 Thread Philippe Verdy via Unicode
On Fri, Feb 8, 2019 at 13:56, Egmont Koblinger wrote: > Philippe, I hate to say it, but at the risk of being impolite, I just > have to. > Resist this idea, I've not been impolite. I just want to show you that terminals are legacy environments that are far behind what is needed for proper

Re: Bidi paragraph direction in terminal emulators

2019-02-14 Thread Philippe Verdy via Unicode
On Tue, Feb 12, 2019 at 14:16, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > > There is nothing magic about the grid of cells, and once you introduce > new escape sequences, you might as well truly modernise the terminal. > > The magic about the grid of cells is all the software

Re: Encoding colour (from Re: Encoding italic)

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, Feb 10, 2019 at 02:33, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> wrote: > Previously I wrote: > > > A stateful method, though which might be useful for plain text streams > > in some applications, would be to encode as characters some of the > > glyphs for indicating

Re: Encoding italic

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, Feb 10, 2019 at 16:42, James Kass via Unicode wrote: > > Philippe Verdy wrote, > > >> ...[one font file having both italic and roman]... > > The only case where it happens in real fonts is for the mapping of > > Mathematical Symbols which have a distinct encoding for some > >

Re: Bidi paragraph direction in terminal emulators

2019-02-10 Thread Philippe Verdy via Unicode
On Sat, Feb 9, 2019 at 20:55, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > Hi Asmus, > > > On quick reading this appears to be a strong argument why such emulators > will > > never be able to be used for certain scripts. Effectively, the model > described works > > well with

Re: Encoding italic

2019-02-10 Thread Philippe Verdy via Unicode
On Sun, Feb 10, 2019 at 05:34, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > >> Isn't that already the case if one uses variation sequences to choose > >> between Chinese and Japanese glyphs? > > > > Well, not necessarily. There's nothing prohibiting a font that includes >

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
Adding a single bit of protection in cell attributes to indicate they are either protected or become transparent (and the rest of the attributes/character field indicates the id of another terminal grid or rendering plugin creating its own layer and having its own scrolling state and dimensions)

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, Feb 7, 2019 at 19:38, Egmont Koblinger wrote: > As you can see from previous discussions, there's a whole lot of > confusion about the terminology. And it was exactly the subject of my first message sent to this thread! You probably missed it. > Philippe, with all due respect, I

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, Feb 7, 2019 at 13:29, Egmont Koblinger wrote: > Hi Philippe, > > > There's some rules for correct display including with Bidi: > > In what sense are these "rules"? Where are these written, in what kind > of specification or existing practice? > "Rules" are not formally written, they

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-06 Thread Philippe Verdy via Unicode
.) > > Anyway, please also see my previous email; I hope that clarifies a lot > for you, too. > > > cheers, > egmont > > On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode > wrote: > > > > I think that before making any decision we must make some decis

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-05 Thread Philippe Verdy via Unicode
I think that before making any decision we must make some decision about what we mean by "newlines". There are in fact 3 different functions: - (1) soft line breaks (which are used to enforce a maximum display width between paragraph margins): these are equivalent to breakable and compressible

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Philippe Verdy via Unicode
Actually not all U+E0020 through U+E007E are "un-deprecated" for this use. For now emoji flags only use: - U+E0041 through U+E005A (mapping to ASCII letters A through Z used in 2-letter ISO 3166-1 codes). These are usable in pairs, without requiring any modifier (and only for ISO 3166-1 registered

Re: Encoding italic

2019-02-01 Thread Philippe Verdy via Unicode
the proposal would contradict the goals of variation selectors and would pollute their variation sequences registry (possibly even creating conflicts). And if we admit it for italics, then another VSn will be dedicated to bold, and another for monospace, and finally many would follow for various

Re: Encoding italic

2019-01-28 Thread Philippe Verdy via Unicode
So you used "bold I.e, you converted from ASCII to tag characters the full HTML sequences "" and "", including the HTML element name. I see little interest for that approach. Additionally this means that U+E003C is the tag identifier and its scope does not end for the rest of the text (the HTML

Re: Encoding italic

2019-01-27 Thread Philippe Verdy via Unicode
You're not very explicit about the Tag encoding you use for these styles. Of course it must not be a language tag, so the introducer is not U+E0001, or a cancel-all tag, so it is not prefixed by U+E007F. It also cannot use letter-like, digit-like and hyphen-like tag characters for its introduction.

Re: Ancient Greek apostrophe marking elision

2019-01-27 Thread Philippe Verdy via Unicode
For Volapük, it looks much more like U+02BE (right half ring modifier letter) than like U+02BC (apostrophe "modifier" letter), according to the PDF at https://archive.org/details/cu31924027111453/page/n12 The half ring makes a clear distinction with the regular apostrophe (for elisions) or

Re: Encoding italic (was: A last missing link)

2019-01-17 Thread Philippe Verdy via Unicode
If encoding italics means reencoding the normal linguistic usage, the answer is no! We already have the nightmares caused by partial encoding of Latin and Greek (also a few Hebrew characters) for maths notations or IPA notations, but they are restricted to a well delimited scope of use and subset, and at

Re: NNBSP (was: A last missing link for interoperable representation)

2019-01-17 Thread Philippe Verdy via Unicode
On Thu, Jan 17, 2019 at 05:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 16/01/2019 21:53, Richard Wordingham via Unicode wrote: > > > > On Tue, 15 Jan 2019 13:25:06 +0100 > > Philippe Verdy via Unicode wrote: > > > >> If yo

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
Note that even if this NNBSP character is not mapped in a font, it should be rendered correctly with all modern renderers (the mapping is necessary only when a font design wants to tune its metrics, because its width varies between 1/8 and 1/6 em (the narrow space is a bit narrower in traditional

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
On Mon, Jan 14, 2019 at 20:25, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 14/01/2019 06:08, James Kass via Unicode wrote: > > > > Marcel Schneider wrote, > > > >> There is a crazy typeface out there, misleadingly called 'Courier > >> New', as if the foundry didn’t

Re: UCA unnecessary collation weight 0000

2018-11-04 Thread Philippe Verdy via Unicode
erties, algorithms, CLDR, ICU) that have a higher > benefit. > > You can continue flogging this horse all you want, but I'm muting this > thread (and I suspect I'm not the only one). > > Mark > > > On Sun, Nov 4, 2018 at 2:37 AM Philippe Verdy via Unicode < > unicode@un

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
I can take another example about what I call "legacy encoding" (which really means that such encoding is just an "approximation" from which no semantics can be clearly inferred, except by using a non-deterministic heuristic, which can frequently make "false guesses"). Consider the case of the legacy

Re: Encoding (was: Re: A sign/abbreviation for "magister")

2018-11-04 Thread Philippe Verdy via Unicode
Note that I actually propose not just one rendering for the but two possible variants (that would be equally valid without preference). Use it after any base cluster (including with diacritics if needed, like combining underlines). - the first one can be to render the previous cluster as

Re: UCA unnecessary collation weight 0000

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, Nov 2, 2018 at 22:27, Ken Whistler wrote: > > On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: > > I was replying not about the notational representation of the DUCET data > table (using [....] unnecessarily) but about the text of UTR#10 itself. > Wh

Re: UCA unnecessary collation weight 0000

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, Nov 2, 2018 at 22:27, Ken Whistler wrote: > > On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: > > I was replying not about the notational representation of the DUCET data > table (using [....] unnecessarily) but about the text of UTR#10 itself. > Wh

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
It should be noted that the algorithmic complexity for this NFLD normalization ("legacy") is exactly the same as for NFKD ("compatibility"). However NFLD is versioned (like also NFLC), so NFLD can take a second parameter: the maximum Unicode version which can be used to filter which decomposition

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Sat, Nov 3, 2018 at 23:36, Philippe Verdy wrote: > - this new decomposition mapping file for NFLC and NFLD, where NFLC is >> defined to be NFC(NFLD), has some stability requirements and it must be >> guaranteed that NFD(NFLD) = NFD >> > Oops! fix my typo: it must be guaranteed that

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
> > Unlike NFKC and NFKD, the NFLC and NFLD would be an extensible superset > based on MUTABLE character properties (this can also be "decompositions > mappings" except that once a character is added to the new property file, > they won't be removed, and can have some stability as well, where the

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
I can give other interesting examples about why the Unicode "character encoding model" is the best option. Just consider how the Hangul alphabet is (now) encoded: its consonant letters are encoded "twice" (leading and trailing jamos) because they carry semantic distinctions for efficient
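The double encoding of Hangul consonants mentioned here can be observed with Python's standard `unicodedata` module. The sketch below composes one syllable arithmetically (using the constants from the Unicode Standard's Hangul syllable composition algorithm) and then decomposes it with NFD; the helper name `compose` is hypothetical.

```python
import unicodedata

# Constants from the Hangul syllable composition algorithm.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Build a precomposed syllable from leading/vowel/trailing indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# Leading NIEUN (index 2) + vowel A (index 0) + trailing NIEUN (index 4).
syllable = compose(2, 0, 4)
jamos = unicodedata.normalize("NFD", syllable)
names = [unicodedata.name(c) for c in jamos]

for c, name in zip(jamos, names):
    print(f"U+{ord(c):04X} {name}")
```

The same consonant, NIEUN, appears once as a leading jamo (CHOSEONG) and once as a trailing jamo (JONGSEONG), with two different code points.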

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
h for most users to be aware that > variations can be encoded explicitly and compliantly) > > > On Sat, Nov 3, 2018 at 20:41, Philippe Verdy wrote: > >> >> >> On Fri, Nov 2, 2018 at 20:01, Marcel Schneider via Unicode < >> unicode@unicode.org> wrote: >

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
dy wrote: > > > On Fri, Nov 2, 2018 at 20:01, Marcel Schneider via Unicode < > unicode@unicode.org> wrote: > >> On 02/11/2018 17:45, Philippe Verdy via Unicode wrote: >> [quoted mail] >> > >> > Using variation selectors is only appropria

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, Nov 2, 2018 at 20:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 02/11/2018 17:45, Philippe Verdy via Unicode wrote: > [quoted mail] > > > > Using variation selectors is only appropriate for these existing > > (preencoded)

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Philippe Verdy via Unicode
bin/collation.html and turn on "raw >>> collation elements" and "sort keys" to see the transformed collation >>> elements (from the DUCET + CLDR) and the resulting sort keys. >>> >>> a =>[29,05,_05] => 29 , 05 , 05 . >>> a\u0

Re: A sign/abbreviation for "magister"

2018-11-02 Thread Philippe Verdy via Unicode
On Fri, Nov 2, 2018 at 16:20, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > That seems to me a regression, after the front has moved in favor of > recognizing Latin script needs preformatted superscript. The use case is > clear, as we have ª, º, and n° with degree sign, and so on

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
Also, step 2 of the algorithm speaks about a single "array" of collation elements. Actually it's best to create one separate array per level, and append weights for each level in the relevant array for that level. The steps S2.2 to S2.4 can do this, including for derived collation elements
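The per-level arrays suggested here can be sketched in Python as follows. This is a heavily simplified illustration, not the UCA itself: it skips normalization, contractions, variable weighting and derived elements, the table `ELEMENTS` is a hypothetical four-entry stand-in for the DUCET (its weights are the ones shown for "cab" and "Cab" in Figure 3 of UTS #10), and the key is assembled with the 0000 level separator exactly as step S3.2 describes it.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-table: (primary, secondary, tertiary) per character.
ELEMENTS: Dict[str, Tuple[int, int, int]] = {
    "c": (0x0706, 0x0020, 0x0002),
    "a": (0x06D9, 0x0020, 0x0002),
    "b": (0x06EE, 0x0020, 0x0002),
    "C": (0x0706, 0x0020, 0x0008),
}


def level_arrays(text: str) -> List[List[int]]:
    """Collect weights into one array per level, skipping zero weights."""
    levels: List[List[int]] = [[], [], []]
    for ch in text:
        for level, weight in enumerate(ELEMENTS[ch]):
            if weight != 0:  # zero means "ignorable at this level"
                levels[level].append(weight)
    return levels


def sort_key(text: str, separator: int = 0x0000) -> List[int]:
    """Concatenate the level arrays, with a separator between levels."""
    key: List[int] = []
    for i, weights in enumerate(level_arrays(text)):
        if i > 0:
            key.append(separator)
        key.extend(weights)
    return key


def fmt(key: List[int]) -> str:
    return " ".join(f"{w:04X}" for w in key)


print(fmt(sort_key("cab")))
print(fmt(sort_key("Cab")))
print(sort_key("cab") < sort_key("Cab"))
```

Keeping the levels in separate arrays until the final step makes it easy to experiment with how (or whether) the levels are delimited when the key is serialized.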

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
So it should be clear in the UCA algorithm and in the DUCET datatable that "0000" is NOT a valid weight. It is just a notational placeholder used as ".0000", only indicating in the DUCET format that there's NO weight assigned at the indicated level, because the collation element is ALWAYS ignorable

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
In summary, this step given in the algorithm is completely unneeded and can be dropped completely: *S3.2* If L is not 1, append a *level separator*. *Note:* The level separator is zero (0000), which is guaranteed to be lower than any weight in the resulting

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
The 0000 is there in the UCA only because the DUCET is published in a format that uses it, but here also this format is useless: you never need any [.0000], or [.0000.0000] in the DUCET table as well. Instead the DUCET just needs to indicate what is the minimum weight assigned for every level

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, Nov 1, 2018 at 21:31, Philippe Verdy wrote: > so you can use these two last functions to write the first one: > > bool isIgnorable(int level, string element) { > return getLevel(getWeightAt(element, 0)) > getMinWeight(level); > } > correction: return getWeightAt(element, 0)

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, Nov 1, 2018 at 21:08, Markus Scherer wrote: > When you want fast string comparison, the zero weights are useful for >> processing -- and you don't actually assemble a sort key. >> > And no, I see absolutely no case where any 0000 weight is useful during processing, it does not distinguish

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I'm not speaking just about how collation keys will finally be stored (as uint16 or bytes, or sequences of bits with variable length); I'm just referring to the sequence of weights you generate. You absolutely NEVER need ANYWHERE in the UCA algorithm any 0000 weight, not even during processing, or

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
For example, Figure 3 in UTR #10 contains: Figure 3. Comparison of Sort Keys. String / Sort Key: 1 cab *0706* 06D9 06EE *0000* 0020 0020 *0020* *0000* *0002* 0002 0002; 2 Cab *0706* 06D9 06EE *0000* 0020 0020 *0020* *0000* *0008* 0002

UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I just noticed that there's absolutely NO utility of the collation weight 0000 anywhere in the algorithm. For example in UTR #10, section 3.3.1 gives a collation element: [.0000.0021.0002] for COMBINING GRAVE ACCENT. However it can also be simply: [.0021.0002] for a simple reason: the

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Philippe Verdy via Unicode
As is "Mgr" for Monseigneur in French ("Mgr" without superscripts makes little sense, and if "Mr" is sometimes found as an abbreviation for "Monsieur", its standard abbreviation is "M.", and its plural "Messieurs" is noted "MM" without any abbreviation dot or superscript, but normally never as

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Philippe Verdy via Unicode
For the case of "Mister" vs. "Magister", the (double) underlining is not just a stylistic option but conveys semantics as an explicit abbreviation mark! We are here at the line between what is pure visual encoding (e.g. using superscript letters), and logical encoding (as done everywhere else in

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
here was a "missing base character" to apply before a known combining mark or extender) On Sun, Oct 28, 2018 at 18:54, Philippe Verdy wrote: > On Sun, Oct 28, 2018 at 18:28, Janusz S. Bień > wrote: > >> On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote:

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
On Sun, Oct 28, 2018 at 18:28, Janusz S. Bień wrote: > On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote: > > Given the "squiggle" below letters are actually given distinctive > > semantics, I think it should be encoded as a combining character (to

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
Given the "squiggle" below letters are actually given distinctive semantics, I think it should be encoded as a combining character (to be written not after a "superscript" but after any normal base letter, possibly with other combining characters, or CGJ if needed because of the compatibility

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
04:21, Garth Wallace via Unicode wrote: > I learned that one as a kid, as the "pigpen cipher". I'm not aware of any > numerological significance (which is easy enough to "find" in anything). > > On Sat, Oct 27, 2018 at 7:43 PM Philippe Verdy via Unicode < >

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
So in summary this Masonic "alphabet" uses 13 square "letters" and a single combining mark (the central dot), possibly extended with the minus and plus signs and space. It's possible that the central dot is used as a spacing mark to note a punctuation. The assignment of Latin (or Hebrew) letters

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
I must add that the Masonic 3x3 grid alphabet has been proposed as an alternative to Braille, easier to learn and memorize, easier and faster to draw with a pen on paper without any physical guide, and easier also to recognize using only tactile contact by a finger tip, but more difficult to form

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
More interesting: the Masonic alphabet http://tallermasonico.com/0diccio1.htm - 18 letters of the Latin alphabet (or Hebrew), from A to T (excluding J and K), are arranged in groups of 2 letters in a 3x3 square grid, whose global outer sides are not marked on the outer border of the grid but on

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
Are you talking about this one? https://www.magisterdaire.com/magister-symbol-black-sq/ It looks like a graphic personal signature for the author of this esoteric book, even if it looks like an interesting composition of several of our existing Unicode symbols, glued together in a vertical ligature,

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
On Sat, Oct 27, 2018 at 15:06, Asmus Freytag via Unicode wrote: > First question is: how do you interpret the symbol? For me it is > definitely the capital M followed by the superscript "r" (written in an > old style no longer used in Poland), but there is something below the > superscript. It

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
actices > despite the standards? If so, are these just tolerated by parsers, or are > they actually generated by encoders? > > > > What would be the rationale for supporting unnecessary whitespace? If > linebreaks are forced at some line length they can presumably be removed at &

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
pace? If >>> linebreaks are forced at some line length they can presumably be removed at >>> that length and not treated as part of the encoding. >>> >>> Maybe we differ on define where the encoding begins and ends, and where >>> higher level protocols prescribe how they are embed

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
inebreaks are forced at some line length they can presumably be removed at >> that length and not treated as part of the encoding. >> >> Maybe we differ on define where the encoding begins and ends, and where >> higher level protocols prescribe how they are embedded within the proto

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Philippe > Verdy via Unicode > *Sent:* Sunday, October 14, 2018 1:41 AM > *To:* Adam Borowski > *Cc:* unicode Unicode Discussion >

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
t treated as part of the encoding. > > Maybe we differ on define where the encoding begins and ends, and where > higher level protocols prescribe how they are embedded within the protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
Le dim. 14 oct. 2018 à 21:21, Doug Ewell via Unicode a écrit : > Steffen Nurpmeso wrote: > > > Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions > > (MIME) Part One: Format of Internet Message Bodies). > > Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data >

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
It's also interesting to look at https://tools.ietf.org/html/rfc3501 - which defines (for IMAP v4) another "BASE64" encoding, - and also defines a "Modified UTF-7" encoding using it, deviating from Unicode's definition of UTF-7, - and adding other requirements (which forbids alternate encodings

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
oct. 2018 à 03:47, Adam Borowski via Unicode a écrit : > On Sun, Oct 14, 2018 at 01:37:35AM +0200, Philippe Verdy via Unicode wrote: > > Le sam. 13 oct. 2018 à 18:58, Steffen Nurpmeso via Unicode < > > unicode@unicode.org> a écrit : > > > The only variance is de

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
Le sam. 13 oct. 2018 à 18:58, Steffen Nurpmeso via Unicode < unicode@unicode.org> a écrit : > Philippe Verdy via Unicode wrote in w9+jearw4ghyk...@mail.gmail.com>: > |You forget that Base64 (as used in MIME) does not follow these rules \ > |as it allows multiple d

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
In summary, two distinct implementations are allowed to return different values t and t' of Base64_Encode(d) from the same message d, but both Base64_Decode(t') and Base64_Decode(t) will be equal and MUST return d exactly. There's an allowed choice of implementation for Base64_Encode() but
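
A minimal sketch of the point above, assuming Python's standard `base64` module stands in for two different encoders (it is only an illustration, not what any particular mail client does). The names `d`, `t` and `t_mime` are made up for this example: one encoder emits a single line, the other inserts MIME-style line breaks, and both texts decode back to the same bytes.

```python
import base64

# Arbitrary binary message d (512 bytes, long enough to force line breaks).
d = bytes(range(256)) * 2

# Encoder 1: one unbroken line of Base64 text.
t = base64.b64encode(d)

# Encoder 2: MIME-style output, with a newline after every 76 characters.
t_mime = base64.encodebytes(d)

# The two encoded texts differ...
texts_differ = t != t_mime

# ...but a lenient decoder skips the line breaks and recovers d from both.
decoded_plain = base64.b64decode(t)
decoded_mime = base64.b64decode(t_mime)
same_message = decoded_plain == d and decoded_mime == d
```

Here the only variation is whitespace; a decoder that rejected whitespace would have to be handed the text with the line breaks already removed.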

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
You forget that Base64 (as used in MIME) does not follow these rules as it allows multiple different encodings for the same source binary. MIME actually splits a binary object into multiple fragments at random positions, and then encodes these fragments separately. Also MIME uses an extension of

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Philippe Verdy via Unicode
I think the reverse is also true! Decoding a Base64 entity does not guarantee that it will return valid text in any known encoding. So Unicode normalization of the output cannot apply. Even if it represents text, nothing indicates that the result will be encoded with some Unicode encoding form
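
As a small illustration of the statement above (a sketch using Python's standard `base64` module; the input `"/w=="` is just a convenient example chosen here), a perfectly well-formed Base64 text can decode to bytes that are not valid UTF-8, so there is no text to normalize.

```python
import base64

# "/w==" is well-formed Base64; it decodes to the single byte 0xFF.
raw = base64.b64decode("/w==")

# 0xFF can never occur in UTF-8, so the decoded bytes are not UTF-8 text.
try:
    raw.decode("utf-8")
    is_utf8_text = True
except UnicodeDecodeError:
    is_utf8_text = False
```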

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Philippe Verdy via Unicode
I see no easy way to convert ALL UPPERCASE text with consistent casing as there's no rule, except by using dictionary lookups. In reality data should be input using default casing (as in dictionary entries), independently of their position in sentences, paragraphs or titles, and the contextual

Re: Shortcuts question

2018-09-17 Thread Philippe Verdy via Unicode
Note: CLDR concentrates on keyboard layout for text input. Layouts for other functions (such as copy-pasting, gaming controls) are completely different (and not necessarily bound directly to layouts for text, as they may also have their own dedicated physical keys or users can reprogram their

Re: Shortcuts question

2018-09-16 Thread Philippe Verdy via Unicode
For games, the mnemonic meaning of keys is unlikely to be used because gamers prefer an ergonomic placement of their fingers according to the physical position for essential commands. But this won't apply to control keys, as these commands should be single keystrokes and pressing two keys instead

Re: Shortcuts question

2018-09-15 Thread Philippe Verdy via Unicode
Le ven. 7 sept. 2018 à 05:43, Marcel Schneider via Unicode < unicode@unicode.org> a écrit : > On 07/09/18 02:32 Shriramana Sharma via Unicode wrote: > > > > Hello. This may be slightly OT for this list but I'm asking it here as > it concerns computer usage with multiple scripts and i18n: > > It

Re: Unicode String Models

2018-09-11 Thread Philippe Verdy via Unicode
No, the bytes 0xF8..0xFF are not used at all in UTF-8; but the code points U+00F8..U+00FF really **do** have UTF-8 encodings (using two bytes). The only safe way to represent arbitrary bytes within strings when they are not valid UTF-8 is to use invalid UTF-8 sequences, i.e. by using a "UTF-8-like" private extension of
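
The distinction between bytes and code points can be checked with a short sketch (plain Python, standard library only; the helper `is_valid_utf8` is a hypothetical name introduced for this example).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Code points U+00F8..U+00FF each encode to exactly two bytes in UTF-8.
two_byte_forms = {cp: chr(cp).encode("utf-8") for cp in range(0xF8, 0x100)}
all_two_bytes = all(len(b) == 2 for b in two_byte_forms.values())

# The raw byte values 0xF8..0xFF, on the other hand, are never valid UTF-8.
lone_bytes_valid = [is_valid_utf8(bytes([b])) for b in range(0xF8, 0x100)]

# None of the two-byte encodings contains a byte in 0xF8..0xFF either.
no_high_bytes = all(
    byte < 0xF8 for b in two_byte_forms.values() for byte in b
)
```

For instance, U+00F8 comes out as the bytes C3 B8 and U+00FF as C3 BF.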
