Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Philippe Verdy via Unicode
People's names are NOT transliterated freely. It's up to each person to document his romanized name; it should not be invented by automatic processes. And frequently the (official) romanized name does not match the original name in another script: this is very frequent for Chinese people, as

Re: Geological symbols

2020-01-13 Thread Philippe Verdy via Unicode
It is possible with some other markup languages, including HTML, by using ruby notation and other interlinear notations for creating special vertical layouts inside a horizontal line. There are difficulties however caused by line wraps which may occur before the vertical layout, or even inside it

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread Philippe Verdy via Unicode
You seem to have never seen how translation packages work and are used in common projects (not just CLDR, but you could find them as well in Wikimedia projects, or translation packages for lots of open source packages). The purpose is to allow translating the UI of these applications for user's

Re: emojis for mouse buttons?

2020-01-01 Thread Philippe Verdy via Unicode
pointing up inside) > > MOUSE SCROLL DOWN (mouse with middle button black and white triangle > > pointing down inside) > > > > These characters are pretty useful in software manuals, training > > materials and user interfaces. > > > > Happy New Year, > >

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
Playing with the filling of the middle cell to mean a double click is a bad idea (it would be better to add one or two rounded borders separated from the button, or simply display two icons in sequence for a double click). Note that the glyphs do not necessarily have to show a mouse, it could as

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
t in three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary). On Tue, 31 Dec 2019 at 14:57, Shriramana Sharma wrote: > Why are these called "emojis" for mouse buttons rather than just > "

emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
A lot of applications need to document their keymap and want to display keys. For now there are emojis for mice (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. However there's no simple emoji to represent the very common mouse click buttons used in lots of

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
On Mon, 11 Nov 2019 at 17:31, Markus Scherer wrote: > We generally assign the script code when the script is in the pipeline for > a near-future version of Unicode, which demonstrates that it's "a candidate > for encoding". We also want the name of the script to be settled, so that > the

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
ion across countries or caused by wars, invasions, diplomacy, or commercial interests) On Mon, 11 Nov 2019 at 17:31, Markus Scherer wrote: > On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode < > unicode@unicode.org> wrote: > >> But first there's still no code in ISO 159

Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
Encoding the Nsibidi script (African) for writing the Efik, Ekoi, Ibibio and Igbo languages. See this site as an example of use, with links to published educational books. http://blog.nsibiri.org/ Also this online dictionary: https://fr.scribd.com/doc/281219778/Ikpokwu Other links:

Re: comma ellipses

2019-10-07 Thread Philippe Verdy via Unicode
Commas may be used instead of dots by users of French keyboards (it's easier to type the comma, when the dot/full stop requires pressing the SHIFT key). I may be wrong, but I've quite frequently seen commas or semicolons instead of dots/full stops under normal orthography. But the web and notably

Re: Acute/apostrophe diacritic in Võro for palatalized consonants

2019-08-19 Thread Philippe Verdy via Unicode
nd how?) On Tue, 20 Aug 2019 at 04:17, Philippe Verdy wrote: > > I'm curious about this statement in English Wikipedia about Võro: > >> Palatalization of consonants is marked with an acute accent (´) or apostrophe ('). In proper typography and in handwriting, the palatalisation

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
>> >> Anshuman Pandey did preliminary research on this in 2011. >> >> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf >> >> It would be premature to assign an ISO 15924 script code, pending the >> research to determine whether this script should be separately encoded

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
g of a new script and disunification from Brahmi, may then be more easily justified with their modern use, and probably unified with the remaining use for Eastern Magari). On Mon, 22 Jul 2019 at 19:33, Philippe Verdy wrote: > > > On Mon, 22 Jul 2019 at 18:43, Ken Whistler

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
On Mon, 22 Jul 2019 at 18:43, Ken Whistler wrote: > See the entry for "Magar Akkha" on: > > http://linguistics.berkeley.edu/sei/scripts-not-encoded.html > > Anshuman Pandey did preliminary research on this in 2011. > That's what I said: 8 years ago already. >

Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
According to Ethnologue, the Eastern Magar language (mgp) is written in two scripts: Devanagari and "Akkha". But the "Akkha" script does not seem to have any ISO 15924 code. The Ethnologue currently assigns a private use code (Qabl) for this script. Was the addition delayed due to lack of

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-20 Thread Philippe Verdy via Unicode
r wrote: > > On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote: > > then the Unicode version (age) used for Hieroglyphs should also be > assigned to Hieratic. > > It is already. > > > In fact the ligatures system for the "cursive" Egyptian Hieratic is so >

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
er studies. And I'm unable to find any non-proprietary (interoperable?) attempt to encode Hieratic, the only attempts being with Hieroglyphs. On Thu, 18 Jul 2019 at 01:16, Philippe Verdy wrote: > Sorry I misread (with an automated tool) an old dataset where these "3.0" > ve

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
Sorry, I misread (with an automated tool) an old dataset where these "3.0" versions were indicated in an incorrect form. On Thu, 18 Jul 2019 at 01:07, Philippe Verdy wrote: > Note also that there are variants registered with Unicode versions (Age) > for symbols, even if the

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
(even if they may be distinguished in ISO 15924, for example to allow selecting a suitable but preferred set of fonts, as is commonly done for Chinese Mandarin, Arabic, Japanese, Korean or Latin)? On Thu, 18 Jul 2019 at 00:55, Philippe Verdy wrote: > The ISO 15924/RA reference page contains

ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
The ISO 15924/RA reference page contains indication of support in Unicode for variants of various scripts such as Aran, Latf, Latg, Hanb, Hans, Hant: 160 *Arab* Arabic arabe Arabic 1.1 2004-05-01 161 *Aran* Arabic (Nastaliq variant) arabe (variante nastalique) 1.1 2014-11-15 ... 503 *Hanb* Han

Fwd: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
> Well my first feeling was that U+202F should work all the time, but I > found cases where this is not always the case. So this must be bugs in > those renderers. > I think we can attribute these bugs to the fact that this character is insufficiently known, and not even accessible in most input

Re: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
e if you need > the look of a narrow space. > > Another possibility is to embed the number in a LRI...PDI block, as > e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%" > fragment of its default example. > > cheers, > egmont > > On Tue, Jul 9, 2
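The LRI...PDI suggestion quoted above can be sketched as follows. This is only an illustration, assuming the number is available as a separate string before it is placed into right-to-left text; the helper name `isolate_ltr` is hypothetical and not taken from the thread.

```python
# Directional isolate controls defined by the Unicode Bidirectional Algorithm.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(fragment: str) -> str:
    """Wrap a fragment so it is laid out left-to-right as an isolated unit."""
    return f"{LRI}{fragment}{PDI}"


# The "1–3%" fragment mentioned in the quoted message (U+2013 EN DASH inside).
wrapped = isolate_ltr("1\u20133%")
print([f"U+{ord(c):04X}" for c in wrapped])
```

Whether the isolate is honoured still depends on the renderer; the code only shows where the two controls go.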

Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
Is there a narrow space usable as a numeric group separator, and that also has the same bidi property as digits (i.e. neutral outside the span of digits and separators, but inheriting the implied directionality of the previous digit) ? I can't find a way to use narrow spaces instead of
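One way to explore this question is to look up the Bidi_Class of the candidate separators. The sketch below uses Python's standard `unicodedata` module; the selection of candidate characters is an assumption made for illustration. In the Unicode Character Database, U+202F and U+00A0 have class CS (common separator), like the comma, while U+2009 THIN SPACE is WS (whitespace); a single CS between two European numbers is resolved to a number by rule W4 of the bidirectional algorithm.

```python
import unicodedata

# Candidate group separators (illustrative selection, not from the thread).
CANDIDATES = {
    "DIGIT ONE": "1",
    "COMMA": ",",
    "NO-BREAK SPACE": "\u00A0",
    "THIN SPACE": "\u2009",
    "NARROW NO-BREAK SPACE": "\u202F",
}


def bidi_classes(chars: dict) -> dict:
    """Map each label to the Bidi_Class of its character."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


classes = bidi_classes(CANDIDATES)
for name, bidi in classes.items():
    print(f"{name:24} {bidi}")
```

This only reports the property values; it does not show how a particular renderer actually treats the character.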

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
of any new character in Unicode. But if your protocol does not allow any form of escaping, then it is broken as it cannot transport **all** valid Unicode text. On Wed, 3 Jul 2019 at 10:49, Philippe Verdy wrote: > On Wed, 3 Jul 2019 at 06:09, Sławomir Osipiuk > wrote: > >>

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
On Wed, 3 Jul 2019 at 06:09, Sławomir Osipiuk wrote: > I don’t think you understood me at all. I can packetize a string with any > character that is guaranteed not to appear in the text. > Your goal is **impossible** to reach with Unicode. Assume such a character is "added" to the UCS, then

Re: Unicode "no-op" Character?

2019-06-29 Thread Philippe Verdy via Unicode
If you want to "packetize" arbitrarily long Unicode text, you don't need any new magic character. Just prepend your packet with a base character used as a syntactic delimiter, that does not combine with what follows in any normalization. There's a fine character for that: the TAB control. Except

Symbols of colors used in Portugal for transport

2019-04-27 Thread Philippe Verdy via Unicode
A very useful thing to add to Unicode (for colorblind people)! http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind Has it been proposed to add these as new symbols?

Re: Emoji Haggadah

2019-04-19 Thread Philippe Verdy via Unicode
I cannot; it definitely requires first a good knowledge of English (to find possible synonyms, plus phonetic approximations, including using abbreviatable words), and Hebrew culture (to guess names and the context). All this text looks completely random and makes no sense otherwise. On Tue, 16 Apr

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-17 Thread Philippe Verdy via Unicode
On Fri, 8 Feb 2019 at 13:56, Egmont Koblinger wrote: > Philippe, I hate to say it, but at the risk of being impolite, I just > have to. > Resist this idea, I've not been impolite. I just want to show you that terminals are legacy environments that are far behind what is needed for proper

Re: Bidi paragraph direction in terminal emulators

2019-02-14 Thread Philippe Verdy via Unicode
On Tue, 12 Feb 2019 at 14:16, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > > There is nothing magic about the grid of cells, and once you introduce > new escape sequences, you might as well truly modernise the terminal. > > The magic about the grid of cells is all the software

Re: Encoding colour (from Re: Encoding italic)

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 02:33, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> wrote: > Previously I wrote: > > > A stateful method, though which might be useful for plain text streams > > in some applications, would be to encode as characters some of the > > glyphs for indicating

Re: Encoding italic

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 16:42, James Kass via Unicode wrote: > > Philippe Verdy wrote, > > >> ...[one font file having both italic and roman]... > > The only case where it happens in real fonts is for the mapping of > > Mathematical Symbols which hav

Re: Bidi paragraph direction in terminal emulators

2019-02-10 Thread Philippe Verdy via Unicode
On Sat, 9 Feb 2019 at 20:55, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > Hi Asmus, > > > On quick reading this appears to be a strong argument why such emulators > will > > never be able to be used for certain scripts. Effectively, the model > described works > > well with

Re: Encoding italic

2019-02-10 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 05:34, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > >> Isn't that already the case if one uses variation sequences to choose > >> between Chinese and Japanese glyphs? > > > > Well, not necessarily. There's nothing prohibiting a font that includes >

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
Adding a single bit of protection in cell attributes to indicate they are either protected or become transparent (and the rest of the attributes/character field indicates the id of another terminal grid or rendering plugin creating its own layer and having its own scrolling state and dimensions)

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 19:38, Egmont Koblinger wrote: > As you can see from previous discussions, there's a whole lot of > confusion about the terminology. And it was exactly the subject of my first message sent to this thread! You probably missed it. > Philippe, with all due respect, I

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 13:29, Egmont Koblinger wrote: > Hi Philippe, > > > There's some rules for correct display including with Bidi: > > In what sense are these "rules"? Where are these written, in what kind > of specification or existing practice? > "Rules" are not formally written, they

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-06 Thread Philippe Verdy via Unicode
.) > > Anyway, please also see my previous email; I hope that clarifies a lot > for you, too. > > > cheers, > egmont > > On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode > wrote: > > > > I think that before making any decision we must make some decis

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-05 Thread Philippe Verdy via Unicode
I think that before making any decision we must make some decision about what we mean by "newlines". There are in fact 3 different functions: - (1) soft line breaks (which are used to enforce a maximum display width between paragraph margins): these are equivalent to breakable and compressible

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Philippe Verdy via Unicode
Actually not all U+E0020 through U+E007E are "un-deprecated" for this use. For now emoji flags only use: - U+E0041 through U+E005A (mapping to ASCII letters A through Z used in 2-letter ISO3166-1 codes). These are usable in pairs, without requiring any modifier (and only for ISO3166-1 registered

Re: Encoding italic

2019-02-01 Thread Philippe Verdy via Unicode
the proposal would contradict the goals of variation selectors and would pollute the variation sequences registry (possibly even creating conflicts). And if we admit it for italics, then another VSn will be dedicated to bold, and another for monospace, and finally many would follow for various

Re: Encoding italic

2019-01-28 Thread Philippe Verdy via Unicode
ter. On Mon, 28 Jan 2019 at 03:03, James Kass via Unicode wrote: > > On 2019-01-27 11:44 PM, Philippe Verdy wrote: > > > You're not very explicit about the Tag encoding you use for these > styles. > > This <b>bold</b> new concept was not mine. When I test

Re: Encoding italic

2019-01-27 Thread Philippe Verdy via Unicode
You're not very explicit about the Tag encoding you use for these styles. Of course it must not be a language tag, so the introducer is not U+E0001, or a cancel-all tag, so it is not prefixed by U+E007F. It also cannot use letter-like, digit-like and hyphen-like tag characters for its introduction.

Re: Ancient Greek apostrophe marking elision

2019-01-27 Thread Philippe Verdy via Unicode
For Volapük, it looks much more like U+02BE (right half ring modifier letter) than like U+02BC (apostrophe "modifier" letter), according to the PDF at https://archive.org/details/cu31924027111453/page/n12 The half ring makes a clear distinction with the regular apostrophe (for elisions) or

Re: Encoding italic (was: A last missing link)

2019-01-17 Thread Philippe Verdy via Unicode
If encoding italics means reencoding the normal linguistic usage, the answer is no! We already have the nightmares caused by partial encoding of Latin and Greek (also a few Hebrew characters) for maths notations or IPA notations, but they are restricted to a well delimited scope of use and subset, and at

Re: NNBSP (was: A last missing link for interoperable representation)

2019-01-17 Thread Philippe Verdy via Unicode
On Thu, 17 Jan 2019 at 05:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 16/01/2019 21:53, Richard Wordingham via Unicode wrote: > > > > On Tue, 15 Jan 2019 13:25:06 +0100 > > Philippe Verdy via Unicode wrote: > > > >> If yo

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
Note that even if this NNBSP character is not mapped in a font, it should be rendered correctly with all modern renderers (the mapping is necessary only when a font design wants to tune its metrics, because its width varies between 1/8 and 1/6 em (the narrow space is a bit narrower in traditional

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
On Mon, 14 Jan 2019 at 20:25, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 14/01/2019 06:08, James Kass via Unicode wrote: > > > > Marcel Schneider wrote, > > > >> There is a crazy typeface out there, misleadingly called 'Courier > >> New', as if the foundry didn’t

Re: UCA unnecessary collation weight 0000

2018-11-04 Thread Philippe Verdy via Unicode
erties, algorithms, CLDR, ICU) that have a higher > benefit. > > You can continue flogging this horse all you want, but I'm muting this > thread (and I suspect I'm not the only one). > > Mark > > > On Sun, Nov 4, 2018 at 2:37 AM Philippe Verdy via Unicode < > unicode@un

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
needed in alphabets of actual natural languages, or as possibly new IPA symbols), and without using the styling tricks (of HTML/CSS, or of word processor documents, spreadsheets, presentation documents allowing "rich text" formats on top of "plain text") which are

Re: Encoding (was: Re: A sign/abbreviation for "magister")

2018-11-04 Thread Philippe Verdy via Unicode
Note that I actually propose not just one rendering for the but two possible variants (that would be equally valid without preference). Use it after any base cluster (including with diacritics if needed, like combining underlines). - the first one can be to render the previous cluster as

Re: UCA unnecessary collation weight 0000

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 22:27, Ken Whistler wrote: > > On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: > > I was replying not about the notational representation of the DUCET data > table (using [....] unnecessarily) but about the text of UTR#10 itself. > Wh

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
It should be noted that the algorithmic complexity for this NFLD normalization ("legacy") is exactly the same as for NFKD ("compatibility"). However NFLD is versioned (like also NFLC), so NFLD can take a second parameter: the maximum Unicode version which can be used to filter which decomposition

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Sat, 3 Nov 2018 at 23:36, Philippe Verdy wrote: > - this new decomposition mapping file for NFLC and NFLD, where NFLC is >> defined to be NFC(NFLD), has some stability requirements and it must be >> guaranteed that NFD(NFLD) = NFD >> > Oops! fix my typo: it must be
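NFLC and NFLD are forms proposed in this thread and do not exist in any Unicode release, so they cannot be demonstrated directly. As an analogy only, the relation "NFLC is defined to be NFC(NFLD)" mirrors how the existing compatibility forms relate: NFKC is the canonical composition of NFKD, and NFKD output is already stable under NFD. A small check with Python's standard `unicodedata` module (the sample strings are arbitrary choices):

```python
import unicodedata

# Arbitrary samples: ligature fi, feminine ordinal, numero sign,
# e + combining acute, long s with dot above + combining dot below.
SAMPLES = ["\uFB01", "\u00AA", "\u2116", "e\u0301", "\u1E9B\u0323"]


def nfc_of_nfkd(text: str) -> str:
    """Compose the compatibility decomposition of text."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFKD", text))


for sample in SAMPLES:
    nfkd = unicodedata.normalize("NFKD", sample)
    # NFKC is by definition compatibility decomposition + canonical composition.
    assert nfc_of_nfkd(sample) == unicodedata.normalize("NFKC", sample)
    # Applying NFD to NFKD output changes nothing.
    assert unicodedata.normalize("NFD", nfkd) == nfkd
print("relations hold for all samples")
```

A versioned form such as the proposed NFLD would need its own mapping data; nothing in the standard library provides it.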

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
> > Unlike NFKC and NFKD, the NFLC and NFLD would be an extensible superset > based on MUTABLE character properties (this can also be "decompositions > mappings" except that once a character is added to the new property file, > they won't be removed, and can have some stability as well, where the

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
llow correcting past errors in the standard. This file should have this form: # deprecated codepoint(s) ; new preferred sequence ; Unicode version in which it was deprecated 101234 ; 101230 0300... ; 10.0 This file can also be used to deprecate some old variation sequences, or some old cluster
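A file in the three-field form described above could be read roughly as follows. This is a sketch of a reader for a proposed format that was never standardized; the `Deprecation` type, the `parse_line` helper and the sample record (private-use code points) are all hypothetical.

```python
from typing import NamedTuple, Optional


class Deprecation(NamedTuple):
    deprecated: str  # deprecated code point(s), as a string
    preferred: str   # new preferred sequence, as a string
    version: str     # Unicode version in which it was deprecated


def parse_line(line: str) -> Optional[Deprecation]:
    """Parse 'codepoints ; sequence ; version'; return None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")

    def to_text(hex_list: str) -> str:
        # Space-separated hexadecimal code points to a Python string.
        return "".join(chr(int(cp, 16)) for cp in hex_list.split())

    return Deprecation(to_text(fields[0]), to_text(fields[1]), fields[2])


# Hypothetical record using private-use code points.
record = parse_line("F0001 ; F0000 0300 ; 10.0")
print(record)
```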

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
s and still preserve distinction between contrasting pairs, but NOT as a way to encode non-semantic styles), and character properties to allow efficient processing. On Sat, 3 Nov 2018 at 21:02, Philippe Verdy wrote: > As well the separate encoding of mathematical variants could have been >

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
small chart per base character, listing them simply ordered by "VSn" value. All that Unicode publishes is only a mere data list with some names (not enough for most users to be aware that variations can be encoded explicitly and compliantly) On Sat, 3 Nov 2018 at 20:41, Philippe Ver

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 20:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 02/11/2018 17:45, Philippe Verdy via Unicode wrote: > [quoted mail] > > > > Using variation selectors is only appropriate for these existing > > (preencoded)

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Philippe Verdy via Unicode
bin/collation.html and turn on "raw >>> collation elements" and "sort keys" to see the transformed collation >>> elements (from the DUCET + CLDR) and the resulting sort keys. >>> >>> a =>[29,05,_05] => 29 , 05 , 05 . >>> a\u0

Re: A sign/abbreviation for "magister"

2018-11-02 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 16:20, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > That seems to me a regression, after the front has moved in favor of > recognizing Latin script needs preformatted superscript. The use case is > clear, as we have ª, º, and n° with degree sign, and so on

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
to map the patterns so that the encoded secondary weight will be readable valid UTF-8. The fourth level, started by the mark "000" can use the pattern "001" to encode the most frequent minimum quaternary weight, and patterns "010" to "011" followed by other bi

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
ans editing the file, so they don't need to wonder what is the level of the first indicated weight or remember what is the minimum weight for that level. But the DUCET table is actually generated by a machine and processed by machines. On Thu, 1 Nov 2018 at 21:57, Philippe Verdy wrote: > I

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
d minimum tertiary weight. But note that 0020 is kept in two places as they are followed by a higher weight 0021. This is general for any tailored collation (not just the DUCET). On Thu, 1 Nov 2018 at 21:42, Philippe Verdy wrote: > The 0000 is there in the UCA only because the DUCET is publ

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
The 0000 is there in the UCA only because the DUCET is published in a format that uses it, but here also this format is useless: you never need any [.], or [..] in the DUCET table as well. Instead the DUCET just needs to indicate what is the minimum weight assigned for every level

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, 1 Nov 2018 at 21:31, Philippe Verdy wrote: > so you can use these two last functions to write the first one: > > bool isIgnorable(int level, string element) { > return getLevel(getWeightAt(element, 0)) > getMinWeight(level); > } > correction: return g

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, 1 Nov 2018 at 21:08, Markus Scherer wrote: > When you want fast string comparison, the zero weights are useful for >> processing -- and you don't actually assemble a sort key. >> > And no, I see absolutely no case where any 0000 weight is useful during processing, it does not distinguish

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I'm not speaking just about how collation keys will finally be stored (as uint16 or bytes, or sequences of bits with variable length); I'm just referring to the sequence of weights you generate. You absolutely NEVER need ANYWHERE in the UCA algorithm any 0000 weight, not even during processing, or

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
the collation key. This gives: Figure 3. Comparison of Sort Keys <http://unicode.org/reports/tr10/#Comparison_Of_Sort_Keys_Table>
   String  Sort Key
1  cab     *0706* 06D9 06EE
2  Cab     *0706* 06D9 06EE *0008*
3  cáb     *0706* 06D9 06EE 0020 0020 *0021*
4  dab     *0712* 06D9 06EE
See the reduction! On Thu
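The reduction argued for in this message can be sketched as below. This illustrates the poster's claim rather than the UCA as specified (other participants in the thread disagreed with the proposal). It assumes that primary, secondary and tertiary weights occupy disjoint ranges with every primary above every secondary and every secondary above every tertiary, and it uses the weights of the UTR #10 "cab" example; the function names are made up for the illustration.

```python
from typing import List, Tuple

CollationElement = Tuple[int, int, int]  # (primary, secondary, tertiary)

MIN_SECONDARY = 0x0020  # minimum non-zero secondary weight in the example data
MIN_TERTIARY = 0x0002   # minimum non-zero tertiary weight in the example data


def standard_sort_key(elements: List[CollationElement]) -> List[int]:
    """UCA-style key: non-zero weights per level, levels separated by 0000."""
    key: List[int] = []
    for level in range(3):
        if level > 0:
            key.append(0x0000)
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return key


def reduced_sort_key(elements: List[CollationElement]) -> List[int]:
    """Key without 0000 separators, trailing minimum weights trimmed per level."""
    key = [ce[0] for ce in elements if ce[0] != 0]
    for level, minimum in ((1, MIN_SECONDARY), (2, MIN_TERTIARY)):
        weights = [ce[level] for ce in elements if ce[level] != 0]
        while weights and weights[-1] == minimum:
            weights.pop()
        key.extend(weights)
    return key


def fmt(key: List[int]) -> str:
    return " ".join(f"{w:04X}" for w in key)


C, A, B = (0x0706, 0x20, 0x02), (0x06D9, 0x20, 0x02), (0x06EE, 0x20, 0x02)
WORDS = {
    "cab": [C, A, B],
    "Cab": [(0x0706, 0x20, 0x08), A, B],
    "cáb": [C, A, (0x0000, 0x21, 0x02), B],  # acute accent: primary-ignorable
    "dab": [(0x0712, 0x20, 0x02), A, B],
}
for word, ces in WORDS.items():
    print(f"{word}: {fmt(reduced_sort_key(ces))}")
```

For these four strings both key shapes sort in the same order; that equivalence depends on the range assumption stated above and is not checked here for arbitrary data.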

UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I just noticed that there's absolutely NO utility for the collation weight 0000 anywhere in the algorithm. For example, UTR #10, section 3.3.1 gives a collation element: [.0000.0021.0002] for COMBINING GRAVE ACCENT. However it can also be simply: [.0021.0002] for a simple reason: the

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Philippe Verdy via Unicode
As is "Mgr" for Monseigneur in French ("Mgr" without superscripts makes little sense, and if "Mr" is sometimes found as an abbreviation for "Monsieur", its standard abbreviation is "M.", and its plural "Messieurs" is noted "MM" without any abbreviation dot or superscript, but normally never as

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Philippe Verdy via Unicode
For the case of "Mister" vs. "Magister", the (double) underlining is not just a stylistic option but conveys semantics as an explicit abbreviation mark! We are here at the line between what is pure visual encoding (e.g. using superscript letters), and logical encoding (as done everywhere else in

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
rk will just render it at end of the sequence as a usual square or rectangular "tofu"; those that recognize it as a combining character but have no support for it, will render the usual dotted square (meaning "unsupported combining mark", to distinguish from the meaning as if t

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
On Sun, 28 Oct 2018 at 18:28, Janusz S. Bień wrote: > On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote: > > Given the "squiggle" below letters are actually given distinctive > > semantics, I think it should be encoded as a combining character (to

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
Given the "squiggle" below letters are actually given distinctive semantics, I think it should be encoded as a combining character (to be written not after a "superscript" but after any normal base letter, possibly with other combining characters, or CGJ if needed because of the compatibility

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
04:21, Garth Wallace via Unicode wrote: > I learned that one as a kid, as the "pigpen cipher". I'm not aware of any > numerological significance (which is easy enough to "find" in anything). > > On Sat, Oct 27, 2018 at 7:43 PM Philippe Verdy via Unicode < >

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
(or Hebrew) letters to this alphabet varies (just like Braille symbols depending on languages/scripts). It may have extensions (like Braille outside its basic 2x3 patterns of dots), such as a second dot in squares, horizontally as "··" or vertically as ":" On Sun, 28 Oct 2018 at

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
;" > - "X" becomes approximately "\/" > - "J" is noted like "I" as a square, or distinctly approximately as ">" > with a central dot > > The 3x3 grid had some esoterical meaning based on numerology (a legend now > propaged

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
rid had some esoteric meaning based on numerology (a legend now propagated by scientology). On Sun, 28 Oct 2018 at 02:59, Philippe Verdy wrote: > Do you speak about this one? > https://www.magisterdaire.com/magister-symbol-black-sq/ > It looks like a graphic personal signature for the a

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
Are you talking about this one? https://www.magisterdaire.com/magister-symbol-black-sq/ It looks like a graphic personal signature for the author of this esoteric book, even if it looks like an interesting composition of several of our existing Unicode symbols, glued together in a vertical ligature,

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
On Sat, 27 Oct 2018 at 15:06, Asmus Freytag via Unicode wrote: > First question is: how do you interpret the symbol? For me it is > definitely the capital M followed by the superscript "r" (written in an > old style no longer used in Poland), but there is something below the > superscript. It

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
g explicitly modified to suit an embedding protocol. > > And certainly the first sentence in this section isn’t intended to be > taken without the context of the rest of the section. > > > > tex > > > > > > > > *From:* Philippe Verdy [mailto:verd...@wanadoo.fr]

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
frequently needed. On Mon, 15 Oct 2018 at 13:57, Philippe Verdy wrote: > If you want an example where padding with "=" is not used at all, > - look into URL-shortening schemes > - look into database fields or data input forms and numerous data formats > where the "=&q

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
e any padding at all, letting the decoder discard the trailing bits themselves at end of the encoded stream. Le lun. 15 oct. 2018 à 13:24, Philippe Verdy a écrit : > Also the rationale for supporting "unnecessary" whitespace is found in > MIME's version of Base64, also

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Philippe > Verdy via Unicode > *Sent:* Sunday, October 14, 2018 1:41 AM > *To:* Adam Borowski > *Cc:* unicode Unicode Discussion >

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
t treated as part of the encoding. > > Maybe we differ on where the encoding begins and ends, and where > higher level protocols prescribe how they are embedded within the protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
Le dim. 14 oct. 2018 à 21:21, Doug Ewell via Unicode a écrit : > Steffen Nurpmeso wrote: > > > Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions > > (MIME) Part One: Format of Internet Message Bodies). > > Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data >

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
It's also interesting to look at https://tools.ietf.org/html/rfc3501 - which defines (for IMAP v4) another "BASE64" encoding, - and also defines a "Modified UTF-7" encoding using it, deviating from Unicode's definition of UTF-7, - and adding other requirements (which forbid alternate encodings

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
e conforming to Unicode, provided they preserve each Unicode scalar value, or at least the code point identity because an encoder/decoder is not required to support non-character code points such as surrogates or U+FFFE), where Base64 may be used for internally generated octets-streams. Le dim. 14

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
Le sam. 13 oct. 2018 à 18:58, Steffen Nurpmeso via Unicode < unicode@unicode.org> a écrit : > Philippe Verdy via Unicode wrote in w9+jearw4ghyk...@mail.gmail.com>: > |You forget that Base64 (as used in MIME) does not follow these rules \ > |as it allows multiple d

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
on the allowed set of characters, and on their maximum line lengths): Base64_Encode[Base64_Decode[t]] = t may be false. Le sam. 13 oct. 2018 à 16:45, Philippe Verdy a écrit : > You forget that Base64 (as used in MIME) does not follow these rules as it > allows multiple different encodings for the same
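
The round-trip claim above can be checked with a minimal sketch. It assumes Python's standard `base64` module as the encoder/decoder (the thread names no implementation), and the names `payload`, `wrapped` and `round_trips` are hypothetical, chosen for illustration. The default decoder skips characters outside the Base64 alphabet, as a MIME decoder would skip line breaks:

```python
import base64

payload = b"hello"

# Canonical single-line encoding produced by the library.
canonical = base64.b64encode(payload)

# A MIME-style variant of the same octets: a line break inside the encoded text.
wrapped = b"aGVs\r\nbG8="


def round_trips(encoded: bytes) -> bool:
    """Return True when re-encoding the decoded octets reproduces `encoded`."""
    return base64.b64encode(base64.b64decode(encoded)) == encoded


# Both texts decode to the same octets, but only one survives the round trip.
print(base64.b64decode(wrapped) == payload)
print(round_trips(canonical))
print(round_trips(wrapped))
```

Two different Base64 texts therefore map to one binary object, so re-encoding can only recover one of them.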

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
You forget that Base64 (as used in MIME) does not follow these rules as it allows multiple different encodings for the same source binary. MIME actually splits a binary object into multiple fragments at random positions, and then encodes these fragments separately. Also MIME uses an extension of

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Philippe Verdy via Unicode
I think the reverse is also true! Decoding a Base64 entity does not guarantee that it will return valid text in any known encoding, so Unicode normalization of the output cannot apply. Even if it represents text, nothing indicates that the result will be encoded with some Unicode encoding form
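
As a small illustration of that point, the sketch below decodes two well-formed Base64 strings and tries to read the result as UTF-8. It assumes Python's standard `base64` module and picks UTF-8 as the candidate encoding; `decode_as_utf8` is a hypothetical helper, not something from the thread:

```python
import base64
from typing import Optional


def decode_as_utf8(encoded: str) -> Optional[str]:
    """Decode Base64, then try to interpret the octets as UTF-8.

    Returns None when the octets are not well-formed UTF-8.
    """
    raw = base64.b64decode(encoded)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# "aGVsbG8=" carries the octets of "hello"; "/w==" carries the single octet 0xFF,
# which can never appear in well-formed UTF-8.
print(decode_as_utf8("aGVsbG8="))
print(decode_as_utf8("/w=="))
```

Both inputs are valid Base64, yet only the first yields text to which normalization could meaningfully be applied.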

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Philippe Verdy via Unicode
I see no easy way to convert ALL UPPERCASE text with consistent casing as there's no rule, except by using dictionary lookups. In reality data should be input using default casing (as in dictionary entries), independently of their position in sentences, paragraphs or titles, and the contextual

Re: Shortcuts question

2018-09-17 Thread Philippe Verdy via Unicode
Note: CLDR concentrates on keyboard layout for text input. Layouts for other functions (such as copy-pasting, gaming controls) are completely different (and not necessarily bound directly to layouts for text, as they may also have their own dedicated physical keys or users can reprogram their

Re: Shortcuts question

2018-09-16 Thread Philippe Verdy via Unicode
. 16 sept. 2018 à 14:18, Marcel Schneider via Unicode < unicode@unicode.org> a écrit : > On 15/09/18 15:36, Philippe Verdy wrote: > […] > > So yes all control keys are potentially localisable to work best with > the base layout and remain mnemonic; > > but the physic

Re: Shortcuts question

2018-09-15 Thread Philippe Verdy via Unicode
Le ven. 7 sept. 2018 à 05:43, Marcel Schneider via Unicode < unicode@unicode.org> a écrit : > On 07/09/18 02:32 Shriramana Sharma via Unicode wrote: > > > > Hello. This may be slightly OT for this list but I'm asking it here as > it concerns computer usage with multiple scripts and i18n: > > It

Re: Unicode String Models

2018-09-11 Thread Philippe Verdy via Unicode
No, 0xF8..0xFF are not used at all in UTF-8, but U+00F8..U+00FF really **do** have UTF-8 encodings (using two bytes). The only safe way to represent arbitrary bytes within strings when they are not valid UTF-8 is to use invalid UTF-8 sequences, i.e. by using a "UTF-8-like" private extension of
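
The distinction between byte values and code points in the first sentence can be verified with a short sketch. It relies only on Python's built-in UTF-8 codec; `utf8_byte_values` and `used` are hypothetical names, and the sketch does not attempt the private "UTF-8-like" extension mentioned in the message:

```python
def utf8_byte_values() -> set:
    """Collect every byte value that occurs in the UTF-8 form of any scalar value."""
    # Surrogate code points U+D800..U+DFFF are not scalar values and cannot be encoded.
    scalars = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    text = "".join(map(chr, scalars))
    return set(text.encode("utf-8"))


used = utf8_byte_values()

# No byte in 0xF8..0xFF ever appears in well-formed UTF-8.
print(any(b >= 0xF8 for b in used))

# The code points U+00F8..U+00FF, however, each encode as two bytes.
for cp in range(0xF8, 0x100):
    print(f"U+{cp:04X}", chr(cp).encode("utf-8").hex())
```

Because those byte values are never produced by a conforming encoder, they are free for a private escape scheme, at the cost of the result no longer being valid UTF-8.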
