People's names are NOT transliterated freely. It is up to each person to
document their own romanized name; it should not be invented by automatic
processes. And frequently the official romanized name does not match
the original name in another script: this is very common for Chinese
people, as
It is possible with some other markup languages, including HTML, by using
ruby notation and other interlinear notations for creating special vertical
layouts inside a horizontal line.
There are difficulties, however, caused by line wraps, which may occur
before the vertical layout, or even inside it.
You seem to have never seen how translation packages work and how they are
used in common projects (not just CLDR; you can also find them in
Wikimedia projects, or in translation packages for lots of open-source
packages).
The purpose is to allow translating the UI of these applications for user's
pointing up inside)
> > MOUSE SCROLL DOWN (mouse with middle button black and white triangle
> > pointing down inside)
> >
> > These characters are pretty useful in software manuals, training
> > materials and user interfaces.
> >
> > Happy New Year,
> >
Playing with the filling of the middle cell to mean a double click is a
bad idea; it would be better to add one or two rounded borders separated
from the button, or simply to display two icons in sequence for a double
click).
Note that the glyphs do not necessarily have to show a mouse, it could as
t in three cells by
horizontal and vertical strokes, and one of the three cells filled
(representing the wire or the wireless waves is not necessary).
On Tue, Dec 31, 2019 at 14:57, Shriramana Sharma wrote:
> Why are these called "emojis" for mouse buttons rather than just
> "
A lot of applications need to document their keymap and want to display keys.
For now there are emojis for mice (several variants: 1, 2 or 3 buttons),
independently of the button actually pressed.
However there's no simple emoji to represent the very common mouse click
buttons used in lots of
On Mon, Nov 11, 2019 at 17:31, Markus Scherer wrote:
> We generally assign the script code when the script is in the pipeline for
> a near-future version of Unicode, which demonstrates that it's "a candidate
> for encoding". We also want the name of the script to be settled, so that
> the
ion across
countries or caused by wars, invasions, diplomacy, or commercial interests)
On Mon, Nov 11, 2019 at 17:31, Markus Scherer wrote:
> On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode <
> unicode@unicode.org> wrote:
>
>> But first there's still no code in ISO 159
Encoding the Nsibidi script (African) for writing the Efik, Ekoi, Ibibio,
and Igbo languages.
See this site as an example of use, with links to published educational
books.
http://blog.nsibiri.org/
Also this online dictionary:
https://fr.scribd.com/doc/281219778/Ikpokwu
Other links:
Commas may be used instead of dots by users of French keyboards (it's
easier to type the comma, while the dot/full stop requires pressing the
SHIFT key).
I may be wrong, but I've quite frequently seen commas or semicolons instead
of dots/full stops under normal orthography.
But the web and notably
nd how?)
On Tue, Aug 20, 2019 at 04:17, Philippe Verdy wrote:
>
> I'm curious about this statement in English Wikipedia about Võro:
>
>> Palatalization of consonants is marked with an acute accent (´) or
apostrophe ('). In proper typography and in handwriting, the palatalisation
t;>
>> Anshuman Pandey did preliminary research on this in 2011.
>>
>> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf
>>
>> It would be premature to assign an ISO 15924 script code, pending the
>> research to determine whether this script should be separately encoded
g of a new script and disunification from Brahmi, may then be more
easily justified with their modern use, and probably unified with the
remaining use for Eastern Magari).
On Mon, Jul 22, 2019 at 19:33, Philippe Verdy wrote:
>
>
> On Mon, Jul 22, 2019 at 18:43, Ken Whistler
On Mon, Jul 22, 2019 at 18:43, Ken Whistler wrote:
> See the entry for "Magar Akkha" on:
>
> http://linguistics.berkeley.edu/sei/scripts-not-encoded.html
>
> Anshuman Pandey did preliminary research on this in 2011.
>
That's what I said: 8 years ago already.
>
According to Ethnologue, the Eastern Magar language (mgp) is written in two
scripts: Devanagari and "Akkha".
But the "Akkha" script does not seem to have any ISO 15924 code.
The Ethnologue currently assigns a private use code (Qabl) for this script.
Was the addition delayed due to lack of
wrote:
>
> On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote:
>
> then the Unicode version (age) used for Hieroglyphs should also be
> assigned to Hieratic.
>
> It is already.
>
>
> In fact the ligature system for the "cursive" Egyptian Hieratic is so
>
er studies.
And I'm unable to find any non-proprietary (interoperable?) attempt to
encode Hieratic, the only attempts being with Hieroglyphs.
On Thu, Jul 18, 2019 at 01:16, Philippe Verdy wrote:
> Sorry I misread (with an automated tool) an old dataset where these "3.0"
> ve
Sorry I misread (with an automated tool) an old dataset where these "3.0"
versions were indicated in an incorrect form
On Thu, Jul 18, 2019 at 01:07, Philippe Verdy wrote:
> Note also that there are variants registered with Unicode versions (Age)
> for symbols, even if the
(even if they may be
distinguished in ISO 15924, for example to allow selecting a suitable but
preferred sets of fonts, like this is commonly used for Chinese Mandarin,
Arabic, Japanese, Korean or Latin)?
On Thu, Jul 18, 2019 at 00:55, Philippe Verdy wrote:
> The ISO 15924/RA reference page contains
The ISO 15924/RA reference page contains indications of support in Unicode
for variants of various scripts such as Aran, Latf, Latg, Hanb, Hans, Hant:
160 *Arab* Arabic arabe Arabic 1.1 2004-05-01
161 *Aran* Arabic (Nastaliq variant) arabe (variante nastalique) 1.1
2014-11-15
...
503 *Hanb* Han
> Well my first feeling was that U+202F should work all the time, but I
> found cases where this is not always the case. So these must be bugs in
> those renderers.
>
I think we can attribute these bugs to the fact that this character is
insufficiently known, and not even accessible in most input
e if you need
> the look of a narrow space.
>
> Another possibility is to embed the number in a LRI...PDI block, as
> e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%"
> fragment of its default example.
>
> cheers,
> egmont
>
> On Tue, Jul 9, 2
Is there a narrow space usable as a numeric group separator that also
has the same bidi property as digits (i.e. neutral outside the span of
digits and separators, but inheriting the implied directionality of the
previous digit)?
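A minimal sketch of the workaround mentioned in the quoted reply above: wrap the digit group (separated with U+202F) in an LRI...PDI isolate pair so surrounding RTL text cannot reorder it. The helper name is my own, not from any library.

```python
# Sketch: isolate a digit group containing U+202F (NARROW NO-BREAK SPACE)
# with LRI...PDI so its directionality is unaffected by surrounding text.
LRI = "\u2066"    # U+2066 LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"    # U+2069 POP DIRECTIONAL ISOLATE
NNBSP = "\u202F"  # narrow no-break space, used as the group separator

def isolate_number(digit_groups):
    """Join digit groups with NNBSP and wrap the result in a bidi isolate."""
    return LRI + NNBSP.join(digit_groups) + PDI

s = isolate_number(["1", "234", "567"])
```

The isolate keeps the whole grouped number as one directional run regardless of the bidi class the renderer assigns to the space.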
I can't find a way to use narrow spaces instead of
of any new character in Unicode. But if your
protocol does not allow any form of escaping, then it is broken, as it
cannot transport **all** valid Unicode text.
On Wed, Jul 3, 2019 at 10:49, Philippe Verdy wrote:
> On Wed, Jul 3, 2019 at 06:09, Sławomir Osipiuk wrote:
>
>>
On Wed, Jul 3, 2019 at 06:09, Sławomir Osipiuk wrote:
> I don’t think you understood me at all. I can packetize a string with any
> character that is guaranteed not to appear in the text.
>
Your goal is **impossible** to reach with Unicode. Assume such a character
is "added" to the UCS; then
If you want to "packetize" arbitrarily long Unicode text, you don't need
any new magic character. Just prepend your packet with a base character
used as a syntactic delimiter that does not combine with what follows in
any normalization.
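A toy sketch of the escaping point being made here (the delimiter, escape convention, and function names are all my own invention, not a real protocol): any delimiter can be made safe by escaping its occurrences inside the payload, which is why no "guaranteed absent" character is ever needed.

```python
# Illustrative packetizer: the TAB delimiter is escaped inside the payload,
# so every valid Unicode text can be transported.
DELIM = "\t"   # base character used as the packet delimiter
ESC = "\\"     # escape character (itself escaped as "\\\\")

def pack(text):
    # Escape the escape character first, then the delimiter.
    body = text.replace(ESC, ESC + ESC).replace(DELIM, ESC + "t")
    return body + DELIM

def unpack(packet):
    assert packet.endswith(DELIM)
    body = packet[:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == ESC:           # "\\t" -> TAB, "\\\\" -> backslash
            out.append("\t" if body[i + 1] == "t" else body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

The round trip holds for any input string, including ones containing the delimiter itself.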
There's a fine character for that: the TAB control. Except
A very useful thing to add to Unicode (for colorblind people)!
http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind
Has it been proposed to add these as new symbols?
I cannot; it definitely requires first a good knowledge of English (to find
possible synonyms, plus phonetic approximations, including abbreviatable
words) and of Hebrew culture (to guess names and the context).
All this text looks completely random and makes no sense otherwise.
On Tue, Apr 16,
On Fri, Feb 8, 2019 at 13:56, Egmont Koblinger wrote:
> Philippe, I hate to say it, but at the risk of being impolite, I just
> have to.
>
Resist this idea; I've not been impolite. I just want to show you that
terminals are legacy environments that are far behind what is needed for
proper
On Tue, Feb 12, 2019 at 14:16, Egmont Koblinger via Unicode <unicode@unicode.org> wrote:
> > There is nothing magic about the grid of cells, and once you introduce
> new escape sequences, you might as well truly modernise the terminal.
>
> The magic about the grid of cells is all the software
On Sun, Feb 10, 2019 at 02:33, wjgo_10...@btinternet.com via Unicode <unicode@unicode.org> wrote:
> Previously I wrote:
>
> > A stateful method, though which might be useful for plain text streams
> > in some applications, would be to encode as characters some of the
> > glyphs for indicating
On Sun, Feb 10, 2019 at 16:42, James Kass via Unicode wrote:
>
> Philippe Verdy wrote,
>
> >> ...[one font file having both italic and roman]...
> > The only case where it happens in real fonts is for the mapping of
> > Mathematical Symbols which hav
On Sat, Feb 9, 2019 at 20:55, Egmont Koblinger via Unicode <unicode@unicode.org> wrote:
> Hi Asmus,
>
> > On quick reading this appears to be a strong argument why such emulators
> will
> > never be able to be used for certain scripts. Effectively, the model
> described works
> > well with
On Sun, Feb 10, 2019 at 05:34, James Kass via Unicode wrote:
>
> Martin J. Dürst wrote,
>
> >> Isn't that already the case if one uses variation sequences to choose
> >> between Chinese and Japanese glyphs?
> >
> > Well, not necessarily. There's nothing prohibiting a font that includes
>
Adding a single bit of protection in cell attributes to indicate they are
either protected or become transparent (while the rest of the
attributes/character field indicates the id of another terminal grid or
rendering plugin creating its own layer, with its own scrolling state
and dimensions)
On Thu, Feb 7, 2019 at 19:38, Egmont Koblinger wrote:
> As you can see from previous discussions, there's a whole lot of
> confusion about the terminology.
And that was exactly the subject of my first message in this thread!
You probably missed it.
> Philippe, with all due respect, I
On Thu, Feb 7, 2019 at 13:29, Egmont Koblinger wrote:
> Hi Philippe,
>
> > There are some rules for correct display, including with Bidi:
>
> In what sense are these "rules"? Where are these written, in what kind
> of specification or existing practice?
>
"Rules" are not formally written, they
.)
>
> Anyway, please also see my previous email; I hope that clarifies a lot
> for you, too.
>
>
> cheers,
> egmont
>
> On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode
> wrote:
> >
> > I think that before making any decision we must make some decis
I think that before making any decision we must make some decision about
what we mean by "newlines". There are in fact 3 different functions:
- (1) soft line breaks (which are used to enforce a maximum display width
between paragraph margins): these are equivalent to breakable and
compressible
Actually not all U+E0020 through U+E007E are "un-deprecated" for this use.
For now emoji flags only use:
- U+E0041 through U+E005A (mapping to ASCII letters A through Z used in
2-letter ISO3166-1 codes). These are usable in pairs, without requiring any
modifier (and only for ISO3166-1 registered
the proposal would contradict the goals of variation selectors and would
pollute the variation sequences registry (possibly even creating
conflicts). And if we admit it for italics, then another VSn will be
dedicated to bold, another to monospace, and finally many would follow
for various
ter.
On Mon, Jan 28, 2019 at 03:03, James Kass via Unicode wrote:
>
> On 2019-01-27 11:44 PM, Philippe Verdy wrote:
>
> > You're not very explicit about the Tag encoding you use for these
> styles.
>
> This <b>bold</b> new concept was not mine. When I test
You're not very explicit about the Tag encoding you use for these styles.
Of course it must not be a language tag, so the introducer is not U+E0001,
nor a cancel-all tag, so it is not prefixed by U+E007F.
It also cannot use letter-like, digit-like, or hyphen-like tag characters
for its introduction.
For Volapük, it looks much more like U+02BE (right half ring modifier
letter)
than like U+02BC (apostrophe "modifier" letter).
according to the PDF on
https://archive.org/details/cu31924027111453/page/n12
The half ring makes a clear distinction with the regular apostrophe (for
elisions) or
If encoding italics means reencoding the normal linguistic usage, the
answer is no! We already have the nightmares caused by the partial encoding
of Latin and Greek (also a few Hebrew characters) for maths notations or
IPA notations, but they are restricted to a well-delimited scope of use and
subset, and at
On Thu, Jan 17, 2019 at 05:01, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> On 16/01/2019 21:53, Richard Wordingham via Unicode wrote:
> >
> > On Tue, 15 Jan 2019 13:25:06 +0100
> > Philippe Verdy via Unicode wrote:
> >
> >> If yo
Note that even if this NNBSP character is not mapped in a font, it should
be rendered correctly with all modern renderers (the mapping is necessary
only when a font designer wants to tune its metrics, because its width varies
between 1/8 and 1/6 em (the narrow space is a bit narrower in traditional
On Mon, Jan 14, 2019 at 20:25, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> On 14/01/2019 06:08, James Kass via Unicode wrote:
> >
> > Marcel Schneider wrote,
> >
> >> There is a crazy typeface out there, misleadingly called 'Courier
> >> New', as if the foundry didn’t
erties, algorithms, CLDR, ICU) that have a higher
> benefit.
>
> You can continue flogging this horse all you want, but I'm muting this
> thread (and I suspect I'm not the only one).
>
> Mark
>
>
> On Sun, Nov 4, 2018 at 2:37 AM Philippe Verdy via Unicode <
> unicode@un
needed in alphabets of
actual natural languages, or as possibly new IPA symbols), and without
using the styling tricks (of HTML/CSS, or of word processor documents,
spreadsheets, presentation documents allowing "rich text" formats on top
of "plain text") which are
Note that I actually propose not just one rendering for the but two
possible variants (that would be equally valid without preference). Use it
after any base cluster (including with diacritics if needed, like combining
underlines).
- the first one can be to render the previous cluster as
On Fri, Nov 2, 2018 at 22:27, Ken Whistler wrote:
>
> On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote:
>
> I was replying not about the notational representation of the DUCET data
> table (using [....] unnecessarily) but about the text of UTR#10 itself.
> Wh
It should be noted that the algorithmic complexity of this NFLD
normalization ("legacy") is exactly the same as for NFKD ("compatibility").
However NFLD is versioned (as is NFLC), so NFLD can take a second
parameter: the maximum Unicode version, which can be used to filter which
decomposition
On Sat, Nov 3, 2018 at 23:36, Philippe Verdy wrote:
> - this new decomposition mapping file for NFLC and NFLD, where NFLC is
>> defined to be NFC(NFLD), has some stability requirements and it must be
>> guaranteed that NFD(NFLD) = NFD
>>
> Oops! fix my typo: it must be
>
> Unlike NFKC and NFKD, the NFLC and NFLD would be an extensible superset
> based on MUTABLE character properties (these can also be "decomposition
> mappings", except that once a character is added to the new property file,
> they won't be removed, and can have some stability as well, where the
llow correcting past errors in the standard. This file should
have this form:
# deprecated codepoint(s) ; new preferred sequence ; Unicode version in
which it was deprecated
101234 ; 101230 0300... ; 10.0
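A small sketch of how such records could be parsed, assuming the three semicolon-separated fields shown in the example line above; the function name and the choice to keep the replacement field as raw text are my own assumptions.

```python
# Hypothetical parser for the proposed deprecation records:
#   deprecated codepoint(s) ; new preferred sequence ; Unicode version
def parse_record(line):
    deprecated, replacement, version = (f.strip() for f in line.split(";"))
    # The deprecated field is one or more hex code points.
    codepoints = [int(cp, 16) for cp in deprecated.split()]
    # Keep the replacement as text: it may itself be a code point sequence,
    # possibly with placeholders.
    return codepoints, replacement, version

rec = parse_record("101234 ; 101230 0300 ; 10.0")
```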
This file can also be used to deprecate some old variation sequences, or
some old cluster
s and
still preserve distinction between contrasting pairs, but NOT as a way to
encode non-semantic styles), and character properties to allow efficient
processing.
On Sat, Nov 3, 2018 at 21:02, Philippe Verdy wrote:
> As well the separate encoding of mathematical variants could have been
&
small chart per base character, listing
them simply ordered by "VSn" value. All that Unicode publishes is only a
mere data list with some names (not enough for most users to be aware that
variations can be encoded explicitly and compliantly)
On Sat, Nov 3, 2018 at 20:41, Philippe Ver
On Fri, Nov 2, 2018 at 20:01, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> On 02/11/2018 17:45, Philippe Verdy via Unicode wrote:
> [quoted mail]
> >
> > Using variation selectors is only appropriate for these existing
> > (preencoded)
bin/collation.html and turn on "raw
>>> collation elements" and "sort keys" to see the transformed collation
>>> elements (from the DUCET + CLDR) and the resulting sort keys.
>>>
>>> a =>[29,05,_05] => 29 , 05 , 05 .
>>> a\u0
On Fri, Nov 2, 2018 at 16:20, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> That seems to me a regression, after the front has moved in favor of
> recognizing Latin script needs preformatted superscript. The use case is
> clear, as we have ª, º, and n° with degree sign, and so on
to map the patterns so
that the encoded secondary weight will be readable valid UTF-8.
The fourth level, started by the mark "000" can use the pattern "001" to
encode the most frequent minimum quaternary weight, and patterns "010" to
"011" followed by other bi
ans editing the file, so they don't need to
wonder what is the level of the first indicated weight or remember what is
the minimum weight for that level.
But the DUCET table is actually generated by a machine and processed by
machines.
On Thu, Nov 1, 2018 at 21:57, Philippe Verdy wrote:
> I
d minimum
tertiary weight.
But note that 0020 is kept in two places as they are followed by a higher
weight 0021. This is general for any tailored collation (not just the
DUCET).
On Thu, Nov 1, 2018 at 21:42, Philippe Verdy wrote:
> The is there in the UCA only because the DUCET is publ
The is there in the UCA only because the DUCET is published in a
format that uses it, but here also this format is useless: you never need
any [.] or [..] in the DUCET table either. Instead, the DUCET
just needs to indicate the minimum weight assigned for every level
On Thu, Nov 1, 2018 at 21:31, Philippe Verdy wrote:
> so you can use these two last functions to write the first one:
>
> bool isIgnorable(int level, string element) {
> return getLevel(getWeightAt(element, 0)) > getMinWeight(level);
> }
>
correction:
return g
On Thu, Nov 1, 2018 at 21:08, Markus Scherer wrote:
> When you want fast string comparison, the zero weights are useful for
>> processing -- and you don't actually assemble a sort key.
>>
>
And no, I see absolutely no case where any weight is useful during
processing; it does not distinguish
I'm not speaking just about how collation keys will finally be stored (as
uint16, or bytes, or sequences of bits with variable length); I'm just
referring to the sequence of weights you generate.
You absolutely NEVER need ANYWHERE in the UCA algorithm any weight,
not even during processing, or
the collation key.
This gives:
Figure 3. Comparison of Sort Keys
<http://unicode.org/reports/tr10/#Comparison_Of_Sort_Keys_Table>
String | Sort Key
1 cab | *0706* 06D9 06EE
2 Cab | *0706* 06D9 06EE *0008*
3 cáb | *0706* 06D9 06EE 0020 0020 *0021*
4 dab | *0712* 06D9 06EE
See the reduction!
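The reduction above can be sketched as follows; the representation (plain integer lists, with 0 used only as a level separator that sorts below every real weight) is my own illustration, not the DUCET's actual storage format.

```python
# Build a comparable sort key from per-level weight lists, dropping zero
# weights entirely: 0 appears only as a level separator, and it sorts
# below every real weight, so no [.0000...] elements are ever needed.
def sort_key(levels):
    """levels: list of weight lists, primary level first."""
    key = []
    for i, weights in enumerate(levels):
        if i:
            key.append(0)                    # level separator
        key.extend(w for w in weights if w)  # zero weights carry no info
    while key and key[-1] == 0:              # trailing empty levels vanish
        key.pop()
    return key

# Rows 1-3 of the comparison above (weights taken from the quoted keys):
cab = sort_key([[0x0706, 0x06D9, 0x06EE], [], []])
Cab = sort_key([[0x0706, 0x06D9, 0x06EE], [], [0x0008]])
cab_acute = sort_key([[0x0706, 0x06D9, 0x06EE], [0x0020, 0x0020, 0x0021], []])
```

Plain lexicographic comparison of these lists reproduces the ordering shown in the figure.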
On Thu,
I just remarked that there's absolutely NO utility of the collation weight
anywhere in the algorithm.
For example, in UTR #10, section 3.3.1 gives a collation element:
[..0021.0002]
for COMBINING GRAVE ACCENT. However it can also be simply:
[.0021.0002]
for a simple reason: the
As is "Mgr" for Monseigneur in French ("Mgr" without
superscripts makes little sense, and if "Mr" is sometimes found as an
abbreviation for "Monsieur", its standard abbreviation is "M.", and its
plural "Messieurs" is noted "MM" without any abbreviation dot or
superscript, but normally never as
For the case of "Mister" vs. "Magister", the (double) underlining is not
just a stylistic option but conveys semantics as an explicit abbreviation
mark!
We are here at the line between what is pure visual encoding (e.g. using
superscript letters) and logical encoding (as done everywhere else in
rk will
just render it at the end of the sequence as a usual square or rectangular
"tofu"; those that recognize it as a combining character but without
support for it will render the usual dotted square (meaning "unsupported
combining mark", to distinguish it from the meaning as if t
On Sun, Oct 28, 2018 at 18:28, Janusz S. Bień wrote:
> On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote:
> > Given that the "squiggle" below letters is actually given distinctive
> > semantics, I think it should be encoded as a combining character (to
Given that the "squiggle" below letters is actually given distinctive
semantics, I think it should be encoded as a combining character (to be
written not after a "superscript" but after any normal base letter,
possibly with other combining characters, or CGJ if needed because of the
compatibility
04:21, Garth Wallace via Unicode wrote:
> I learned that one as a kid, as the "pigpen cipher". I'm not aware of any
> numerological significance (which is easy enough to "find" in anything).
>
> On Sat, Oct 27, 2018 at 7:43 PM Philippe Verdy via Unicode <
>
(or Hebrew) letters to this alphabet varies (just
like Braille symbols depending on languages/scripts)
It may have extensions (like Braille outside its basic 2x3 patterns of
dots), such as a second dot in squares, horizontally as "··" or vertically
as ":"
On Sun, Oct 28, 2018 at
;"
> - "X" becomes approximately "\/"
> - "J" is noted like "I" as a square, or distinctly approximately as ">"
> with a central dot
>
> The 3x3 grid had some esoteric meaning based on numerology (a legend now
> propagated
rid had some esoteric meaning based on numerology (a legend now
propagated by Scientology).
On Sun, Oct 28, 2018 at 02:59, Philippe Verdy wrote:
> Do you speak about this one?
> https://www.magisterdaire.com/magister-symbol-black-sq/
> It looks like a graphic personal signature for the a
Do you speak about this one?
https://www.magisterdaire.com/magister-symbol-black-sq/
It looks like a graphic personal signature for the author of this esoteric
book, even if it looks like an interesting composition of several of our
existing Unicode symbols, glued together in a vertical ligature,
On Sat, Oct 27, 2018 at 15:06, Asmus Freytag via Unicode wrote:
> First question is: how do you interpret the symbol? For me it is
> definitely the capital M followed by the superscript "r" (written in an
> old style no longer used in Poland), but there is something below the
> superscript. It
g explicitly modified to suit an embedding protocol.
>
> And certainly the first sentence in this section isn’t intended to be
> taken without the context of the rest of the section.
>
>
> tex
>
> *From:* Philippe Verdy [mailto:verd...@wanadoo.fr]
frequently needed.
On Mon, Oct 15, 2018 at 13:57, Philippe Verdy wrote:
> If you want an example where padding with "=" is not used at all,
> - look into URL-shortening schemes
> - look into database fields or data input forms and numerous data formats
> where the "=&q
e any
padding at all, letting the decoder discard the trailing bits itself at the
end of the encoded stream.
On Mon, Oct 15, 2018 at 13:24, Philippe Verdy wrote:
> Also the rationale for supporting "unnecessary" whitespace is found in
> MIME's version of Base64, also
protocol.
>
> Tex
>
> *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Philippe
> Verdy via Unicode
> *Sent:* Sunday, October 14, 2018 1:41 AM
> *To:* Adam Borowski
> *Cc:* unicode Unicode Discussion
>
t treated as part of the encoding.
>
> Maybe we differ on defining where the encoding begins and ends, and where
> higher level protocols prescribe how they are embedded within the protocol.
>
> Tex
>
> *From:* Unicode [mailto:unicode
On Sun, Oct 14, 2018 at 21:21, Doug Ewell via Unicode wrote:
> Steffen Nurpmeso wrote:
>
> > Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions
> > (MIME) Part One: Format of Internet Message Bodies).
>
> Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data
>
It's also interesting to look at https://tools.ietf.org/html/rfc3501
- which defines (for IMAP v4) another "BASE64" encoding,
- and also defines a "Modified UTF-7" encoding using it, deviating from
Unicode's definition of UTF-7,
- and adding other requirements (which forbids alternate encodings
e conforming to
Unicode, provided they preserve each Unicode scalar value, or at least the
code point identity because an encoder/decoder is not required to support
non-character code points such as surrogates or U+FFFE), where Base64 may
be used for internally generated octets-streams.
On Sun, 14
On Sat, Oct 13, 2018 at 18:58, Steffen Nurpmeso via Unicode <unicode@unicode.org> wrote:
> Philippe Verdy via Unicode wrote in w9+jearw4ghyk...@mail.gmail.com>:
> |You forget that Base64 (as used in MIME) does not follow these rules \
> |as it allows multiple d
on the
allowed set of characters, and on their maximum line lengths):
Base64_Encode[Base64_Decode[t]] = t may be false.
On Sat, Oct 13, 2018 at 16:45, Philippe Verdy wrote:
> You forget that Base64 (as used in MIME) does not follow these rules as it
> allows multiple different encodings for the same
You forget that Base64 (as used in MIME) does not follow these rules as it
allows multiple different encodings for the same source binary. MIME
actually splits a binary object into multiple fragments at random
positions, and then encodes these fragments separately. Also MIME uses an
extension of
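Python's standard library happens to demonstrate this point directly: the same octets have a canonical single-line Base64 form and a MIME line-wrapped form, which decode to identical bytes even though the encoded texts differ.

```python
import base64

# MIME splits the octet stream into lines, so two different encoded texts
# decode to the same bytes: Decode(Encode(b)) == b always holds, but
# Encode(Decode(t)) need not reproduce the original text t.
data = bytes(range(60))
canonical = base64.b64encode(data).decode("ascii")    # one 80-char line
mime_form = base64.encodebytes(data).decode("ascii")  # wrapped at 76 chars

# b64decode (with validate=False, the default) discards the newlines.
same_bytes = base64.b64decode(canonical) == base64.b64decode(mime_form)
```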
I think the reverse is also true!
Decoding a Base64 entity does not guarantee it will return valid text in
any known encoding, so Unicode normalization of the output cannot apply.
Even if it represents text, nothing indicates that the result will be
encoded with some Unicode encoding form.
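A two-line illustration of this point: perfectly valid Base64 can decode to octets that are not valid UTF-8 (nor any Unicode encoding form).

```python
import base64

# Valid Base64 text whose decoded octets are not valid UTF-8.
octets = base64.b64decode("/////w==")  # four 0xFF bytes
try:
    octets.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
```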
I see no easy way to convert ALL-UPPERCASE text with consistent casing, as
there's no rule, except by using dictionary lookups.
In reality, data should be input using default casing (as in dictionary
entries), independently of its position in sentences, paragraphs or
titles, and the contextual
Note: CLDR concentrates on keyboard layout for text input. Layouts for
other functions (such as copy-pasting, gaming controls) are completely
different (and not necessarily bound directly to layouts for text, as they
may also have their own dedicated physical keys or users can reprogram
their
. Sep 16, 2018 at 14:18, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> On 15/09/18 15:36, Philippe Verdy wrote:
> […]
> > So yes all control keys are potentially localisable to work best with
> the base layout and remain mnemonic;
> > but the physic
On Fri, Sep 7, 2018 at 05:43, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> On 07/09/18 02:32 Shriramana Sharma via Unicode wrote:
> >
> > Hello. This may be slightly OT for this list but I'm asking it here as
> it concerns computer usage with multiple scripts and i18n:
>
> It
No, 0xF8..0xFF are not used at all in UTF-8; but U+00F8..U+00FF really
**do** have UTF-8 encodings (using two bytes).
The only safe way to represent arbitrary bytes within strings, when they
are not valid UTF-8, is to use invalid UTF-8 sequences, i.e. by using a
"UTF-8-like" private extension of