t space.
It's just there because that was the most straightforward way to extend
GB 2312/GBK.
Regards, Martin.
On 20/03/2020 23:41, Adam Borowski via Unicode wrote:
> Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or
> U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has its uses
> but is not well-formed Unicode.
This would definitely no longer be UTF-8! Martin.
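To make the disagreement above concrete, here is a minimal sketch in Python (chosen only for illustration; it is not from the thread). It assumes CPython's built-in codec, which treats a lone surrogate as an error in strict UTF-8 and needs the separate "surrogatepass" handler to produce or accept the three-byte form.

```python
# Strict UTF-8 refuses surrogate code points in both directions.
lone_surrogate = "\ud800"

try:
    lone_surrogate.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The "surrogatepass" error handler emits the three-byte pattern anyway.
generalized = lone_surrogate.encode("utf-8", "surrogatepass")

try:
    generalized.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(strict_encode_ok, generalized.hex(), strict_decode_ok)
```

The bytes produced by the handler follow the UTF-8 bit pattern, but a strict decoder rejects them, which is the sense in which such data is "no longer UTF-8".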
alien writing system can teach us
about human language”
Martin Neef, Professor, Institut für Germanistik, TU Braunschweig,
Braunschweig, Germany:
“What is it that ends with a full stop?”
***
Main topics of interest
***
We welcome original proposal
s cannot be allocated 2-1 or 1-2 to the two resulting
Hiragana. In a sophisticated implementation, a backspace could go from
"きゃ" to "ky", but that would only work immediately after input.
Of course, for Japanese input, Latin → Kana is only the first layer, the
second layer is Kana → Kanji.
Regards, Martin.
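A rough sketch of the backspace behaviour described above, in Python. Everything here is hypothetical: the class name KanaBuffer, the method names, and the three-entry table are made up for illustration, and a real input method is far more elaborate. The point is only that the keystrokes behind "きゃ" can be restored while the conversion is still the most recent action.

```python
# Tiny illustrative table; a real input method has hundreds of entries.
ROMAJI_TO_KANA = {"kya": "きゃ", "ka": "か", "a": "あ"}


class KanaBuffer:
    """Hypothetical buffer that remembers only the most recent conversion."""

    def __init__(self):
        self.text = ""      # converted kana followed by pending Latin letters
        self.pending = ""   # Latin letters not yet converted
        self.last = None    # (kana, keystrokes) of the latest conversion

    def press(self, letter):
        self.pending += letter
        self.text += letter
        kana = ROMAJI_TO_KANA.get(self.pending)
        if kana is not None:
            # Replace the pending letters by the kana they spell.
            self.text = self.text[: -len(self.pending)] + kana
            self.last = (kana, self.pending)
            self.pending = ""
        else:
            self.last = None

    def backspace(self):
        if self.last is not None and self.text.endswith(self.last[0]):
            kana, keys = self.last
            # Right after a conversion: restore the keystrokes minus the last one.
            self.text = self.text[: -len(kana)] + keys[:-1]
            self.pending = keys[:-1]
            self.last = None
        else:
            # Later on, only one whole character can be removed.
            self.text = self.text[:-1]
            self.pending = self.pending[:-1]


buffer = KanaBuffer()
for letter in "kya":
    buffer.press(letter)
converted = buffer.text
buffer.backspace()
after_backspace = buffer.text
print(converted, after_backspace)
```

Once anything else has been typed, the record of the keystrokes is gone and backspace can only delete one kana at a time.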
n confirm that a hard reload fixed the problem.
> BTW, if you want to comment on the format as opposed to glitches, please
> change the subject line.
I think it's less the format and much more the split personality of the
Unicode Web site(s?) that I have problems with.
Regards,
I had a look at the page with the frequencies. Many emoji didn't
display, but that's my browser's problem. What was worse was that the
sidebar and the stuff at the bottom were all looking weird. I hope this
can be fixed.
Regards, Martin.
Forwarded Message
Subject: The Most
On 2019/10/04 15:35, Martin J. Dürst via Unicode wrote:
> Hello Markus,
>
> On 2019/10/04 01:53, Markus Scherer via Unicode wrote:
>> Dear Unicoders,
>>
>> Is Manipuri/Meitei customarily written in Bangla/Bengali script or
>> in Meitei script?
>>
>>
https://www.atypi.org/conferences/tokyo-2019/programme/activity?a=906
https://www.youtube.com/watch?v=S8XxVZkfUkk
It's a recent talk at ATypI in Tokyo (sponsored by Google, among others).
Regards, Martin.
amazon page... That (shell) in the title?
> Because it's saying "Haggadah shel Pesach", the Hebrew word "shel"
> meaning "of." The author's name? ♥♢♣♠ (or whatever the exact
> ordering is): "Martin Bodek", that is martini-glass, bow, and the
oth Chinese and Japanese glyph variants.
Regards, Martin.
hich is what I'm using here, I get hopelessly
stretched/squeezed glyph shapes, which definitely don't look good.
Regards, Martin.
of the people to realize that they are bad ideas. But that
doesn't make them any better when they turn up again.
Regards, Martin.
redo his presentation
because the vendor of his notebook's OS was in the process of changing
their emoji designs.
Regards, Martin.
On 2019/01/17 17:51, James Kass via Unicode wrote:
>
> On 2019-01-17 6:27 AM, Martin J. Dürst replied:
> > ...
> > Based on these data points, and knowing many of the people involved, my
> > description would be that decisions about what to encode as characters
any legitimate
> reason why such information isn't worthy of being preservable in
> plain-text. Perhaps there isn't one.
See above.
> I'm not qualified to assess the impact of italic Unicode inclusion on
> the rich-text world as mentioned by David Starner. Maybe another list
> member will offer additional insight or a second opinion.
I'd definitely second David Starner on this point. The more options one
has to represent one and the same thing (italic styling in this thread),
the more complex and error-prone the technology gets.
Regards, Martin.
ad when it comes to styled
text. Do we want to encode background-color variant selectors in
Unicode? If yes, how many?
[Hint: The last two questions are rhetorical.]
Regards, Martin.
ses like "look, I found these
characters, aren't they cute" in some corners of some social services is
not the same as "we urgently need this, otherwise we can't communicate
in our language".
Regards, Martin.
Hello James, others,
On 2019/01/14 15:24, James Kass via Unicode wrote:
>
> Martin J. Dürst wrote,
>
> > I'd say it should be conservative. As the meaning of that word
> > (similar to others such as progressive and regressive) may be
> > interpreted in vari
Hello James, others,
From the examples below, it looks like a feature request for Twitter
(and/or Facebook). Blaming the problem on Unicode doesn't seem to be
appropriate.
Regards, Martin.
On 2019/01/14 18:06, James Kass via Unicode wrote:
>
> Not a twitter user, don't know how p
ing to abolish case
distinctions to adapt to computers, but fortunately, that wasn't necessary.
Regards, Martin.
Unicode to
work for a long, long time, it's very important to be conservative.
> I became attracted to Unicode about twenty years ago. Because Unicode
> opened up entire /realms/ of new vistas relating to what could be done
> with computer plain text. I hope this trend continues.
I hope this trend only continues very slowly, if at all.
Regards, Martin.
simulated by something else. And the simulation is highly limited, as
the voicing examples and the fact that the math alphanumerics only cover
basic Latin have shown.
Regards, Martin.
riants in rich
text scenarios such as HTML.
Regards, Martin.
impersonal usage, the meaning "it is advisable,
it is right to, it is proper to" seems to be most appropriate in this
context.
It may not at all be convenient (=practical) to use the superscripts,
e.g. if they are not easily available on a keyboard.
Regards, Martin.
(French isn't my native language, and nor is English)
ailing list.
> Making a safe distinction is beyond my knowledge, safest is not to
> discriminate.
Yes. The easiest way to not discriminate is to not use titles in mailing
list discussions. That's what everybody else does, and what I highly
recommend.
Regards, Martin.
n error.
The question of how to encode that dot is fortunately an easy one, but
even if it were not, German-writing people would find a sentence such as
"The dot or ... has no meaning at all." extremely weird. The dot is
there (and in German, has to be there) because it's an abbreviation.
Regards, Martin.
are exchanged, so that the piece that looks like @ is now
in the middle (it was at the left in (1) and (2)).
Hope this helps. Regards, Martin.
which is encoded as . Is the rendering I am getting
technically wrong, or is it merely undesirable?
The ambiguity arises in part because, like the Brahmi
e, that requires
additional processing.
Regards, Martin.
Ken, Markus,
Many thanks for your ideas, which I noted at
https://bugs.ruby-lang.org/issues/14839.
Regards, Martin.
On 2018/10/03 06:43, Ken Whistler wrote:
On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote:
My questions here are:
- Has this been considered when Georgian Mtavruli
particular the operation that's called 'capitalize' in Ruby?
Many thanks in advance for your input,
Regards, Martin.
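For readers who want to see what the case operations do to Georgian text, here is a small probe in Python. It is a stand-in only: the question above concerns Ruby's 'capitalize', and Python's str.upper, str.title and str.capitalize may behave differently. The results also depend on which Unicode version the Python build ships, so no expected output is given.

```python
import unicodedata

# GEORGIAN LETTER AN, BAN, GAN in the everyday Mkhedruli form.
word = "\u10d0\u10d1\u10d2"

results = {
    "upper": word.upper(),
    "title": word.title(),
    "capitalize": word.capitalize(),
}

for label, converted in results.items():
    # Print the character names so the script of each result is visible.
    names = [unicodedata.name(ch, "<unnamed>") for ch in converted]
    print(label, converted, names)

print("Unicode data version:", unicodedata.unidata_version)
```

Comparing the names printed for "upper" with those for "title" and "capitalize" shows whether a given implementation puts Mtavruli letters at the start of a word.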
row of the left
hand, and so on. For me, it was really terrible.
It may not be the same for everybody, but my experience suggests that it
may be similar for some others, and that therefore such a mapping should
only be voluntary, not default.
Regards, Martin.
otepad would add one single feature for each new
version of Windows. I think that was when the Save-As feature was added.
For a long time, I have set up Notepad++ to come up when Notepad is invoked.
Regards, Martin.
. We don't know how this will develop.
(Famous German (grammatically incorrect) saying:
Man gewöhnt sich an allem, auch am Dativ.
["One gets used to everything, even to the dative."])
I think you are best off writing Arzt/Ärztin.
Regards, Martin.
).
Regards, Martin.
*within* a grapheme cluster seems to be
a bad idea.
Regards, Martin.
ercase ß) is allowed, but not required.)
Regards, Martin.
bug report at https://bugs.ruby-lang.org/projects/ruby-trunk, I
should be able to follow up on that.
Regards, Martin.
it) change was okay. But
when talking about semantics, it's important to not only consider
surface semantics, but also the overall context.
Regards, Martin.
On 2018/04/20 18:12, Martin J. Dürst wrote:
There was an announcement for a public review period just recently. The
review period is up to the 23rd of April. I'm not sure whether the
announcement is up somewhere on the Web, but I'll forward it to you
directly.
Sorry, found the Web address
Regards, Martin.
of the correction.
I'm sure they know they exaggerated quite a bit. I'm also sure they
trust the Unicode Consortium to know when they would have to enlarge the
code space, if ever.
Regards, Martin.
Please enjoy. Sorry for being late with forwarding, at least in some
parts of the world.
Regards, Martin.
Forwarded Message
Subject: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode
Date: Sun, 1 Apr 2018 08:29:00 -0700 (PDT)
From: rfc-edi...@rfc-editor.org
Reply
On 2018/03/09 21:24, Mark Davis ☕️ wrote:
There are definitely many dialects across Switzerland. I think that for
*this* phrase it would be roughly the same for most of the population, with
minor differences (eg 'het' vs 'hät'). But a native speaker like Martin
would be able to say for sure
olutionary/historic linguistic perspective.
[Disclaimer: I'm not a linguist.]
Regards, Martin.
distinguishability is high, the same may not apply across fonts
(e.g. if one has to compare a printed version with a version on-screen).
Regards, Martin.
2018-03-11 6:04 GMT+01:00 Keith Turner via Unicode <unicode@unicode.org>:
I created a neat little project based on Unicode emojis. I t
extremely unusual. For Korea, these days, it will be mostly Hangul; I'm
not sure whether addresses with Hanja would incur a delay. My guess
would be that Bopomofo wouldn't work in mainland China (might work in
Taiwan, not sure).
Regards, Martin.
themselves.
Apart from that, at least in Japan, signatures are used extremely
rarely; it's mostly stamped seals, which are also kept as images by
banks,...
Regards, Martin.
aims, it's difficult to falsify many of
them. It would be easier to prove them (assuming they were true), so if
you have any supporting evidence, please provide it.
Regards, Martin.
John Knightley
digitally disadvantaged) scripts. See e.g. the recent announcement
at
http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.
Regards, Martin.
different orthographies, but that's really a very minor issue
when learning one language from the other even though these languages
are very close.
Regards, Martin.
characters overnight.
Yes indeed.
Regards, Martin.
gn up to participate in the
original I-mode (first case of Web on mobile phones) service. Of course,
that specific emoji (or was it several) wasn't encoded in Unicode
because of trademark issues.
Regards, Martin.
sometimes the accents on
upper-case letters are left out, but I haven't heard of a reverse
phenomenon yet.
Regards, Martin.
use w and x, so they could use one of these. But
personally, I'd find accents more visually pleasing.
Regards, Martin.
memory, i.e. it is done without
thinking about it. I would guess that it would be very difficult to
maintain two different kinds of muscle memory for typing Malayalam. (My
assumption is that the populations typing traditional and reformed
writing styles are not disjoint.)
Regards, Martin.
in terms of
offering something for editorial convenience while being easy to implement.
Regards, Martin.
A friend of mine sent me a pointer to
http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder.
Regards, Martin.
character, is another question. See
http://www.bitsavers.org/pdf/ibm/1620/A26-5706-3_IBM_1620_CPU_Model_1_Jul65.pdf
What page?
Regards, Martin.
to happen is to have this discussion in Assamese
rather than in English, because then people eventually will see that
there's no problem.
Regards, Martin.
However, 'popular nationalism' will probably be used to attack Unicode then.
David Faulks
? Any idea what might have caused this?
Regards, Martin.
Hello Mark,
On 2017/08/04 09:34, Mark Davis ☕️ wrote:
FYI, the UTC retracted the following.
Thanks for letting us know!
Regards, Martin.
*[151-C19 <http://www.unicode.org/cgi-bin/GetL2Ref.pl?151-C19>]
Consensus:* Modify
the section on "Best Practices for Using FFFD"
-8 to four bytes, but would almost double the
code space. Assuming (conservatively) that it will take about a century
to fill up all 17 (well, actually 15, because two are private) planes,
this would give us another century.
Just one more crazy idea :-(.
Regards, Martin.
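The "almost double" figure can be checked with a few lines of arithmetic. This sketch assumes the idea is to use every value that fits into the 21 payload bits of a four-byte UTF-8 sequence; the variable names are mine.

```python
CODE_POINTS_PER_PLANE = 0x10000      # 65,536 code points per plane

current_planes = 17                  # U+0000..U+10FFFF
current_code_points = current_planes * CODE_POINTS_PER_PLANE

# A four-byte sequence 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx carries 3 + 3*6 bits.
payload_bits = 3 + 3 * 6
max_four_byte_values = 1 << payload_bits
possible_planes = max_four_byte_values // CODE_POINTS_PER_PLANE

growth = possible_planes / current_planes
print(current_code_points, max_four_byte_values, possible_planes, round(growth, 2))
```

Under that assumption, four bytes could address 32 planes instead of 17, a factor of roughly 1.88.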
identified recommendation, so
that Python3, Ruby, Web standards and browsers, and so on can easily
refer to it.
Regards, Martin.
I believe this is pretty much in line with Shawn's position. Certainly,
a discussion of the reasons one might choose one interpretation over
another should be
Hello Markus, others,
On 2017/05/27 00:41, Markus Scherer wrote:
On Fri, May 26, 2017 at 3:28 AM, Martin J. Dürst <due...@it.aoyama.ac.jp>
wrote:
But there's plenty in the text that makes it absolutely clear that some
things cannot be included. In particular, it says
The term “m
On 2017/05/25 09:22, Markus Scherer wrote:
On Wed, May 24, 2017 at 3:56 PM, Karl Williamson <pub...@khwilliamson.com>
wrote:
On 05/24/2017 12:46 AM, Martin J. Dürst wrote:
That's wrong. There was a public review issue with various options and
with feedback, and the recommendation ha
rogramming language and browsers) without
problems for quite some time.
There is no proposal to add a
recommendation "this late in the game".
True. The proposal isn't for an addition, it's for a change. The "late
in the game" however, still applies.
Regards, Martin.
stricter requirement for alignment, and some have followed longstanding
recommendations in the absence of specific arguments for something
different.
Regards, Martin.
- And still can propose that; as I said, there is plenty of time.
Mark
On Wed, May 17, 2017 at 10:41 PM, Doug Ewell v
don't spend too much time on
it.] I find it particularly strange that at a time when UTF-8 is firmly
defined as up to 4 bytes, never including any bytes above 0xF4, the
Unicode consortium would want to consider recommending that <... 84 85> be
converted to a single U+FFFD. I note with agreement that
Markus seems to have thoughts in the same direction, because the
proposal (17168-utf-8-recommend.pdf) says "(I suppose that lead bytes
above F4 could be somewhat debatable.)".
Regards, Martin.
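As an illustration of the point about lead bytes, here is a small Python sketch. The six-byte input is a made-up example of an old-style sequence starting with 0xFD, and the helper name can_start_utf8 is hypothetical. The sketch shows how one existing decoder (CPython's) handles such input; it is not what any proposal prescribes.

```python
def can_start_utf8(byte):
    """True if the byte can begin a well-formed UTF-8 sequence (00..7F, C2..F4)."""
    return byte <= 0x7F or 0xC2 <= byte <= 0xF4


# Hypothetical ill-formed input: six bytes, the first one above 0xF4.
ill_formed = bytes([0xFD, 0x81, 0x82, 0x83, 0x84, 0x85])

# CPython replaces each byte that cannot start or continue a sequence separately.
decoded = ill_formed.decode("utf-8", errors="replace")
replacement_count = decoded.count("\ufffd")

print([can_start_utf8(b) for b in ill_formed], replacement_count)
```

Because 0xFD can never start a sequence under the current definition, this decoder does not treat the following bytes as belonging to it and emits one U+FFFD per byte rather than a single one.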
ear (close to)
square. However, because diagrams are usually viewed at close to a right
angle, Go diagrams use squares, not rectangles.
Regards, Martin.
Hello Janusz,
I think you should report this problem to
http://www.unicode.org/reporting.html. That way, it gets tracked
appropriately. This list is for discussion, not for bug fixes.
Regards, Martin.
On 2017/04/10 18:54, Janusz S. Bień wrote:
This is a long overdue issue, but better
Hello Michael,
[I started to write this mail quite some time ago. I decided to try to
let things cool down a bit by waiting a day or two, but it has become
more than a week now.]
On 2017/03/29 22:08, Michael Everson wrote:
Martin,
It’s as though you’d not participated in this work for many
that it's
the other way round.
Regards, Martin.
bic/Hebrew/... document), the bidi context will default
to left-to-right...
There never was a "bidi" attribute in HTML. You probably mean the "dir"
attribute.
Regards, Martin.
The uniform width is a key part of the
semantic of the sequences being discussed.
The full width/half width distinction mostly is a legacy (roundtrip) issue.
Regards, Martin.
master/third_party/region-flags>
The last one currently already has support for UK countries, US states and
Canadian provinces. Go figure.
And most if not all of these flags are from Wikimedia. So that shows
that open source has some influence, even without money.
Regards, Martin.
nd deciding to split because history is
way more important than modern practice.
In that light, some more comments lower down.
On 2017/03/28 22:56, Michael Everson wrote:
On 28 Mar 2017, at 11:39, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
An æ ligature is a ligature of a and
ji too !
I prefer soft pretzels!
Regards, Martin.
if slowly and to some
extent quite reluctantly. It's anyone's bet in what time frame and order
e.g. the flags of California and Texas will be 'recommended'. But I have
personally no doubt that these (and quite a few others) will eventually
make it, even if I have mixed feelings about that.
Regards, Martin.
On 2017/03/27 21:59, Michael Everson wrote:
On 27 Mar 2017, at 08:05, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are
apparently really supposed to have identical glyphs, though we use an
old-fashioned style in the
how that these
variants existed (which I think nobody in this discussion has doubted),
but not that there was contrasting use. And is that letter hand-written
or printed?
Regards, Martin.
, but it may also
be possible to just tag each piece of text in the database with "1855"
or "1859" if that distinction is important (e.g. for historical
documents). As far as I understand, we are still looking for actual
texts that use both shapes of the same ligature concurrently.
Regards, Martin.
I agree with Alastair.
The list of font technology options was mostly to show that there are
already a lot of options (some might even say too many), so font
technology doesn't really limit our choices.
Regards, Martin.
On 2017/03/27 23:04, Alastair Houghton wrote:
On 27 Mar 2017, at 10
Hello Michael, others,
On 2017/03/27 21:07, Michael Everson wrote:
On 27 Mar 2017, at 06:42, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
The characters in question have different and undisputed origins, undisputed.
If you change that to the somewhat more neutral "the shapes
*concurrent* existence of
*corresponding* ligatures in the same font, or the concurrent (even
better, contrasting) use of corresponding ligatures in the same text.
Regards, Martin.
What's interesting (weird?) is that the "1859" OI <ЃІ> appears in 1857
punches. Time t
On 2017/03/24 23:37, Michael Everson wrote:
On 24 Mar 2017, at 11:34, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
On 2017/03/23 22:48, Michael Everson wrote:
Indeed I would say to John Jenkins and Ken Beesley that the richness of the
history of the Deseret alphabet would be impove
)
And then the same questions, with parallel (or not parallel) answers,
for ɒɪ/ɔɪ/Ц.
Regards, Martin.
Text copied from earlier mail by Michael:
>>>>
1. The 1855 glyph for Ч EW is evidently a ligature of the glyph for the
diagonal stroke of the glyph for І SHORT I [ɪ] and Ѕ LONG
On 2017/03/26 22:15, Michael Everson wrote:
On 26 Mar 2017, at 09:12, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote:
That's a good point: any disunification requires showing examples of
contrasting uses.
Fully agreed.
The default position is NOT “everything is encoded unified
have been just fine with unifying diaeresis and
umlaut. (German fonts e.g. may have contained a 'ë' for use e.g. with
"Citroën", but the dots on that 'ë' will have been the same shape as
'ä', 'ö', and 'ü' umlauts for design consistency, and the other way
round for French).
Regards, Martin.
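The unification mentioned above is visible in the encoding itself. A quick check in Python, using the standard unicodedata module, decomposes the French 'ë' and the German umlaut letters and prints the combining mark each one uses.

```python
import unicodedata

marks = set()
for letter in "ëäöü":
    # NFD splits each precomposed letter into a base letter plus a combining mark.
    base, mark = unicodedata.normalize("NFD", letter)
    marks.add(mark)
    print(letter, base, f"U+{ord(mark):04X}", unicodedata.name(mark))
```

All four letters decompose to the same mark, U+0308 COMBINING DIAERESIS, whether the dots function as a diaeresis or as an umlaut.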
"ae" and the letter "ä" which are orthographic variants not distinguished
by the language but by authors' preference.
Well, in most cases, but not e.g. for names. Goethe is not spelled Göthe.
Regards, Martin.
ite deeply for some of the sources.
Regards, Martin.
e.g. with different fonts the same way we have thousands of
different fonts for Latin and many other scripts that show a lot of rich
history.
Regards, Martin.
users.
What is right for Deseret has to be decided by and for Deseret users,
rather than by script historians.
Regards, Martin.
The glyphs may come from a different origin, but it's encoding the same idea.
We don’t encode diphthongs. We encode the elements of writing systems. The
“idea
my mail in plain text, but it still worked.
Regards, Martin.
domain anyway, pict.com seems now defunct.
Isn't WAP overall pretty much defunct these days?
(Well, many including me predicted as much pretty much when it first
showed up.)
Regards, Martin.
character has Bidi
property EN, not L, R or AL.
On first sight, it looks to me as if you're correct.
For the exact interpretation of RFC 5893, you'd better write to the
mailing list of the former IDNA(bis) WG at idna-upd...@alvestrand.no.
Regards, Martin.
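The Bidi property of a character is easy to look up, for example with Python's unicodedata module. The sketch below covers only the first condition of the Bidi Rule in RFC 5893 (the first character of a label must have Bidi property L, R, or AL). The function names and sample labels are made up for illustration, and the remaining conditions of the rule are not checked.

```python
import unicodedata


def first_char_bidi_class(label):
    """Return the Bidi class (e.g. 'L', 'R', 'AL', 'EN') of the label's first character."""
    return unicodedata.bidirectional(label[0])


def passes_first_char_condition(label):
    # Only condition 1 of the RFC 5893 Bidi Rule; the other conditions are omitted.
    return first_char_bidi_class(label) in {"L", "R", "AL"}


samples = ["abc", "1abc", "\u05d0\u05d1\u05d2", "\u0627\u0628"]
for label in samples:
    print(repr(label), first_char_bidi_class(label), passes_first_char_condition(label))
```

A label starting with a European digit reports EN and therefore fails this first condition, which matches the reading discussed above.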
Similarly (line 93)
B;àˇ
com> on behalf of Philippe Verdy <verd...@wanadoo.fr>
Reply-To: Philippe Verdy <verd...@wanadoo.fr>
Date: Friday, December 23, 2016 at 1:35 PM
To: Martin Mueller <martinmuel...@northwestern.edu>
Cc: William_J_G Overington <wjgo_10...@btinternet.com>, "unicode@unicode.
.
From: Leo Broukhis <leo...@gmail.com>
Reply-To: "l...@mailcom.com" <l...@mailcom.com>
Date: Thursday, December 22, 2016 at 6:31 PM
To: Martin Mueller <martinmuel...@northwestern.edu>
Cc: unicode Unicode Discussion <unicode@unicode.org>
Subject: Re: a character for
be persuaded otherwise.
With thanks for the help of all of you
MM
On 12/22/16, 6:03 AM, "William_J_G Overington" <wjgo_10...@btinternet.com>
wrote:
Martin Mueller wrote:
> Is there a Unicode character that says “I represent an alphanumerical
character, but
. And if that isn’t the case, the transcriber wouldn’t know
it. S/he sees that there is something, perhaps even that there is just one of
it, but doesn’t know which
Martin Mueller
Professor emeritus of English and Classics
Northwestern University
ou'll want to do something different than if you are
sending the text to a printer just to have a look at it.
Regards, Martin.