, and
the LATIN CAPITAL LETTER RA
LATIN CAPITAL LETTER RA? Shouldn't that be LATIN CAPITAL LETTER R?
Regards, Martin.
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:due...@it.aoyama.ac.jp
) are used both as letters and as decimal
place-value digits, and they are scattered widely, and of course there
is a lot of modern living practice.
Regards, Martin.
be 460, and
560 would be 五佰六十 :-).
Regards, Martin.
On 2010/07/29 13:33, karl williamson wrote:
Asmus Freytag wrote:
On 7/25/2010 6:05 PM, Martin J. Dürst wrote:
Well, there actually is such a script, namely Han. The digits (一、
二、三、四、五、六、七、八、九、〇) are used both as letters and as
decimal place-value digits, and they are scattered widely
, Martin.
necessary when talking
*about* these characters (meta-level) rather than when just using them
(non-meta), then I would indeed agree that there is no reason to encode
them separately.
Regards, Martin.
Mono much,
I'm not even sure whether I ever used it, but at the time I found the
idea that somebody was working on a font that covered Unicode really
worthy of support.
Regards, Martin.
, Martin.
to be fully deployed. Please see http://www.w3.org/Fonts/
for more details and pointers.
Regards, Martin.
on Unicode
strings (which, for many good reasons, were ultimately rejected), please
see the discussion around
http://lists.w3.org/Archives/Public/public-iri/2009Sep/0064.html.
Regards, Martin.
, so as in the (somewhat distant) future to allow
for cases where a name with 'ß' and a name with 'ss' are resolved
differently.
Regards, Martin.
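The ß/ss distinction mentioned above can be seen in today's tooling. A minimal sketch: under IDNA2003 (as implemented by Python's built-in "idna" codec) the nameprep step folds 'ß' to 'ss', so the two names resolve identically; IDNA2008 was later designed to allow them to differ.

```python
# Under IDNA2003, nameprep maps U+00DF 'ß' to "ss" before the label
# is converted, so "straße" and "strasse" produce the same ASCII label.
label_sharp_s = "straße"
label_ss = "strasse"

assert label_sharp_s.encode("idna") == label_ss.encode("idna")
assert label_ss.encode("idna") == b"strasse"
```

IDNA2008-capable libraries (e.g. the third-party "idna" package) keep 'ß' as a distinct, encodable code point, which is exactly the future-proofing the message describes.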
and identifies broken surrogate pairs and illegal
characters? Ideally, the utility can both report illegal code units and repair
them by replacing them with U+FFFD.
Jim Monty
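The utility asked about here is straightforward to sketch over 16-bit code units. A minimal illustration (the function name `repair_utf16_units` is made up for this example): it keeps well-formed surrogate pairs and replaces any unpaired surrogate with U+FFFD, as the question suggests.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER

def repair_utf16_units(units):
    """Replace unpaired surrogate code units with U+FFFD."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.extend([u, units[i + 1]])  # well-formed pair: keep
                i += 2
                continue
            out.append(REPLACEMENT)  # unpaired high surrogate
        elif 0xDC00 <= u <= 0xDFFF:  # stray low surrogate
            out.append(REPLACEMENT)
        else:
            out.append(u)  # ordinary BMP code unit
        i += 1
    return out
```

Reporting (rather than repairing) is the same scan with the `append(REPLACEMENT)` branches replaced by logging the offset.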
.
For some processing this is true, but it's rather short-sighted.
Regards, Martin.
would you use Ruby for conversion when programming in Perl?
You could just as well program in Ruby, it's much more fun!
FYI. Regards, Martin.
Original Message
Subject: RFC 6082 on Deprecating Unicode Language Tag Characters: RFC
2482 is Historic
Date: Sun, 7 Nov 2010 21:50:44 -0800 (PST)
From: rfc-edi...@rfc-editor.org
To: ietf-annou...@ietf.org, rfc-d...@rfc-editor.org
CC:
is a
sub-encoding of windows-1252 if the former is interpreted as not
including the C1 range.
Regards, Martin.
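The sub-encoding relationship described above is easy to check mechanically. A minimal sketch: outside the C1 range (0x80-0x9F), iso-8859-1 and windows-1252 decode every byte identically; inside C1, windows-1252 assigns graphic characters instead.

```python
# Every byte outside the C1 range decodes the same way in both encodings,
# which is the sense in which latin-1 is a sub-encoding of windows-1252.
for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100)):
    raw = bytes([b])
    assert raw.decode("iso-8859-1") == raw.decode("windows-1252")

# Inside C1 they differ: 0x93 is a control in latin-1 but a quote in 1252.
assert b"\x93".decode("windows-1252") == "\u201c"  # LEFT DOUBLE QUOTATION MARK
assert b"\x93".decode("iso-8859-1") == "\u0093"    # C1 control
```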
On 2011/07/15 18:51, Michael Everson wrote:
On 15 Jul 2011, at 09:47, Andrew West wrote:
If you want a font to display a visible glyph for a format or space character
then you should just map the glyph to its character in the font, as many fonts
already do for certain format characters.
Hello Mark, others,
On 2011/07/28 5:01, Mark Davis ☕ wrote:
Just to remind people: posting to this list does *not* mean submitting to
the UTC. If you want to discuss a proposal here, not a problem, but just
remember that if you want any action you have to submit to the UTC.
Unicode members
On 2011/09/10 9:32, Stephan Stiller wrote:
Actually, I *was* talking about purely typographic/aesthetic ligatures
as well. I'm aware that which di-/trigraphs need to be considered from a
font design perspective is language-dependent.
And this language-dependence is not only a question of
Hello Delex,
On 2011/09/14 15:55, delex r wrote:
The “Dark age of Assamese language” ran for about 37 years in this region when
attempts were made to kill the language by vested interests with the help of
British Political powers imposing Bengali as medium of instruction in school
and
[By accident, I sent this only to Ken first; he recommended I send it to
both Unicode and Unicore.]
I have sent a mail to a relevant IETF list (apps-disc...@ietf.org); the
IETF was looking into taking this over, with
http://tools.ietf.org/html/draft-lear-iana-timezone-database-04, but
, Martin.
On 2011/10/07 14:14, Martin J. Dürst wrote:
[By accident, I sent this only to Ken first; he recommended I send it to
both Unicode and Unicore.]
I have sent a mail to a relevant IETF list (apps-disc...@ietf.org); the
IETF was looking into taking this over, with
http://tools.ietf.org/html
On 2011/10/10 21:10, Eli Zaretskii wrote:
Date: Mon, 10 Oct 2011 17:47:21 +0800
From: li bo <libo@gmail.com>
From section 3:
Paragraphs are divided by the Paragraph Separator or appropriate
Newline Function (for guidelines on the handling of CR, LF, and CRLF,
see Section 4.4,
On 2011/10/11 7:35, Philippe Verdy wrote:
I've seen various interpretations, but the ASCII solidus is
unambiguously used with a strong left-to-right associativity, and the
same occurs in classical mathematics notations (the horizontal bar is
another notation but even where it is used, it also
On 2011/10/11 10:29, Martin J. Dürst wrote:
On 2011/10/10 21:10, Eli Zaretskii wrote:
Date: Mon, 10 Oct 2011 17:47:21 +0800
In addition to the Paragraph Separator, _any_ newline function (LF,
CR+LF, CR, or NEL) can end a paragraph. Also U+2028, the LS
character. See section 5.8
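As a small cross-check of the list of paragraph-ending newline functions above: Python's `str.splitlines` happens to recognize all of them (LF, CR, CR+LF, NEL) as well as LS U+2028 and PS U+2029, so it can serve as a quick illustration.

```python
# One segment boundary per newline function mentioned in the thread:
# LF, CR, CR+LF, NEL (U+0085), LS (U+2028), PS (U+2029).
text = "a\nb\rc\r\nd\x85e\u2028f\u2029g"
assert text.splitlines() == ["a", "b", "c", "d", "e", "f", "g"]
```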
On 2011/10/11 13:07, Eli Zaretskii wrote:
Date: Tue, 11 Oct 2011 10:53:39 +0900
From: Martin J. Dürst <due...@it.aoyama.ac.jp>
CC: li bo <libo@gmail.com>, unicode@unicode.org
This is different from what you did in Emacs, which I'd call
line-folding, i.e. cut the line after a paragraph is laid out
Hello Eli,
There is absolutely no problem with treating the algorithm in UAX#9 as a set
of requirements and coming up with a totally different implementation
that produces the same results. I think actually UAX#9 says so somewhere.
But what is, strictly speaking, not allowed is to change the
Hello Kent,
I was also very much thinking that mirrored glyph should be of the same
width, but there might be subtle issues when you consider kerning. As a
very basic example, think about kerning of the pair K), and then think
about K(.
Regards, Martin.
On 2011/10/11 19:39, Kent Karlsson
I'm hoping to get some advice from people with experience with various
Unicode/transcoding libraries.
RFC 3987 (the current IRI spec) has the following text:
Note: Some older software transcoding to UTF-8 may produce illegal
output for some input, in particular for characters outside
How can one use the Forum to comment on URI/IRI issues when one gets a
message:
Your message contains too many URLs. The maximum number of URLs allowed
is 8.
I never liked this forum stuff too much, and this hasn't made things
better :-(.
Regards, Martin.
I tried to find something like a normative description of the default
bidi class of unassigned code points.
In UTR #9, it says
(http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types):
Unassigned characters are given strong types in the algorithm. This is
an explicit
On 2011/11/21 5:54, Asmus Freytag wrote:
On 11/20/2011 8:00 AM, Joó Ádám wrote:
Leaving aside that CSS is presentation and not content, and is
definitely not markup. HTML is a better candidate.
Á
The details of the appearance of the mark would be presentation.
The scoping, like for applying
On 2012/04/28 4:26, Mark Davis ☕ wrote:
Actually, if the goal is to get as many characters in as possible, Punycode
might be the best solution. That is the encoding used for internationalized
domains. In that form, it uses a smaller number of bytes per character, but
a parameterization allows
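The compactness claim is easy to demonstrate. A minimal sketch using Python's built-in codecs: Punycode packs the non-ASCII code points into a short ASCII suffix, and the IDNA form of a label is the same output with the "xn--" ACE prefix.

```python
# "bücher" is the well-known sample label: one non-ASCII character
# costs only the "-kva" suffix in Punycode.
assert "bücher".encode("punycode") == b"bcher-kva"

# The internationalized-domain form just adds the ACE prefix:
assert "bücher".encode("idna") == b"xn--bcher-kva"
```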
On 2012/04/28 7:29, Cristian Secară wrote:
On Fri, 27 Apr 2012 12:26:25 -0700, Mark Davis ☕ wrote:
Actually, if the goal is to get as many characters in as possible,
Punycode might be the best solution. That is the encoding used for
internationalized domains. In that form, it uses a
On 2012/04/27 17:06, Cristian Secară wrote:
It turned out that they (ETSI its groups) created a way to solve the
70 characters limitation, namely “National Language Single Shift” and
“National Language Locking Shift” mechanism. This is described in 3GPP
TS 23.038 standard and it was introduced
On 2012/04/29 18:58, Szelp, A. Sz. wrote:
While there are good reasons the authors of HTML5 brought to ignore SCSU or
BOCU-1, having excluded UTF-32 which is the most direct, one-to-one mapping
of Unicode codepoints to byte values seems shortsighted.
Well, except that it's hopelessly
On 2012/05/29 17:43, Asmus Freytag wrote:
On 5/27/2012 5:52 PM, Michael Everson wrote:
Get over it. Please just get over it. It doesn't matter. It's a blort.
Time to agree with Michael.
Get over it, is good advice here.
Sovereign countries are free to decree currency symbols, whatever their
On 2012/05/30 4:42, Roozbeh Pournader wrote:
Just look what happened when the Japanese did their own font/character set
hack. The backslash/yen problem is still with us, to this day...
To be fair, the Japanese Yen at 0x5C was there long before Unicode, in
the Japanese version of ISO 646.
On 2012/07/11 4:37, Asmus Freytag wrote:
I recall, with certainty, having seen the : in the context of
elementary instruction in arithmetic,
as in 4 : 2 = ?, but am no longer positive about seeing ÷ in the same
context.
I remember this very well. In grade school, we had to learn two ways to
On 2012/07/11 10:35, Stephan Stiller wrote:
About Martin Dürst's content re geteilt-gemessen:
When I attended the German school system in approx the 1990s this
distinction wasn't mentioned or taught. (I prefer to not give details
about specific time and place for privacy reasons.)
Sorry, but
On 2012/07/11 11:04, Mark E. Shoulson wrote:
Ever start to feel that we would have been better off not to give
official descriptive names at all? Or else really vague ones like
LETTERLIKE THINGY NUMBER 5412? So much blood-pressure raised over the
names...
I'm feeling that way since about the
On 2012/07/13 0:12, Leif Halvard Silli wrote:
Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600:
and people who want to create or modify UTF-8 files which will
be consumed by a process that is intolerant of the signature
should not use Notepad. That goes for HTML (pre-5) pages [snip]
HTML5-parsers
On 2012/07/13 22:31, Jukka K. Korpela wrote:
2012-07-13 16:12, Leif Halvard Silli wrote:
The kind of BOM intolerance I know about in user agents is that some
text browsers and IE5 for Mac (abandoned) convert the BOM into a
(typically empty) line at the start of the body element.
I wonder if
On 2012/07/14 1:33, Philippe Verdy wrote:
From: Jukka K. Korpela <jkorp...@cs.tut.fi>
When the BOM is used in web pages or editors for UTF-8 encoded content it
can sometimes introduce blank spaces or short sequences of strange-looking
characters (such as ). For this reason, it is usually best
On 2012/07/17 17:22, Leif Halvard Silli wrote:
And an argument was put forward in the WHATWG mailinglist
earlier this year/end of previous year, that a page with strict ASCII
characters inside could still contain character entities/references for
characters outside ASCII.
Of course they can.
Hello Leif,
Sorry to be late with my answer.
On 2012/07/13 20:44, Leif Halvard Silli wrote:
Martin J. Dürst, Fri, 13 Jul 2012 18:17:05 +0900:
On 2012/07/13 0:12, Leif Halvard Silli wrote:
Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600:
and people who want to create or modify UTF-8 files which
Hello Leif,
On 2012/07/18 4:35, Leif Halvard Silli wrote:
But is the Windows Notepad really to blame?
Pretty much so. There may have been other products from Microsoft that
also did it, but with respect to forcing browsers and XML parsers to
accept an UTF-8 BOM as a signature, Notepad was
Hello Philippe,
On 2012/07/18 3:37, Philippe Verdy wrote:
2012/7/17 Julian Bradfield <jcb+unic...@inf.ed.ac.uk>:
On 2012-07-16, Philippe Verdy <verd...@wanadoo.fr> wrote:
I am also convinced that even Shell interpreters on Linux/Unix should
recognize and accept the leading BOM before the hash/bang
Hello Jukka,
On 2012/07/17 23:31, Jukka K. Korpela wrote:
2012-07-17 17:11, Leif Halvard Silli wrote:
For instance, early on in 'the Web', some
appeared to think that all non-ASCII had to be represented as entities.
Yes indeed. There's still some such stuff around. It's mostly
unnecessary,
Hello Doug,
On 2012/07/18 0:35, Doug Ewell wrote:
For those who haven't yet had enough of this debate yet, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)
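Both sides of the debate in this thread can be seen in one small sketch: Python's "utf-8-sig" codec strips a leading BOM on decoding and writes one on encoding, while the plain "utf-8" codec passes the BOM through as an initial U+FEFF.

```python
data = b"\xef\xbb\xbfhello"  # UTF-8 with signature, as Notepad writes it

assert data.decode("utf-8-sig") == "hello"          # BOM stripped
assert data.decode("utf-8") == "\ufeffhello"        # BOM kept as U+FEFF
assert "hello".encode("utf-8-sig") == b"\xef\xbb\xbfhello"  # BOM added
```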
On 2012/07/18 16:35, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900:
The best reason is simply that nobody should be using
crutches as long as they can walk with their own legs.
Crutches, in that sense, is only about authoring convenience. And, of
course
Hello Leif,
I think that more and more, we are on the wrong mailing list.
Regards, Martin.
On 2012/07/18 18:47, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 17:20:31 +0900:
On 2012/07/18 16:35, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900
On 2012/07/21 7:01, David Starner wrote:
I'm concerned about the statement/implication that one can optimize
for ASCII and Latin-1. It's too easy for a lot of developers to test
speed with the English/European documents they have around and test
correctness only with Chinese. I see the argument
Hello Karl,
On 2012/07/21 0:41, Karl Pentzlin wrote:
Looking for an example of plain text which is obvious to anybody,
it seems to me that the Subject field of e-mails is a good example.
Common e-mail software lets you enter any text but gives you never
access to any higher-level protocol.
Richard - Complex script usually refers to scripts where rendering isn't
just simply putting glyphs side by side. That includes stuff with
combining marks, ligatures, reordering, stacking, and the like.
Regards, Martin.
On 2012/10/03 7:09, Richard Wordingham wrote:
On Tue, 02 Oct 2012
So in order to get something going here, why doesn't Doug draft a letter
to these guys (possibly based on the one from a few years ago) and then
Mark sends it off in his position at Unicode, which hopefully will
impress them more than just a personal contribution.
Being upset in this list
On 2012/11/08 19:15, Michael Everson wrote:
On 8 Nov 2012, at 09:59, Simon Montagu <smont...@smontagu.org> wrote:
Please take into account that the half-stars should be symmetric-swapped in RTL
text. I attach an example from an advertisment for a movie published in Haaretz
2 November 2012
I
On 2012/11/13 21:49, Eli Zaretskii wrote:
I'd welcome that. Although the reality flies in the face of user
requirements in this case: most bidi-aware editors, including my own
work in Emacs, don't have 2 carets, for some reason. Maybe the
developers didn't consider that important enough, or
Just in case it helps, Ruby (since version 1.9) also uses 3).
Regards, Martin.
On 2012/11/17 6:48, Buck Golemon wrote:
When decoding bytes to unicode using the latin1 scheme, there are three
options for bytes not defined in the ISO-8859-1 standard.
1) Throw an error.
2) Insert the
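The excerpt is cut off, but the option the thread says Ruby 1.9+ takes, option 3), is pass-through: map each of the bytes that ISO-8859-1 leaves undefined (0x80-0x9F) to the code point with the same value. A minimal sketch of that behaviour, which Python's latin-1 codec also implements:

```python
# Option 3): pass the undefined bytes straight through to U+0080..U+009F.
assert b"\x81".decode("latin-1") == "\u0081"

# This makes latin-1 a lossless round-trip for every possible byte value:
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```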
On 2012/11/17 9:45, Doug Ewell wrote:
If he is targeting HTML5, then none of this matters, because HTML5 says
that ISO 8859-1 is really Windows-1252.
Yes. But unless Python wants to limit its use to HTML5, this should be
handled on a separate level (mapping an iso-8859-1 label to the
On 2012/11/17 9:56, Philippe Verdy wrote:
True. HTML5 makes its own reinterpretation of the IETF's MIME standard,
defining its own protocol (which means that it is no longer fully
compatible with MIME and its IANA database, because the mapping of the
value of a charset= pseudo-attribute is
On 2012/11/21 16:23, Peter Krefting wrote:
Doug Ewell d...@ewellic.org:
Somewhat off-topic, I find it amusing that tolerance of poorly
encoded input is considered justification for changing the underlying
standards,
The encoding work at W3C, at least as far as I see it, is not an attempt
to
Well, first, it is 17 planes (or have we switched to using hexadecimal
numbers on the Unicode list already?)
Second, of course this is in connection with UTF-16. I wasn't involved
when UTF-16 was created, but it must have become clear that 2^16 (^
denotes exponentiation (to the power of))
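The 17-plane arithmetic alluded to here falls directly out of UTF-16's surrogate mechanism: 1024 high surrogates times 1024 low surrogates address 0x100000 supplementary code points on top of the BMP's 2^16. A minimal sketch (`decode_surrogate_pair` is a name made up for this example):

```python
def decode_surrogate_pair(hi, lo):
    """Combine a UTF-16 surrogate pair into a supplementary code point."""
    assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

assert decode_surrogate_pair(0xD83D, 0xDE00) == 0x1F600  # U+1F600
# 0x10000 BMP code points + 1024*1024 supplementary = 0x110000,
# i.e. exactly 17 planes of 0x10000 code points each:
assert 0x10000 + 0x400 * 0x400 == 17 * 0x10000
```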
On 2012/11/17 12:54, Buck Golemon wrote:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell <d...@ewellic.org> wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
U+0081 (there are always at least four
To this, my mother would say: "Why keep it simple when we can make it
complicated?"
Regards, Martin.
On 2012/11/27 21:01, Philippe Verdy wrote:
That's a valid computation if the extension was limited to use only
2-surrogate encodings for supplementary planes.
If we could use 3-surrogate
I'm looking for a (preferably online) tool that converts Unicode
characters to Unicode character names. Richard Ishida's tools
(http://rishida.net/tools/conversion/) do a lot of conversions, but not
names.
Regards, Martin.
On 2012/12/21 0:59, Asmus Freytag wrote:
There have been efforts at a Japanese translation of the text of the
standard, I have no idea whether that contains translated names for
characters.
JIS X 0221-1995, which is a translation of ISO 10646, contains some
Japanese character names, but this
On 2013/01/06 7:21, Costello, Roger L. wrote:
Does this mean that when exchanging Unicode data across the Internet the
endianness is not relevant?
Are these stated correctly:
When Unicode data is in a file we would say, for example, The file contains
UTF-32BE data.
When Unicode
On 2013/01/08 3:27, Markus Scherer wrote:
Also, we commonly read code points from 16-bit Unicode strings, and
unpaired surrogates are returned as themselves and treated as such (e.g.,
in collation). That would not be well-formed UTF-16, but it's generally
harmless in text processing.
Things
On 2013/01/08 14:43, Stephan Stiller wrote:
Wouldn't the clean way be to ensure valid strings (only) when they're
built
Of course, the earlier erroneous data gets caught, the better. The
problem is that error checking is expensive, both in lines of code and
in execution time (I think there
On 2013/01/22 1:12, Denis Jacquerye wrote:
Does anybody have any idea of how much of the Web is normalized in NFC
or NFD? Or how much not normalized?
I have never measured this. But at one time, there was only NFD (and
NFKD). The Unicode Consortium, with input from W3C, then defined NFC
(and
Hello Roger,
The conclusion to your question below is a very clear NO. The reason is
that most text is already in NFC. In fact, as I wrote a few days or
weeks ago, NFC was defined to capture what's usually around on the Web
(and in other places, too). Trying to recommend that everything be in
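The point that most text is already in NFC can be illustrated with the standard library. A minimal sketch using `unicodedata`: a precomposed character is its own NFC form, while NFD splits it into base letter plus combining mark.

```python
import unicodedata

composed = "\u00e9"        # é as typically found on the Web (NFC)
decomposed = "e\u0301"     # e + COMBINING ACUTE ACCENT (NFD)

assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", composed) == composed  # already in NFC
```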
On 2013/04/11 16:30, Michael Everson wrote:
On 11 Apr 2013, at 00:09, Shriramana Sharma <samj...@gmail.com> wrote:
Or was the Khmer model of an invisible joiner a *later* bright idea?
Yes.
Later, yes. Bright? Most Cambodian experts disagree.
Regards, Martin.
On 2013/04/23 18:01, William_J_G Overington wrote:
On Monday 22 April 2013, Asmus Freytagasm...@ix.netcom.com wrote:
I'm always suspicious if someone wants to discuss scope of the standard before
demonstrating a compelling case on the merits of wide-spread actual use.
The reason that I
On 2013/06/22 0:32, Michael Everson wrote:
On 21 Jun 2013, at 16:20, Khaled Hosny <khaledho...@eglug.org> wrote:
Yeah, I don't believe that you can language-tag individual file names for such
display as that is markup.
Why do you need to? You only need one language, it is not like file names
On 2013/07/05 16:04, Denis Jacquerye wrote:
On Thu, Jul 4, 2013 at 12:07 PM, Michael Everson <ever...@evertype.com> wrote:
The problem is in pretending that a cedilla and a comma below are equivalent
because in some script fonts in France or Turkey routinely write some sort of
On 2013/07/05 17:25, Stephan Stiller wrote:
What I had in mind was more specific: Germans are supposed to convert
[ä,ö,ü,ß] to [ae,oe,ue,ss], though I don't know what's considered
best/legal wrt documents required for entering the US, for example.
I have always used Duerst on plane tickets
On 2013/10/02 9:52, Leo Broukhis wrote:
Thanks! That comes out exactly right, although using math markup for
linguistic purposes is, IMO, a stretch.
Why? Surely like in other fields (Math to start with), there somewhere
is a boundary between plain text and rich text. Of course it's not
On 2013/10/23 4:22, Asmus Freytag wrote:
On 10/22/2013 11:38 AM, Jean-François Colson wrote:
Hello.
I know that in some Japanese encodings (JIS, EUC), \ was replaced by a ¥.
On my computer, there are some Japanese fonts where the characters
seems coded following Unicode, except for the \
Hello Henry,
Some comments on your specific questions, which may trigger some
additional discussion.
On 2013/12/12 1:43, Henry S. Thompson wrote:
I'm one of the editors of a proposed replacement for RFC3023 [1], the
media type registration for application/xml, text/xml and 3 others.
The
J. Dürst due...@it.aoyama.ac.jp
On 2014/03/16 14:36, Philippe Verdy wrote:
You may still want to promote it at some government or education
institution, in order to promote it as a national standard, except that
there's little chance it will ever happen when all countries in ISO have
stopped
I got informed today by your IT Dept. that the mail below never went
out. Resent herewith. Martin.
Original Message
Subject: Re: Romanized Singhala got great reception in Sri Lanka
Date: Mon, 17 Mar 2014 14:37:00 +0900
From: Martin J. Dürst due...@it.aoyama.ac.jp
On 2014
Now that it's no longer April 1st (at least not here in Japan), I can
add a (moderately) serious comment.
On 2014/04/02 01:43, Ilya Zakharevich wrote:
On Tue, Apr 01, 2014 at 09:01:39AM +0200, Mark Davis ☕️ wrote:
More emoji from Chrome:
On 2014/04/02 20:08, Christopher Fynn wrote:
On 02/04/2014, Asmus Freytag asm...@ix.netcom.com wrote:
On 4/2/2014 1:42 AM, Christopher Fynn wrote:
Rather than Emoji it might be better if people learnt Han ideographs
which are also compact (and a far more developed system of
communication than
On 2014/04/03 02:00, James Lin wrote:
Emoji or 顔文字, literally means Face word or Face Characters, essentially,
Emoji is 絵文字 (picture character), 顔文字 is kaomoji (face character).
Regards, Martin.
provides an emotional state in the context of words. Emoji is very
popular in APJ, and
On 2014/06/03 07:08, Asmus Freytag wrote:
On 6/2/2014 2:53 PM, Markus Scherer wrote:
On Mon, Jun 2, 2014 at 1:32 PM, David Starner <prosfil...@gmail.com> wrote:
I would especially discourage any web browser from handling
these; they're noncharacters used for
On 2014/07/24 15:37, Richard Wordingham wrote:
No. The text samples I could find quickly show scripta continua, but I
suspect the line breaks are occurring at word or syllable boundaries.
If I am right about the constraint on line break position, then this
can be recovered by marking the
On 2014/10/24 10:21, Asmus Freytag wrote:
Peter is correct.
The only fonts that should be released to the public are those that are
Unicode encoded and have the correct shaping tables.
Unlike the public, the code chart editors for Unicode have tools that
can correctly handle not only
On 2014/12/18 06:49, Michael Everson wrote:
Clearly the plural of emoji is emojis.
Not in Japanese, where there are no plural forms. The question of what
it is/will be in English will be decided by usage, not by grammar. I'd
use 'emoji', but then I'm too biased towards Japanese to be
On 2014/12/24 09:50, Tex Texin wrote:
True, however as William points out, apparently the rules have changed,
I hope the rules get clarified to clearly state that these are exceptions.
so it isn’t unreasonable to ask again whether the rules now allow it, or if
people that dismissed the idea
On 2015/02/20 05:17, Eli Zaretskii wrote:
From: Philippe Verdy verd...@wanadoo.fr
Date: Thu, 19 Feb 2015 20:31:07 +0100
Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk,
unicode Unicode Discussion unicode@unicode.org
The decompositions are not needed for plain text searches, that can use
On 2015/02/19 20:47, Julian Bradfield wrote:
On 2015-02-19, Eli Zaretskii e...@gnu.org wrote:
Does anyone know why does the UCD define compatibility decompositions
for Arabic initial, medial, and final forms, but doesn't do the same
for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?
What's better on this keyboard when compared to the Dvorak layout?
At first sight, it looks heavily right-handed, all the letters that the
Dvorak keyboard has on the homerow are on the right hand.
Regards, Martin.
P.S.: I'm a happy Dvorak user.
On 2015/01/26 06:54, Robert Wheelock wrote:
On 2015/06/04 17:03, Chris wrote:
I wish Steve Jobs was here to give this lecture.
Well, if Steve Jobs were still around, he could think about whether (and
how many) users really want their private characters, and whether it was
worth the time to have his engineers working on the solution.
On 2015/06/03 07:55, Chris wrote:
As you point out, "The UCS will not encode characters without a demonstrated
usage." But there are use cases for characters that don't meet UCS's criteria for a
world wide standard, but are necessary for more specific use cases, like specialised
regional,
On 2015/06/22 05:37, Frédéric Grosshans wrote:
I don't know if it's what you're looking for but Google brought me to the
following URL.
https://www.itscj.ipsj.or.jp/itscj_english/iso-ir/ISO-IR.pdf
I managed to download the pdf without problems. I also successfully
downloaded a standard (
On 2015/05/29 11:37, John wrote:
If I had a large document that reused a particular character thousands of times,
Then it would be either a very boring document (containing almost only
that same character) or it would be a very large document.
would this HTML markup require embedding that
On 2015/07/29 23:27, Andrew West wrote:
On 29 July 2015 at 14:42, William_J_G Overington
My diet can include soya
There already is, you can write My diet can include soya.
If you are likely to swell up and die if you eat a peanut (for
example), you will not want to trust your life to an
Hello Richard,
On 2015/07/15 16:49, Richard Wordingham wrote:
What mark-up schemes exist to show that a sequence of letters and
combining marks constitutes a single word?
Such mark-up would be useful when using spell checkers. At present, I
use U+2060 WORD JOINER (WJ) to indicate the absence
Hello Doug,
Thanks for making us aware of this very sad event. Michael did a lot for
Unicode, and fought bravely with his illness. I hope we can all remember
him this week at the Unicode Conference, where he gave so many amazing
talks.
I also hope that somebody somehow will be able to