Re: [indic] Indian Rupee symbol

2010-07-16 Thread Martin J. Dürst
, and the LATIN CAPITAL LETTER RA LATIN CAPITAL LETTER RA? Shouldn't that be LATIN CAPITAL LETTER R? Regards, Martin. -- #-# Martin J. Dürst, Professor, Aoyama Gakuin University #-# http://www.sw.it.aoyama.ac.jp mailto:due...@it.aoyama.ac.jp

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Martin J. Dürst
) are used both as letters and as decimal place-value digits, and they are scattered widely, and of course there is a lot of modern living practice. Regards, Martin.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Martin J. Dürst
be 460, and 560 would be 五佰六十 :-). Regards, Martin.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Martin J. Dürst
On 2010/07/29 13:33, karl williamson wrote: Asmus Freytag wrote: On 7/25/2010 6:05 PM, Martin J. Dürst wrote: Well, there actually is such a script, namely Han. The digits (一、 二、三、四、五、六、七、八、九、〇) are used both as letters and as decimal place-value digits, and they are scattered widely

Re: High dot/dot above punctuation?

2010-07-29 Thread Martin J. Dürst
, Martin.

Re: High dot/dot above punctuation?

2010-07-29 Thread Martin J. Dürst
necessary when talking *about* these characters (meta-level) rather than when just using them (non-meta), then I would indeed agree that there is no reason to encode them separately. Regards, Martin.

Re: Most complete (free) Chinese font?

2010-08-02 Thread Martin J. Dürst
Mono much, I'm not even sure whether I ever used it, but at the time I found the idea that somebody was working on a font that covered Unicode really worthy of support. Regards, Martin.

Results of public Review Issues (in particular #121)

2010-08-03 Thread Martin J. Dürst
, Martin.

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-06 Thread Martin J. Dürst
to be fully deployed. Please see http://www.w3.org/Fonts/ for more details and pointers. Regards, Martin.

Re: Proposed Update Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)

2010-09-24 Thread Martin J. Dürst
on Unicode strings (which, for many good reasons, were ultimately rejected), please see the discussion around http://lists.w3.org/Archives/Public/public-iri/2009Sep/0064.html. Regards, Martin.

Re: First posting to list: Unicode.org: unicode - punycode converter tool?

2010-10-30 Thread Martin J. Dürst
, so as in the (somewhat distant) future to allow for cases where a name with 'ß' and a name with 'ss' are resolved differently. Regards, Martin.

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-04 Thread Martin J. Dürst
and identifies broken surrogate pairs and illegal characters? Ideally, the utility can both report illegal code units and repair them by replacing them with U+FFFD. Jim Monty
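The request quoted above (report unpaired surrogates and replace them with U+FFFD) can be sketched in a few lines. This is only an illustration, not the utility discussed in the thread: the function name `repair_utf16` and its list-of-integers input are assumptions, and the sketch does not check for noncharacters or other "illegal characters".

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def repair_utf16(units):
    """Return (repaired_units, bad_indexes) for a sequence of 16-bit code units.

    Hypothetical helper: every surrogate that is not part of a
    lead/trail pair is reported and replaced with U+FFFD.
    """
    repaired, bad = [], []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # lead (high) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                repaired += [u, units[i + 1]]  # well-formed pair, keep both
                i += 2
                continue
            bad.append(i)  # lead surrogate without a trail
            repaired.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:  # trail (low) surrogate without a lead
            bad.append(i)
            repaired.append(REPLACEMENT)
        else:
            repaired.append(u)
        i += 1
    return repaired, bad
```

Called on `[0x41, 0xD800, 0x42]`, the sketch reports index 1 and returns the units with U+FFFD in that position.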

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Martin J. Dürst
. For some processing this is true, but it's rather short-sighted. Regards, Martin.

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Martin J. Dürst
would you use Ruby for conversion when programming in Perl? You could just as well program in Ruby, it's much more fun!

Fwd: RFC 6082 on Deprecating Unicode Language Tag Characters: RFC 2482 is Historic

2010-11-08 Thread Martin J. Dürst
FYI. Regards, Martin. Original Message Subject: RFC 6082 on Deprecating Unicode Language Tag Characters: RFC 2482 is Historic Date: Sun, 7 Nov 2010 21:50:44 -0800 (PST) From: rfc-edi...@rfc-editor.org To: ietf-annou...@ietf.org, rfc-d...@rfc-editor.org CC:

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Martin J. Dürst
is a sub-encoding of windows-1252 if the former is interpreted as not including the C1 range. Regards, Martin.

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Martin J. Dürst
On 2011/07/15 18:51, Michael Everson wrote: On 15 Jul 2011, at 09:47, Andrew West wrote: If you want a font to display a visible glyph for a format or space character then you should just map the glyph to its character in the font, as many fonts already do for certain format characters.

Re: [bidi] Re: PRI 185 Revision of UBA for improved display of URL/IRIs

2011-07-29 Thread Martin J. Dürst
Hello Mark, others, On 2011/07/28 5:01, Mark Davis ☕ wrote: Just to remind people: posting to this list does *not* mean submitting to the UTC. If you want to discuss a proposal here, not a problem, but just remember that if you want any action you have to submit to the UTC. Unicode members

Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-09 Thread Martin J. Dürst
On 2011/09/10 9:32, Stephan Stiller wrote: Actually, I *was* talking about purely typographic/aesthetic ligatures as well. I'm aware that which di-/trigraphs need to be considered from a font design perspective is language-dependent. And this language-dependence is not only a question of

Re: continue: Glaring Mistake in nomenclature

2011-09-14 Thread Martin J. Dürst
Hello Delex, On 2011/09/14 15:55, delex r wrote: The “Dark age of Assamese language” ran for about 37 years in this region when it was tried to kill the language by vested interests with the help of British Political powers imposing Bengali as medium of instruction in school and

Re: Civil suit; ftp shutdown; mailing list shutdown

2011-10-06 Thread Martin J. Dürst
[By accident, I sent this only to Ken first; he recommended I send it to both Unicode and Unicore.] I have sent a mail to a relevant IETF list (apps-disc...@ietf.org); the IETF was looking into taking this over, with http://tools.ietf.org/html/draft-lear-iana-timezone-database-04, but

Re: Civil suit; ftp shutdown; mailing list shutdown

2011-10-07 Thread Martin J. Dürst
,Martin. On 2011/10/07 14:14, Martin J. Dürst wrote: [By accident, I sent this only to Ken first; he recommended I send it to both Unicode and Unicore.] I have sent a mail to a relevant IETF list (apps-disc...@ietf.org); the IETF was looking into taking this over, with http://tools.ietf.org/html

Re: about P1 part of BIDI algorithm

2011-10-10 Thread Martin J. Dürst
On 2011/10/10 21:10, Eli Zaretskii wrote: Date: Mon, 10 Oct 2011 17:47:21 +0800 From: li bolibo@gmail.com From section 3: Paragraphs are divided by the Paragraph Separator or appropriate Newline Function (for guidelines on the handling of CR, LF, and CRLF, see Section 4.4,

Re: Solidus variations

2011-10-10 Thread Martin J. Dürst
On 2011/10/11 7:35, Philippe Verdy wrote: I've seen various interpretations, but the ASCII solidus is unambiguously used with a strong left-to-right associativity, and the same occurs in classical mathematics notations (the horizontal bar is another notation but even where it is used, it also

Re: about P1 part of BIDI algorithm

2011-10-10 Thread Martin J. Dürst
On 2011/10/11 10:29, Martin J. Dürst wrote: On 2011/10/10 21:10, Eli Zaretskii wrote: Date: Mon, 10 Oct 2011 17:47:21 +0800 In addition to the Paragraph Separator, _any_ newline function (LF, CR+LF, CR, or NEL) can end a paragraph. Also U+2028, the LS character. See section 5.8

Re: about P1 part of BIDI algorithm

2011-10-10 Thread Martin J. Dürst
On 2011/10/11 13:07, Eli Zaretskii wrote: Date: Tue, 11 Oct 2011 10:53:39 +0900 From: Martin J. Dürst due...@it.aoyama.ac.jp CC: li bolibo@gmail.com, unicode@unicode.org This is different from what you did in Emacs, which I'd call line-folding, i.e. cut the line after a paragraph is laid out

Re: about P1 part of BIDI algorithm

2011-10-11 Thread Martin J. Dürst
Hello Eli, There is absolutely no problem to treat the algorithm in UAX#9 as a set of requirements, and come up with a totally different implementation that produces the same results. I think actually UAX#9 says so somewhere. But what is, strictly speaking, not allowed is to change the

Re: about P1 part of BIDI algorithm

2011-10-11 Thread Martin J. Dürst
Hello Kent, I was also very much thinking that mirrored glyph should be of the same width, but there might be subtle issues when you consider kerning. As a very basic example, think about kerning of the pair K), and then think about K(. Regards, Martin. On 2011/10/11 19:39, Kent Karlsson

Wrong UTF-8 encoders still around?

2011-10-20 Thread Martin J. Dürst
I'm hoping to get some advice from people with experience with various Unicode/transcoding libraries. RFC 3987 (the current IRI spec) has the following text: Note: Some older software transcoding to UTF-8 may produce illegal output for some input, in particular for characters outside

Forum Problems

2011-10-24 Thread Martin J. Dürst
How can one use the Forum to comment on URI/IRI issues when one gets a message: Your message contains too many URLs. The maximum number of URLs allowed is 8. I never liked this forum stuff too much, and this hasn't made things better :-(. Regards, Martin.

Default bidi ranges

2011-11-09 Thread Martin J. Dürst
I tried to find something like a normative description of the default bidi class of unassigned code points. In UTR #9, it says (http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types): Unassigned characters are given strong types in the algorithm. This is an explicit

Re: missing characters: combining marks above runs of more than 2 base letters

2011-11-20 Thread Martin J. Dürst
On 2011/11/21 5:54, Asmus Freytag wrote: On 11/20/2011 8:00 AM, Joó Ádám wrote: Leaving aside that CSS is presentation and not content, and is definitely not markup. HTML is a better candidate. Á The details of the appearance of the mark would be presentation. The scoping, like for applying

Re: Unicode, SMS and year 2012

2012-04-27 Thread Martin J. Dürst
On 2012/04/28 4:26, Mark Davis ☕ wrote: Actually, if the goal is to get as many characters in as possible, Punycode might be the best solution. That is the encoding used for internationalized domains. In that form, it uses a smaller number of bytes per character, but a parameterization allows

Re: Unicode, SMS and year 2012

2012-04-27 Thread Martin J. Dürst
On 2012/04/28 7:29, Cristian Secară wrote: În data de Fri, 27 Apr 2012 12:26:25 -0700, Mark Davis ☕ a scris: Actually, if the goal is to get as many characters in as possible, Punycode might be the best solution. That is the encoding used for internationalized domains. In that form, it uses a

Re: Unicode, SMS and year 2012

2012-04-27 Thread Martin J. Dürst
On 2012/04/27 17:06, Cristian Secară wrote: It turned out that they (ETSI and its groups) created a way to solve the 70 characters limitation, namely “National Language Single Shift” and “National Language Locking Shift” mechanism. This is described in 3GPP TS 23.038 standard and it was introduced

Re: Unicode, SMS and year 2012

2012-04-29 Thread Martin J. Dürst
On 2012/04/29 18:58, Szelp, A. Sz. wrote: While there are good reasons the authors of HTML5 brought to ignore SCSU or BOCU-1, having excluded UTF-32 which is the most direct, one-to-one mapping of Unicode codepoints to byte values seems shortsighted. Well, except that it's hopelessly

Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

2012-05-29 Thread Martin J. Dürst
On 2012/05/29 17:43, Asmus Freytag wrote: On 5/27/2012 5:52 PM, Michael Everson wrote: Get over it. Please just get over it. It doesn't matter. It's a blort. Time to agree with Michael. Get over it, is good advice here. Sovereign countries are free to decree currency symbols, whatever their

Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-30 Thread Martin J. Dürst
On 2012/05/30 4:42, Roozbeh Pournader wrote: Just look what happened when the Japanese did their own font/character set hack. The backslash/yen problem is still with us, to this day... To be fair, the Japanese Yen at 0x5C was there long before Unicode, in the Japanese version of ISO 646.

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-10 Thread Martin J. Dürst
On 2012/07/11 4:37, Asmus Freytag wrote: I recall, with certainty, having seen the : in the context of elementary instruction in arithmetic, as in 4 : 2 = ?, but am no longer positive about seeing ÷ in the same context. I remember this very well. In grade school, we had to learn two ways to

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-10 Thread Martin J. Dürst
On 2012/07/11 10:35, Stephan Stiller wrote: About Martin Dürst's content re geteilt-gemessen: When I attended the German school system in approx the 1990s this distinction wasn't mentioned or taught. (I prefer to not give details about specific time and place for privacy reasons.) Sorry, but

Re: Sinhala naming conventions

2012-07-10 Thread Martin J. Dürst
On 2012/07/11 11:04, Mark E. Shoulson wrote: Ever start to feel that we would have been better off not to give official descriptive names at all? Or else really vague ones like LETTERLIKE THINGY NUMBER 5412? So much blood-pressure raised over the names... I'm feeling that way since about the

Re: pre-HTML5 and the BOM

2012-07-13 Thread Martin J. Dürst
On 2012/07/13 0:12, Leif Halvard Silli wrote: Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600: and people who want to create or modify UTF-8 files which will be consumed by a process that is intolerant of the signature should not use Notepad. That goes for HTML (pre-5) pages [snip] HTML5-parsers

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
On 2012/07/13 22:31, Jukka K. Korpela wrote: 2012-07-13 16:12, Leif Halvard Silli wrote: The kind of BOM intolerance I know about in user agents is that some text browsers and IE5 for Mac (abandoned) convert the BOM into a (typically empty) line at the start of the body element. I wonder if

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
On 2012/07/14 1:33, Philippe Verdy wrote: Fra: Jukka K. Korpela jkorp...@cs.tut.fi When the BOM is used in web pages or editors for UTF-8 encoded content it can sometimes introduce blank spaces or short sequences of strange-looking characters (such as ï»¿). For this reason, it is usually best

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
On 2012/07/17 17:22, Leif Halvard Silli wrote: And an argument was put forward in the WHATWG mailing list earlier this year/end of previous year, that a page with strict ASCII characters inside could still contain character entities/references for characters outside ASCII. Of course they can.

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
Hello Leif, Sorry to be late with my answer. On 2012/07/13 20:44, Leif Halvard Silli wrote: Martin J. Dürst, Fri, 13 Jul 2012 18:17:05 +0900: On 2012/07/13 0:12, Leif Halvard Silli wrote: Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600: and people who want to create or modify UTF-8 files which

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
Hello Leif, On 2012/07/18 4:35, Leif Halvard Silli wrote: But is the Windows Notepad really to blame? Pretty much so. There may have been other products from Microsoft that also did it, but with respect to forcing browsers and XML parsers to accept a UTF-8 BOM as a signature, Notepad was

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-17 Thread Martin J. Dürst
Hello Philippe, On 2012/07/18 3:37, Philippe Verdy wrote: 2012/7/17 Julian Bradfield jcb+unic...@inf.ed.ac.uk: On 2012-07-16, Philippe Verdy verd...@wanadoo.fr wrote: I am also convinced that even Shell interpreters on Linux/Unix should recognize and accept the leading BOM before the hash/bang

Re: pre-HTML5 and the BOM

2012-07-17 Thread Martin J. Dürst
Hello Jukka, On 2012/07/17 23:31, Jukka K. Korpela wrote: 2012-07-17 17:11, Leif Halvard Silli wrote: For instance, early on in 'the Web', some appeared to think that all non-ASCII had to be represented as entities. Yes indeed. There's still some such stuff around. It's mostly unnecessary,

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-18 Thread Martin J. Dürst
Hello Doug, On 2012/07/18 0:35, Doug Ewell wrote: For those who haven't had enough of this debate yet, here's a link to an informative blog (with some informative comments) from Michael Kaplan: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)

Re: pre-HTML5 and the BOM

2012-07-18 Thread Martin J. Dürst
On 2012/07/18 16:35, Leif Halvard Silli wrote: Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900: The best reason is simply that nobody should be using crutches as long as they can walk with their own legs. Crutches, in that sense, is only about authoring convenience. And, of course

Re: pre-HTML5 and the BOM

2012-07-18 Thread Martin J. Dürst
Hello Leif, I think that more and more, we are on the wrong mailing list. Regards, Martin. On 2012/07/18 18:47, Leif Halvard Silli wrote: Martin J. Dürst, Wed, 18 Jul 2012 17:20:31 +0900: On 2012/07/18 16:35, Leif Halvard Silli wrote: Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900

Re: Unicode String Models

2012-07-20 Thread Martin J. Dürst
On 2012/07/21 7:01, David Starner wrote: I'm concerned about the statement/implication that one can optimize for ASCII and Latin-1. It's too easy for a lot of developers to test speed with the English/European documents they have around and test correctness only with Chinese. I see the argument

Re: Is the Subject field of an e-mail an obvious example of plain text where no higher level protocol application is possible?

2012-07-21 Thread Martin J. Dürst
Hello Karl, On 2012/07/21 0:41, Karl Pentzlin wrote: Looking for an example of plain text which is obvious to anybody, it seems to me that the Subject field of e-mails is a good example. Common e-mail software lets you enter any text but gives you never access to any higher-level protocol.

Re: Character set cluelessness

2012-10-02 Thread Martin J. Dürst
Richard - Complex script usually refers to scripts where rendering isn't just simply putting glyphs side by side. That includes stuff with combining marks, ligatures, reordering, stacking, and the like. Regards, Martin. On 2012/10/03 7:09, Richard Wordingham wrote: On Tue, 02 Oct 2012

Re: Character set cluelessness

2012-10-02 Thread Martin J. Dürst
So in order to get something going here, why doesn't Doug draft a letter to these guys (possibly based on the one from a few years ago) and then Mark sends it off in his position at Unicode, which hopefully will impress them more than just a personal contribution. Being upset in this list

Re: Missing geometric shapes

2012-11-08 Thread Martin J. Dürst
On 2012/11/08 19:15, Michael Everson wrote: On 8 Nov 2012, at 09:59, Simon Montagu smont...@smontagu.org wrote: Please take into account that the half-stars should be symmetric-swapped in RTL text. I attach an example from an advertisement for a movie published in Haaretz 2 November 2012 I

Re: Caret

2012-11-14 Thread Martin J. Dürst
On 2012/11/13 21:49, Eli Zaretskii wrote: I'd welcome that. Although the reality flies in the face of user requirements in this case: most bidi-aware editors, including my own work in Emacs, don't have 2 carets, for some reason. Maybe the developers didn't consider that important enough, or

Re: latin1 decoder implementation

2012-11-17 Thread Martin J. Dürst
Just in case it helps, Ruby (since version 1.9) also uses 3). Regards, Martin. On 2012/11/17 6:48, Buck Golemon wrote: When decoding bytes to unicode using the latin1 scheme, there are three options for bytes not defined in the ISO-8859-1 standard. 1) Throw an error. 2) Insert the

Re: latin1 decoder implementation

2012-11-17 Thread Martin J. Dürst
On 2012/11/17 9:45, Doug Ewell wrote: If he is targeting HTML5, then none of this matters, because HTML5 says that ISO 8859-1 is really Windows-1252. Yes. But unless Python wants to limit its use to HTML5, this should be handled on a separate level (mapping a iso-8859-1 label to the

Re: latin1 decoder implementation

2012-11-19 Thread Martin J. Dürst
On 2012/11/17 9:56, Philippe Verdy wrote: True. HTML5 makes its own reinterpretation of the IETF's MIME standard, defining its own protocol (which means that it is no longer fully compatible with MIME and its IANA database, because the mapping of the value of a charset= pseudo-attribute is

Re: cp1252 decoder implementation

2012-11-21 Thread Martin J. Dürst
On 2012/11/21 16:23, Peter Krefting wrote: Doug Ewell d...@ewellic.org: Somewhat off-topic, I find it amusing that tolerance of poorly encoded input is considered justification for changing the underlying standards, The encoding work at W3C, at least as far as I see it, is not an attempt to

Why 17 planes? (was: Re: Why 11 planes?)

2012-11-27 Thread Martin J. Dürst
Well, first, it is 17 planes (or have we switched to using hexadecimal numbers on the Unicode list already?). Second, of course this is in connection with UTF-16. I wasn't involved when UTF-16 was created, but it must have become clear that 2^16 (^ denotes exponentiation (to the power of))
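The plane count follows from the UTF-16 surrogate ranges, and the arithmetic is easy to check. A minimal sketch, assuming only the standard surrogate ranges D800..DBFF and DC00..DFFF; the variable names are illustrative:

```python
# Number of lead (high) and trail (low) surrogate code units.
lead_count = 0xDBFF - 0xD800 + 1
trail_count = 0xDFFF - 0xDC00 + 1

# Each lead/trail combination addresses one supplementary code point.
supplementary = lead_count * trail_count

# One plane holds 2^16 code points; add the Basic Multilingual Plane.
planes = supplementary // 2**16 + 1
last_code_point = planes * 2**16 - 1

print(lead_count, trail_count, supplementary)
print(planes, hex(planes), hex(last_code_point))
```

Note that 17 written in hexadecimal is 11, which is where the "11 planes" of the earlier subject line comes from.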

Re: cp1252 decoder implementation

2012-11-27 Thread Martin J. Dürst
On 2012/11/17 12:54, Buck Golemon wrote: On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell d...@ewellic.org wrote: Buck Golemon wrote: Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and to map it to the equally-non-semantic U+81 ? U+0081 (there are always at least four
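As a concrete data point for the 0x81 question, here is how one implementation, the codecs bundled with CPython, treats that byte. This is a sketch of that implementation's behaviour only, not a statement of what any standard requires; the helper name `code_point_label` is an assumption.

```python
def code_point_label(ch):
    """Format a character in U+XXXX notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


raw = bytes([0x81])

# The latin-1 codec maps every byte to the code point with the same value.
latin1_result = raw.decode("latin-1")
print(code_point_label(latin1_result))

# The cp1252 codec leaves 0x81 undefined and refuses to decode it.
try:
    raw.decode("cp1252")
    cp1252_rejected = False
except UnicodeDecodeError:
    cp1252_rejected = True
print("cp1252 rejects 0x81:", cp1252_rejected)
```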

Re: Why 17 planes?

2012-11-27 Thread Martin J. Dürst
To this, my mother would say: Why keep it simple when we can make it complicated? Regards, Martin. On 2012/11/27 21:01, Philippe Verdy wrote: That's a valid computation if the extension was limited to use only 2-surrogate encodings for supplementary planes. If we could use 3-surrogate

Tool to convert characters to character names

2012-12-19 Thread Martin J. Dürst
I'm looking for a (preferably online) tool that converts Unicode characters to Unicode character names. Richard Ishida's tools (http://rishida.net/tools/conversion/) do a lot of conversions, but not names. Regards, Martin.
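For an offline alternative to such a tool, the character names shipped with a programming language's Unicode database can be used. A minimal sketch using Python's standard `unicodedata` module; the function name `char_names` and the `<no name>` placeholder are assumptions, and the names available depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def char_names(text):
    """Return (code point label, character name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]


for label, name in char_names("Aß"):
    print(label, name)
```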

Re: Character name translations

2012-12-20 Thread Martin J. Dürst
On 2012/12/21 0:59, Asmus Freytag wrote: There have been efforts at a Japanese translation of the text of the standard, I have no idea whether that contains translated names for characters. JIS X 0221-1995, which is a translation of ISO 10646, contains some Japanese character names, but this

Re: Why is endianness relevant when storing data on disks but not when in memory?

2013-01-05 Thread Martin J. Dürst
On 2013/01/06 7:21, Costello, Roger L. wrote: Does this mean that when exchanging Unicode data across the Internet the endianness is not relevant? Are these stated correctly: When Unicode data is in a file we would say, for example, The file contains UTF-32BE data. When Unicode
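The distinction asked about here can be made concrete: a code point is just a number, and byte order only appears once that number is serialized into bytes. A minimal sketch using Python's built-in codecs; the variable names are illustrative.

```python
text = "A"  # one code point, U+0041

# The abstract code point has no byte order.
code_point = ord(text)

# Byte order appears only when serializing to an encoding scheme.
big_endian = text.encode("utf-32-be")
little_endian = text.encode("utf-32-le")

print(hex(code_point))
print(big_endian.hex(), little_endian.hex())

# Decoding with the matching scheme recovers the same code point.
assert big_endian.decode("utf-32-be") == little_endian.decode("utf-32-le")
```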

Re: What does it mean to not be a valid string in Unicode?

2013-01-07 Thread Martin J. Dürst
On 2013/01/08 3:27, Markus Scherer wrote: Also, we commonly read code points from 16-bit Unicode strings, and unpaired surrogates are returned as themselves and treated as such (e.g., in collation). That would not be well-formed UTF-16, but it's generally harmless in text processing. Things

Re: What does it mean to not be a valid string in Unicode?

2013-01-08 Thread Martin J. Dürst
On 2013/01/08 14:43, Stephan Stiller wrote: Wouldn't the clean way be to ensure valid strings (only) when they're built Of course, the earlier erroneous data gets caught, the better. The problem is that error checking is expensive, both in lines of code and in execution time (I think there

Re: Normalization rate on the Web

2013-01-21 Thread Martin J. Dürst
On 2013/01/22 1:12, Denis Jacquerye wrote: Does anybody have any idea of how much of the Web is normalized in NFC or NFD? Or how much not normalized? I have never measured this. But at one time, there was only NFD (and NFKD). The Unicode Consortium, with input from W3C, then defined NFC (and

Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

2013-02-04 Thread Martin J. Dürst
Hello Roger, The conclusion to your question below is a very clear NO. The reason is that most text is already in NFC. In fact, as I wrote a few days or weeks ago, NFC was defined to capture what's usually around on the Web (and in other places, too). Trying to recommend that everything be in
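Whether a given piece of text is already in NFC is straightforward to test. A minimal sketch with Python's standard `unicodedata` module; the helper name `is_nfc` is an assumption, and the sample strings are chosen purely for illustration.

```python
import unicodedata


def is_nfc(text):
    """True if normalizing to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(is_nfc(composed), is_nfc(decomposed))
print(unicodedata.normalize("NFC", decomposed) == composed)
```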

Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-12 Thread Martin J. Dürst
On 2013/04/11 16:30, Michael Everson wrote: On 11 Apr 2013, at 00:09, Shriramana Sharma samj...@gmail.com wrote: Or was the Khmer model of an invisible joiner a *later* bright idea? Yes. Later, yes. Bright? Most Cambodian experts disagree. Regards, Martin.

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Martin J. Dürst
On 2013/04/23 18:01, William_J_G Overington wrote: On Monday 22 April 2013, Asmus Freytag asm...@ix.netcom.com wrote: I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use. The reason that I

Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

2013-07-03 Thread Martin J. Dürst
On 2013/06/22 0:32, Michael Everson wrote: On 21 Jun 2013, at 16:20, Khaled Hosny khaledho...@eglug.org wrote: Yeah, I don't believe that you can language-tag individual file names for such display as that is markup. Why do you need to? You only need one language, it is not like file names

Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

2013-07-05 Thread Martin J. Dürst
On 2013/07/05 16:04, Denis Jacquerye wrote: On Thu, Jul 4, 2013 at 12:07 PM, Michael Everson ever...@evertype.com wrote: The problem is in pretending that a cedilla and a comma below are equivalent because in some script fonts in France or Turkey routinely write some sort of

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Martin J. Dürst
On 2013/07/05 17:25, Stephan Stiller wrote: What I had in mind was more specific: Germans are supposed to convert [ä,ö,ü,ß] to [ae,oe,ue,ss], though I don't know what's considered best/legal wrt documents required for entering the US, for example. I have always used Duerst on plane tickets
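The replacement convention mentioned here ([ä,ö,ü,ß] to [ae,oe,ue,ss]) is a simple character-to-string mapping. A minimal sketch; the function name `transliterate_german` is an assumption, and the handling of capital letters (Ä to Ae, and so on) is an illustrative choice that the message does not specify.

```python
# Lowercase replacements as given in the message; the capital-letter
# entries are an assumption added for illustration.
GERMAN_REPLACEMENTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def transliterate_german(text):
    """Rewrite German umlauts and sharp s using basic Latin letters only."""
    return text.translate(GERMAN_REPLACEMENTS)


print(transliterate_german("Dürst"))
```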

Re: COMBINING OVER MARK?

2013-10-02 Thread Martin J. Dürst
On 2013/10/02 9:52, Leo Broukhis wrote: Thanks! That comes out exactly right, although using math markup for linguistic purposes is, IMO, a stretch. Why? Surely like in other fields (Math to start with), there somewhere is a boundary between plain text and rich text. Of course it's not

Re: ¥ instead of \

2013-10-27 Thread Martin J. Dürst
On 2013/10/23 4:22, Asmus Freytag wrote: On 10/22/2013 11:38 AM, Jean-François Colson wrote: Hello. I know that in some Japanese encodings (JIS, EUC), \ was replaced by a ¥. On my computer, there are some Japanese fonts where the characters seems coded following Unicode, except for the \

Re: Request for review: 3023bis (XML media types) makes significant changes

2013-12-18 Thread Martin J. Dürst
Hello Henry, Some comments on your specific questions, which may trigger some additional discussion. On 2013/12/12 1:43, Henry S. Thompson wrote: I'm one of the editors of a proposed replacement for RFC3023 [1], the media type registration for application/xml, text/xml and 3 others. The

Fwd: Updated Japanese Legacy Standard? (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-28 Thread Martin J. Dürst
J. Dürst due...@it.aoyama.ac.jp On 2014/03/16 14:36, Philippe Verdy wrote: You may still want to promote it at some government or education institution, in order to promote it as a national standard, except that there's little chance it will ever happen when all countries in ISO have stopped

Fwd: Re: Romanized Singhala got great reception in Sri Lanka

2014-03-28 Thread Martin J. Dürst
I got informed today by your IT Dept. that the mail below never went out. Resent herewith. Martin. Original Message Subject: Re: Romanized Singhala got great reception in Sri Lanka Date: Mon, 17 Mar 2014 14:37:00 +0900 From: Martin J. Dürst due...@it.aoyama.ac.jp On 2014

Re: FYI: More emoji from Chrome

2014-04-01 Thread Martin J. Dürst
Now that it's no longer April 1st (at least not here in Japan), I can add a (moderately) serious comment. On 2014/04/02 01:43, Ilya Zakharevich wrote: On Tue, Apr 01, 2014 at 09:01:39AM +0200, Mark Davis ☕️ wrote: More emoji from Chrome:

Re: FYI: More emoji from Chrome

2014-04-02 Thread Martin J. Dürst
On 2014/04/02 20:08, Christopher Fynn wrote: On 02/04/2014, Asmus Freytag asm...@ix.netcom.com wrote: On 4/2/2014 1:42 AM, Christopher Fynn wrote: Rather than Emoji it might be better if people learnt Han ideographs which are also compact (and a far more developed system of communication than

Re: Emoji

2014-04-02 Thread Martin J. Dürst
On 2014/04/03 02:00, James Lin wrote: Emoji or 顔文字, literally means Face word or Face Characters, essentially, Emoji is 絵文字 (picture character), 顔文字 is kaomoji (face character). Regards, Martin. provides an emotional state in the context of words. Emoji is very popular in APJ, and

Re: Corrigendum #9

2014-06-03 Thread Martin J. Dürst
On 2014/06/03 07:08, Asmus Freytag wrote: On 6/2/2014 2:53 PM, Markus Scherer wrote: On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com mailto:prosfil...@gmail.com wrote: I would especially discourage any web browser from handling these; they're noncharacters used for

Re: Request for Information

2014-07-24 Thread Martin J. Dürst
On 2014/07/24 15:37, Richard Wordingham wrote: No. The text samples I could find quickly show scripta continua, but I suspect the line breaks are occurring at word or syllable boundaries. If I am right about the constraint on line break position, then this can be recovered by marking the

Code charts and code points (was: Re: fonts for U7.0 scripts)

2014-10-24 Thread Martin J. Dürst
On 2014/10/24 10:21, Asmus Freytag wrote: Peter is correct. The only fonts that should be released to the public are those that are Unicode encoded and have the correct shaping tables. Unlike the public, the code chart editors for Unicode have tools that can correctly handle not only

Re: emoji are clearly the current meme fad

2014-12-17 Thread Martin J. Dürst
On 2014/12/18 06:49, Michael Everson wrote: Clearly the plural of emoji is emojis. Not in Japanese, where there are no plural forms. The question of what it is/will be in English will be decided by usage, not by grammar. I'd use 'emoji', but then I'm too biased towards Japanese to be

Re: Unicode encoding policy

2014-12-23 Thread Martin J. Dürst
On 2014/12/24 09:50, Tex Texin wrote: True, however as William points out, apparently the rules have changed, I hope the rules get clarified to clearly state that these are exceptions. so it isn’t unreasonable to ask again whether the rules now allow it, or if people that dismissed the idea

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Martin J. Dürst
On 2015/02/20 05:17, Eli Zaretskii wrote: From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 19 Feb 2015 20:31:07 +0100 Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk, unicode Unicode Discussion unicode@unicode.org The decompositions are not needed for plain text searches, that can use

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Martin J. Dürst
On 2015/02/19 20:47, Julian Bradfield wrote: On 2015-02-19, Eli Zaretskii e...@gnu.org wrote: Does anyone know why does the UCD define compatibility decompositions for Arabic initial, medial, and final forms, but doesn't do the same for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?
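
The asymmetry raised in this question can be checked directly against the UCD. What follows is only a minimal sketch using Python's standard `unicodedata` module; the helper name `compat_decomposition` and the choice of sample characters are illustrative assumptions, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def compat_decomposition(ch: str) -> str:
    """Return the raw UCD decomposition field for ch ('' if there is none)."""
    return unicodedata.decomposition(ch)


# Sample characters chosen for illustration (an assumption, not from the thread):
# one Arabic presentation form, the Hebrew final mem named in the question,
# and the Greek final sigma named in the subject line.
samples = ["\uFEE3", "\u05DD", "\u03C2"]

for ch in samples:
    name = unicodedata.name(ch)
    decomp = compat_decomposition(ch) or "(none)"
    nfkc = unicodedata.normalize("NFKC", ch)
    # Show the code point, its name, its decomposition field, and the
    # code point(s) it turns into under compatibility normalization.
    print(f"U+{ord(ch):04X} {name}: decomposition={decomp}, "
          f"NFKC={' '.join(f'U+{ord(c):04X}' for c in nfkc)}")
```

The Arabic presentation form carries an `<initial>` compatibility mapping to the base letter, so NFKC folds it away, while the Hebrew and Greek final letters have an empty decomposition field and survive normalization unchanged.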

Re: The NEW Keyboard Layout—IEAOU

2015-01-25 Thread Martin J. Dürst
What's better on this keyboard when compared to the Dvorak layout? At first sight, it looks heavily right-handed: all the letters that the Dvorak keyboard has on the home row are on the right hand. Regards, Martin. P.S.: I'm a happy Dvorak user. On 2015/01/26 06:54, Robert Wheelock wrote:

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-05 Thread Martin J. Dürst
On 2015/06/04 17:03, Chris wrote: I wish Steve Jobs was here to give this lecture. Well, if Steve Jobs were still around, he could think about whether (and how many) users really want their private characters, and whether it was worth the time to have his engineers working on the solution.

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-02 Thread Martin J. Dürst
On 2015/06/03 07:55, Chris wrote: As you point out, “The UCS will not encode characters without a demonstrated usage.” But there are use cases for characters that don’t meet UCS’s criteria for a world wide standard, but are necessary for more specific use cases, like specialised regional,

Re: International Register of Coded Character Sets

2015-06-21 Thread Martin J. Dürst
On 2015/06/22 05:37, Frédéric Grosshans wrote: I don't know if it's what you're looking for but Google brought me to the following URL. https://www.itscj.ipsj.or.jp/itscj_english/iso-ir/ISO-IR.pdf I managed to download the pdf without problems. I also successfully downloaded a standard (

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-02 Thread Martin J. Dürst
On 2015/05/29 11:37, John wrote: If I had a large document that reused a particular character thousands of times, Then it would be either a very boring document (containing almost only that same character) or it would be a very large document. would this HTML markup require embedding that

Re: Emoji characters for food allergens

2015-07-29 Thread Martin J. Dürst
On 2015/07/29 23:27, Andrew West wrote: On 29 July 2015 at 14:42, William_J_G Overington My diet can include soya There already is, you can write My diet can include soya. If you are likely to swell up and die if you eat a peanut (for example), you will not want to trust your life to an

Re: Mark-up to Indicate Words

2015-07-15 Thread Martin J. Dürst
Hello Richard, On 2015/07/15 16:49, Richard Wordingham wrote: What mark-up schemes exist to show that a sequence of letters and combining marks constitutes a single word? Such mark-up would be useful when using spell checkers. At present, I use U+2060 WORD JOINER (WJ) to indicate the absence

Re: A Bulldog moves on

2015-10-24 Thread Martin J. Dürst
Hello Doug, Thanks for making us aware of this very sad event. Michael did a lot for Unicode, and fought bravely with his illness. I hope we can all remember him this week at the Unicode Conference, where he gave so many amazing talks. I also hope that somebody somehow will be able to
