Re: Another take on the English apostrophe in Unicode

2015-06-04 Thread David Starner
On Thu, Jun 4, 2015 at 2:38 PM Markus Scherer markus@gmail.com wrote: don’t is a contraction of two words, it is not one word. But as he points out, it's not a contraction of don and t; it is, at best, a contraction of do and n't. It's eliding, not punctuating. In the comments, he also

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread Chris
I think this stuff could be relatively easy to define and standardise. You could basically define the entire technology in 1 A4 document. People have just got to want it badly enough to agree on it, and give it the imprimatur of the consortium. Then define it. It doesn't need Unicode

Re: Another take on the English apostrophe in Unicode

2015-06-04 Thread David Starner
Hyphens generally make multiple words into one anyway. There's not really multiple hyphens the way there's separate quotes and apostrophes. On 7:01pm, Thu, Jun 4, 2015 Leo Broukhis l...@mailcom.com wrote: Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for example, the

Re: Another take on the English apostrophe in Unicode

2015-06-04 Thread Leo Broukhis
Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for example, the word ack-ack isn't decomposable into words, or even morphemes, ack and ack. Leo On Thu, Jun 4, 2015 at 6:31 PM, David Starner prosfil...@gmail.com wrote: On Thu, Jun 4, 2015 at 2:38 PM Markus Scherer

Re: Another take on the English apostrophe in Unicode

2015-06-04 Thread Markus Scherer
Looks all wrong to me. don’t is a contraction of two words, it is not one word. English is taught as that squiggle being punctuation, not a letter. (Unlike, say, the Hawaiʻian ʻOkina http://en.wikipedia.org/wiki/%CA%BBOkina.) You can't use simple regular expressions to find word boundaries.
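
A small sketch of the regular-expression point may help here. It assumes Python's `re` module, where `\w` in a Unicode pattern matches letters, digits and the underscore; the helper name `words` is only for illustration. U+2019 is punctuation (general category Pf), so a naive `\w+` splits the contraction, while U+02BC is a modifier letter (category Lm) and stays inside the word.

```python
import re
import unicodedata

# U+2019 RIGHT SINGLE QUOTATION MARK: punctuation (general category Pf)
CURLY = "don\u2019t"
# U+02BC MODIFIER LETTER APOSTROPHE: a letter (general category Lm)
MODIFIER = "don\u02bct"


def words(text):
    """Naive tokenizer: every maximal run of \\w characters is one word."""
    return re.findall(r"\w+", text)


curly_words = words(CURLY)        # the quotation mark breaks the run
modifier_words = words(MODIFIER)  # the modifier letter is part of the run

print(unicodedata.category("\u2019"), curly_words)
print(unicodedata.category("\u02bc"), modifier_words)
```

This only shows how one regex engine treats the two code points; it does not settle which character English text ought to use.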

Another take on the English apostrophe in Unicode

2015-06-04 Thread Frédéric Grosshans
An interesting argument for U+02BC MODIFIER LETTER APOSTROPHE as English apostrophe : https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/ Frédéric

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread Parker Higgins
On Thu, Jun 4, 2015 at 12:43 AM, Chris idou...@gmail.com wrote: Characters are 64 bit. 32 bits are stripped off as the “character set provider ID”. That is sent to one of many canonical servers akin to DNS servers to find the URL owner of those characters. At that location you’d find a
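
As a rough sketch of the quoted 64-bit proposal: the post does not say which half of the value carries the provider ID, so this assumes the high 32 bits are the provider ID and the low 32 bits are the code within that provider's set. The names `split_char` and `join_char` are hypothetical, and the lookup of the provider's server is left out entirely.

```python
PROVIDER_BITS = 32                    # assumption: high half is the provider ID
LOCAL_MASK = (1 << PROVIDER_BITS) - 1  # mask for the low 32 bits


def split_char(value):
    """Split a 64-bit character value into (provider_id, local_code)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return value >> PROVIDER_BITS, value & LOCAL_MASK


def join_char(provider_id, local_code):
    """Combine a 32-bit provider ID and a 32-bit local code into one value."""
    if not 0 <= provider_id <= LOCAL_MASK:
        raise ValueError("provider_id does not fit in 32 bits")
    if not 0 <= local_code <= LOCAL_MASK:
        raise ValueError("local_code does not fit in 32 bits")
    return (provider_id << PROVIDER_BITS) | local_code


# Hypothetical provider 7, local code 0x41
value = join_char(7, 0x41)
print(hex(value), split_char(value))
```

Under this layout every character costs 8 bytes in a fixed-width encoding, which is the size concern raised elsewhere in the thread.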

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread David Starner
On Thu, Jun 4, 2015 at 6:09 AM John idou...@gmail.com wrote: Mostly just a matter of upgrading the character size. Which totally blows any concern with text size out of the water. Using 30 bytes to define certain very rare characters and 1 byte to define ASCII is way better than using 8 bytes

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread Chris
Well, that's the rub, isn't it? We (in IT) are still working pretty dang hard on the simpler problem, to wit: There should be a way to put standard characters anywhere that characters belong and have things just work. And even *that* is a hard problem that has taken over 25 years --

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread Asmus Freytag (t)
On 6/4/2015 1:46 AM, William_J_G Overington wrote: I thought that I would mention it, though I cannot quite at the moment understand the issue. I'm long past where I'm sure I understand what the issue is. :) A./

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread Richard Wordingham
On Thu, 04 Jun 2015 14:39:27 + David Starner prosfil...@gmail.com wrote: On Thu, Jun 4, 2015 at 6:09 AM John idou...@gmail.com wrote: Mostly just a matter of upgrading the character size. Which totally blows any concern with text size out of the water. Using 30 bytes to define

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-04 Thread Chris
No, that's why you include a reference to the font in the private agreement, so that interested parties can install it and see the special character(s). People with their iphones and ipads and so forth don’t want to have “private agreements”, they don’t want to “install character sets”. The

Re: Tag characters and in-line graphics (from Tag characters)

2015-06-04 Thread Chris
On 4 Jun 2015, at 10:59 am, David Starner prosfil...@gmail.com wrote: On Wed, Jun 3, 2015 at 5:46 PM Chris idou...@gmail.com mailto:idou...@gmail.com wrote: I personally think emoji should have one, single definitive representation for this exact reason. Then you want an image. I

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread William_J_G Overington
Chris expressed an idea, hypothetically starting: Characters are 64 bit. The following posts might be helpful. http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0277.html http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0307.html For 64 bits, or somewhere in that region, maybe just a few

Re: Custom characters (was: Re: Private Use Area in Use)

2015-06-04 Thread John
It occurs to me that the existing DNS system was designed to map 32-bit numbers to domain names. So a hypothetical UTF64 format, with 32 bits of provider ID, could be co-opted into the DNS system under a different record domain (similar to how there are A records and MX records, there could be