"Doug Ewell" wrote:
|Philippe Verdy wrote:
|>>> Well, you do have eleven bits for flags per codepoint, for example.
|>>
|>> That's not UCS-4; that's a custom encoding.
|>>
|>> (any UCS-4 code unit) & 0xFFE00000 == 0
|
|(changing to "UTF-32" per Ken's observation)
|
|>
On Tue, 14 Mar 2017 08:51:18 +0000
Alastair Houghton wrote:
> On 14 Mar 2017, at 02:03, Richard Wordingham
> wrote:
> >
> > On Mon, 13 Mar 2017 19:18:00 +0000
> > Alastair Houghton wrote:
> > The
Philippe Verdy wrote:
>>> Well, you do have eleven bits for flags per codepoint, for example.
>>
>> That's not UCS-4; that's a custom encoding.
>>
>> (any UCS-4 code unit) & 0xFFE00000 == 0
(changing to "UTF-32" per Ken's observation)
> Per definition yes, but UCS-4 is not Unicode.
I guess
Per definition yes, but UCS-4 is not Unicode.
Likewise, (any UCS-4 code unit) & 0xFFE00000 == 0 (i.e. 21 bits) is not
Unicode; UTF-32 is Unicode (more restrictive than just 21 bits, which would
allow 32 planes instead of just the first 17).
I suppose he meant 21 bits, not 11 bits which covers
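To make the distinction concrete, here is a minimal Python sketch (not from the thread) comparing the 21-bit mask test with the narrower UTF-32 range. The function names are hypothetical, and the mask is assumed to be 0xFFE00000, i.e. the top eleven bits of a 32-bit unit.

```python
FLAG_MASK = 0xFFE00000      # assumed mask: top 11 bits of a 32-bit unit
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def fits_in_21_bits(unit: int) -> bool:
    """True if none of the top eleven bits are set."""
    return (unit & FLAG_MASK) == 0


def is_utf32_code_unit(unit: int) -> bool:
    """True if unit is a Unicode scalar value (surrogates excluded)."""
    return 0 <= unit <= MAX_CODE_POINT and not (0xD800 <= unit <= 0xDFFF)


def plane_of(unit: int) -> int:
    """Plane number: the bits above the low sixteen."""
    return unit >> 16


planes_in_21_bits = plane_of(0x1FFFFF) + 1        # 32
planes_in_unicode = plane_of(MAX_CODE_POINT) + 1  # 17

# 0x110000 passes the mask test but is not a valid UTF-32 code unit.
print(fits_in_21_bits(0x110000), is_utf32_code_unit(0x110000))
```

The mask test alone accepts values in planes 17 to 31 and the surrogate range, which is why it is a weaker condition than UTF-32 validity.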
Steffen Nurpmeso wrote:
>> I didn’t say you never needed to work with code points. What I said
>> is that there’s no advantage to UCS-4 as an encoding, and that
>
> Well, you do have eleven bits for flags per codepoint, for example.
That's not UCS-4; that's a custom encoding.
(any UCS-4 code
Alastair Houghton wrote:
|On 13 Mar 2017, at 21:10, Khaled Hosny wrote:
|> On Mon, Mar 13, 2017 at 07:18:00PM +0000, Alastair Houghton wrote:
|>> On 13 Mar 2017, at 17:55, J Decker wrote:
|>>>
|>>> I liked the Go
On 14 Mar 2017, at 02:03, Richard Wordingham
wrote:
>
> On Mon, 13 Mar 2017 19:18:00 +0000
> Alastair Houghton wrote:
>
>> IMO, returning code points by index is a mistake. It over-emphasises
>> the importance of the code point,
On 13 Mar 2017, at 21:10, Khaled Hosny wrote:
>
> On Mon, Mar 13, 2017 at 07:18:00PM +0000, Alastair Houghton wrote:
>> On 13 Mar 2017, at 17:55, J Decker wrote:
>>>
>>> I liked the Go implementation of character type - a rune type - which is a
>>>
Ah, it was what I thought you were talking about -- I wasn't aware they
were considered word boundaries :)
Thanks for the links!
On Mar 13, 2017 4:54 PM, "Richard Wordingham" <
richard.wording...@ntlworld.com> wrote:
On Mon, 13 Mar 2017 15:26:00 -0700
Manish Goregaokar
On Mon, 13 Mar 2017 19:18:00 +0000
Alastair Houghton wrote:
> IMO, returning code points by index is a mistake. It over-emphasises
> the importance of the code point, which helps to continue the notion
> in some developers’ minds that code points are somehow
On Mon, 13 Mar 2017 20:20:25 -0400
"Mark E. Shoulson" wrote:
> Sanskrit external vowel sandhi is comparatively
> straightforward (compared to consonant sandhi), and it frequently
> loses information. A *or* AA plus I is E; A *or* AA plus U is O (you
> need A + O to get AU).
A word ending in A *or* AA preceding a word beginning in A *or* AA will
all coalesce to a single AA in Sanskrit. That's four possibilities, and
that doesn't count a word ending in a consonant preceding a word
beginning in AA, which would be written the same. My memory is rusty,
so I should
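The loss of information described here can be sketched as a many-to-one lookup table. This is only an illustration in Python: the table holds just the vowel combinations named in the messages above, in the same transliteration, and the helper name is hypothetical. It is not a complete account of Sanskrit sandhi.

```python
# (final vowel of first word, initial vowel of second word) -> merged vowel.
# Only the rules quoted in the thread are listed; real sandhi has many more.
SANDHI = {
    ("A", "I"): "E", ("AA", "I"): "E",
    ("A", "U"): "O", ("AA", "U"): "O",
    ("A", "O"): "AU",
    ("A", "A"): "AA", ("A", "AA"): "AA",
    ("AA", "A"): "AA", ("AA", "AA"): "AA",
}


def possible_sources(merged: str) -> list:
    """All vowel pairs in the table that coalesce into the given vowel."""
    return sorted(pair for pair, out in SANDHI.items() if out == merged)


for vowel in ("E", "O", "AU", "AA"):
    print(vowel, possible_sources(vowel))
```

Because the mapping is not invertible, splitting a written AA back into two words needs knowledge of the vocabulary, not just of the code points.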
On Mon, 13 Mar 2017 15:26:00 -0700
Manish Goregaokar wrote:
> Do you have examples of AA being split that way (and further reading)?
> I think I'm aware of what you're talking about, but would love to read
> more about it.
Just googling for the three words 'Sanskrit',
Do you have examples of AA being split that way (and further reading)?
I think I'm aware of what you're talking about, but would love to read
more about it.
-Manish
On Mon, Mar 13, 2017 at 2:47 PM, Richard Wordingham
wrote:
> On Mon, 13 Mar 2017 23:10:11 +0200
>
On Mon, 13 Mar 2017 23:10:11 +0200
Khaled Hosny wrote:
> But there are many text operations that require access to Unicode code
> points. Take for example text layout, as mapping characters to glyphs
> and back has to operate on code points. The idea that you never need
>
On Mon, Mar 13, 2017 at 07:18:00PM +0000, Alastair Houghton wrote:
> On 13 Mar 2017, at 17:55, J Decker wrote:
> >
> > I liked the Go implementation of character type - a rune type - which is a
> > codepoint, and strings that return runes by index.
> >
On 13 Mar 2017, at 17:55, J Decker wrote:
>
> I liked the Go implementation of character type - a rune type - which is a
> codepoint, and strings that return runes by index.
> https://blog.golang.org/strings
IMO, returning code points by index is a mistake. It
I liked the Go implementation of character type - a rune type - which is a
codepoint, and strings that return runes by index.
https://blog.golang.org/strings
Doesn't solve the problem for composited codepoints though...
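The same limitation shows up in any language that indexes text by code point. Here is a minimal Python sketch (a Python str, like a Go []rune slice, is indexed by code point; the sample syllable is chosen for illustration and is not from the thread):

```python
import unicodedata

# One user-perceived character made of two code points:
# DEVANAGARI LETTER KA followed by DEVANAGARI VOWEL SIGN I.
syllable = "\u0915\u093F"

# Indexing by code point sees two separate items.
code_points = [f"U+{ord(ch):04X}" for ch in syllable]

# Normalization does not help: there is no precomposed form for this pair.
nfc = unicodedata.normalize("NFC", syllable)

print(code_points, len(nfc))
```

Taking syllable[0] alone would silently drop the vowel sign, so an index that returns code points still cuts through what a reader sees as one unit.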
texel looks to be defined as a graphic element already. TEXture
Quote/Cytat - J Decker (Mon 13 Mar 2017 06:55:18 PM CET):
texel looks to be defined as a graphic element already. TEXture ELement.
I'm aware of it, but homonymy/polysemy is something we have to live
with. I think there is no risk of confusing texture elements with text
Quote/Cytat - Asmus Freytag (Mon 13 Mar 2017
06:00:08 PM CET):
[...]
This (or a similar) scenario indicates the impossibility of arriving at a
single, universal definition of a "textel" -- the main reason why this
term is of lower utility than "pixel".
I agree that it is
On 3/13/2017 3:31 AM, Janusz S. Bien
wrote:
Just yet another reason for introducing the notion of
textel?
The main difference between "textel" and "pixel"
is that the unit of processing/displaying text is not uniform
and fixed,
Prof. Janusz S. Bień wrote:
> Just yet another reason for introducing the notion of textel?
I opine that it would be a good idea to introduce several new words, of which
textel would be one, with each such new word having a precisely-defined meaning
so that in precise discussions of
Quote/Cytat - William_J_G Overington (Mon
13 Mar 2017 12:24:13 PM CET):
Prof. Janusz S. Bień wrote:
Just yet another reason for introducing the notion of textel?
I opine that it would be a good idea to introduce several new words,
of which textel would be
Quote/Cytat - Richard Wordingham
(Sun 12 Mar 2017 09:10:22 PM CET):
On Sun, 12 Mar 2017 20:02:28 +0100
"Janusz S. Bien" wrote:
If the basic notion has to be referred to in a cumbersome way as
"extended grapheme cluster" then it is easier
On Sun, 12 Mar 2017 20:02:28 +0100
"Janusz S. Bien" wrote:
> If the basic notion has to be referred to in a cumbersome way as
> "extended grapheme cluster" then it is easier to talk about "Unicode
> characters" despite the fact that they have a rather loose relation
> to
Quote/Cytat - Manish Goregaokar (Sun 12 Mar 2017
07:43:22 PM CET):
This is just another confirmation that the present Unicode terminology
is confusing.
I find this to be a symptom of our pedagogy around "characters" in
programming; most folks get taught that characters
> This is just another confirmation that the present Unicode terminology
> is confusing.
I find this to be a symptom of our pedagogy around "characters" in
programming; most folks get taught that characters are bytes are code
points, especially because many languages try to make this the case.
The
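A short Python sketch shows why "characters are bytes are code points" does not hold. The sample string is an assumption chosen for illustration: one Latin letter with an accent and one emoji outside the Basic Multilingual Plane.

```python
text = "\u00e9\U0001F600"  # U+00E9 (e with acute) followed by U+1F600 (emoji)

code_points = len(text)                            # Python counts code points
utf8_bytes = len(text.encode("utf-8"))             # 2 bytes + 4 bytes
utf16_units = len(text.encode("utf-16-le")) // 2   # 1 unit + 2 units (surrogate pair)
utf32_units = len(text.encode("utf-32-le")) // 4   # one unit per code point

print(code_points, utf8_bytes, utf16_units, utf32_units)
```

The four counts differ for the same two-character string, so "length" only has a meaning once the unit being counted is named.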
On Fri, Mar 10 2017 at 19:55 CET, man...@mozilla.com writes:
> I recently wrote
> http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
> , which sort of addresses the whole hangup programmers have with
> treating code points as "characters".
[...]
This is
I recently wrote
http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
, which sort of addresses the whole hangup programmers have with
treating code points as "characters".
I also wrote
On Fri, Mar 10, 2017 at 05:00:55PM +0000, Peter Constable wrote:
> FYI:
>
> http://reedbeta.com/blog/programmers-intro-to-unicode/
>
> The visuals may be the most interesting part. E.g., in the usage heat
> map, Arabic Presentation Forms-B lights up much more than I would have
> expected
I