On 17 Jul 2017, at 13:25, Christoph Päper via Unicode
wrote:
>
> Finally, should smart fonts make U+0020 exactly as wide as an em when between
> two emojis?
I’ll leave it to others to answer the rest (I don’t know the answers to those),
but the answer to this is clearly
On 2 Jul 2017, at 16:59, Jörg Knappen via Unicode wrote:
>
> > Is it possible to design fonts that will render ẞ as SS?
>
> In fact, that has happened long before the capital letter sharp s was added
> to Unicode: The T1 encoding (aka Cork encoding) of LaTeX
> does this
On 15 May 2017, at 18:52, Asmus Freytag <asm...@ix.netcom.com> wrote:
>
> On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
>> On 15 May 2017, at 11:21, Henri Sivonen via Unicode <unicode@unicode.org>
>> wrote:
>>> In reference to:
>>
On 18 May 2017, at 01:04, Philippe Verdy via Unicode
wrote:
>
> I find it intriguing that the update intends to enforce the decoding of the
> **shortest** sequences, but now wants to treat **maximal sequences** as a
> single unit with arbitrary length. UTF-8 was designed
On 18 May 2017, at 06:01, Richard Wordingham via Unicode
wrote:
>
> On Thu, 18 May 2017 02:04:55 +0200
> Philippe Verdy via Unicode wrote:
>
>> I find it intriguing that the update intends to enforce the decoding
>> of the **shortest** sequences, but
On 18 May 2017, at 07:18, Henri Sivonen via Unicode wrote:
>
> the decision complicates U+FFFD generation when validating UTF-8 by state
> machine.
It *really* doesn’t. Even if you’re hell bent on using a pure state machine
approach, you need to add maybe two additional
On 16 May 2017, at 14:23, Hans Åberg via Unicode wrote:
>
> You don't. You have a filename, which is an octet sequence of unknown
> encoding, and want to deal with it. Therefore, valid Unicode transformations
> of the filename may result in it not being reachable.
>
On 16 May 2017, at 16:44, Hans Åberg <haber...@telia.com> wrote:
>
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode <unicode@unicode.org>
> wrote:
>>
>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
>> UCS-2/UT
On 16 May 2017, at 17:07, Hans Åberg wrote:
>
HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
UCS-2/UTF-16. ...
>>>
>>> The filesystem directory is using octet sequences and does not bother
>>> passing over an encoding, I am told.
> On 16 May 2017, at 20:43, Richard Wordingham via Unicode
> wrote:
>
> On Tue, 16 May 2017 11:36:39 -0700
> Markus Scherer via Unicode wrote:
>
>> Why do we care how we carve up an illegal sequence into subsequences?
>> Only for debugging and visual
On 16 May 2017, at 17:23, Hans Åberg wrote:
>
> HFS implements case insensitivity in a layer above the filesystem raw
> functions. So it is perfectly possible to have files that differ by case only
> in the same directory by using low level function calls. The Tenon MachTen
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
>
> In reference to:
> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>
> I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
> On 16 May 2017, at 09:18, David Starner wrote:
>
> On Tue, May 16, 2017 at 12:42 AM Alastair Houghton
> wrote:
>> If you’re about to mutter something about security, consider this: security
>> code *should* refuse to compare strings that
> On 16 May 2017, at 10:29, David Starner wrote:
>
> On Tue, May 16, 2017 at 1:45 AM Alastair Houghton
> wrote:
> That’s true anyway; imagine the database holds raw bytes, that just happen to
> decode to U+FFFD. There might seem to be
On 16 May 2017, at 09:31, Henri Sivonen via Unicode wrote:
>
> On Tue, May 16, 2017 at 10:42 AM, Alastair Houghton
> wrote:
>> That would be true if the in-memory representation had any effect on what
>> we’re talking about, but it really
On 16 May 2017, at 08:22, Asmus Freytag via Unicode wrote:
> I therefore think that Henri has a point when he's concerned about tacit
> assumptions favoring one memory representation over another, but I think the
> way he raises this point is needlessly antagonistic.
That
On 15 May 2017, at 23:16, Shawn Steele via Unicode wrote:
>
> I’m not sure how the discussion of “which is better” relates to the
> discussion of ill-formed UTF-8 at all.
It doesn’t, which is a point I made in my original reply to Henri. The only
reason I answered his
On 15 May 2017, at 23:43, Richard Wordingham via Unicode
wrote:
>
> The problem with surrogates is inadequate testing. They're sufficiently
> rare for many users that it may be a long time before an error is
> discovered. It's not always obvious that code is designed for
> On 23 May 2017, at 18:45, Markus Scherer via Unicode
> wrote:
>
> On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode
> wrote:
>> So, if the proposal for Unicode really was more of a "feels right" and not a
>> "deviate at your peril"
On 23 May 2017, at 07:10, Jonathan Coxhead via Unicode <unicode@unicode.org>
wrote:
>
> On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote:
>> On 18 May 2017, at 07:18, Henri Sivonen via Unicode <unicode@unicode.org>
>> wrote:
>>
>>> th
On 16 May 2017, at 19:36, Markus Scherer wrote:
>
> Let me try to address some of the issues raised here.
Thanks for jumping in.
The one thing I wanted to ask about was the “without ever restricting trail
bytes to less than 80..BF”. I think that could be misinterpreted;
On 1 Jun 2017, at 19:44, Asmus Freytag via Unicode wrote:
>
> What's not OK is to take an existing recommendation and change it to
> something else, just to make bug reports go away for one implementation.
> That's like two sleepers fighting over a blanket that's too
On 31 May 2017, at 20:24, Shawn Steele via Unicode wrote:
>
> > For implementations that emit FFFD while handling text conversion and
> > repair (ie, converting ill-formed
> > UTF-8 to well-formed), it is best for interoperability if they get the same
> > results, so that
On 31 May 2017, at 20:42, Shawn Steele via Unicode wrote:
>
>> And *that* is what the specification says. The whole problem here is that
>> someone elevated
>> one choice to the status of “best practice”, and it’s a choice that some of
>> us don’t think *should*
>> be
On 31 May 2017, at 18:43, Shawn Steele via Unicode wrote:
>
> It is unclear to me what the expected behavior would be for this corruption
> if, for example, there were merely a half dozen 0x80 in the middle of ASCII
> text? Is that garbage a single "character"? Perhaps
> On 30 May 2017, at 18:11, Shawn Steele via Unicode
> wrote:
>
>> Which is to completely reverse the current recommendation in Unicode 9.0.
>> While I agree that this might help you fending off a bug report, it would
>> create chances for bug reports for Ruby, Python3,
On 1 Jun 2017, at 10:32, Henri Sivonen via Unicode wrote:
>
> On Wed, May 31, 2017 at 10:42 PM, Shawn Steele via Unicode
> wrote:
>> * As far as I can tell, there are two (maybe three) sane approaches to this
>> problem:
>> * Either a "maximal"
On 1 May 2017, at 15:19, Naena Guru via Unicode wrote:
>
> This whole attempt to make digitizing Indic script some esoteric, 'abstract',
> 'semantic representation' and so on seems to me an attempt to make Unicode
> the realm of some superhumans.
No. It’s
On 5 Jun 2018, at 07:09, Martin J. Dürst via Unicode
wrote:
>
> Hello Rebecca,
>
> On 2018/06/05 12:43, Rebecca T via Unicode wrote:
>
>> Something I’d love to see is translated keywords; shouldn’t be hard with a
>> line in the cargo.toml for a rudimentary lookup. Again, I’m of the opinion
>>
On 4 Jun 2018, at 20:49, Manish Goregaokar via Unicode
wrote:
>
> The Rust community is considering adding non-ascii identifiers, which follow
> UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for
> identifiers to be treated as equivalent under NFKC.
>
> Are there any
On 7 Jun 2018, at 15:51, Frédéric Grosshans via Unicode
wrote:
>
>> IMO the major issue with non-ASCII identifiers is not a technical one, but
>> rather that it runs the risk of fragmenting the developer community.
>> Everyone can *type* ASCII and everyone can read Latin characters (for
>>
On 6 Jun 2018, at 17:50, Manish Goregaokar wrote:
>
> I think the recommendation to use ASCII as much as possible is implicit there.
It would be a very good idea to make it explicit. Even for English speakers,
there may be a temptation to use characters that are hard to distinguish or
hard to
On 30 Jan 2018, at 05:31, Marcel Schneider via Unicode
wrote:
>
> On Mon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote:
>>
>>> On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote:
>>>
>>>
>>> the Windows US-Intl
>>> does not allow to write French in a
On 14 Feb 2018, at 16:29, Shriramana Sharma via Unicode
wrote:
>
> Sorry but "UNICODE" does fit within those rules doesn't it?
Yes. Stephane has misunderstood. (Shriramana meant the literal text
“UNICODE”, which is indeed composed of letters A-Z and meets the definition
On 14 Feb 2018, at 13:25, Shriramana Sharma via Unicode
wrote:
>
> From a mail which I had sent to two other Unicode contributors just a
> few days ago:
>
> Frankly I agree that this whole emoji thing is a Pandora box. It
> should have been restricted to emoticons to
On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode
wrote:
>
> Indeed, to be fair. And for implementers, documenting themselves in English
> may scarcely ever have much of a problem, no matter whatʼs the locale.
Agreed. Implementers will already understand English;