Re: Unicode filenames with Apple File System and UIManagedDocument

Ed Wynne Thu, 23 Mar 2017 10:59:47 -0700

> On Mar 23, 2017, at 1:40 PM, Charles Srstka <cocoa...@charlessoft.com> wrote:
> 
>> On Mar 23, 2017, at 3:50 AM, Alastair Houghton 
>> <alast...@alastairs-place.net> wrote:
>> 
>> On 22 Mar 2017, at 19:13, Chris Ridd <chrisr...@mac.com> wrote:
>>> 
>>>> On 22 Mar 2017, at 09:05, Alastair Houghton 
>>>> <alast...@alastairs-place.net> wrote:
>>>> 
>>>> In the context of filesystems (and specifically filenames), the phrases 
>>>> “bag of bytes” and “bunch of bytes” have a fairly specific meaning.  The 
>>>> point is that the filesystem doesn’t inspect the bytes it’s given, and 
>>>> doesn’t care what they represent (about the only exception is that it 
>>>> probably doesn’t support embedded NULs).  It isn’t suggesting that the 
>>>> names are treated as an unordered set of bytes (that’d just be silly).  
>>>> It’s just expressing the fact that the filesystem doesn’t care what they 
>>>> are - it may compare them, and if it does so, it will use binary ordering 
>>>> (not some other collation sequence) and won’t worry about things like case 
>>>> or encoding at all.
>>> 
>>> That doesn’t sound sensible at all. It means you can create a filename with 
>>> a byte sequence that isn’t valid UTF-8 and which likely then cannot be 
>>> accessed by MacOS/iOS processes.
>> 
>> That isn’t possible on macOS - there’s a percent escaping mechanism built in 
>> to the kernel to prevent this problem.
>> 
>>> It means that you could create multiple files with the “same” name, and 
>>> that doesn’t sound like a win either. e.g. Aandi’s examples of LATIN SMALL 
>>> LETTER E (U+0065) COMBINING ACUTE ACCENT (U+0301) and LATIN SMALL LETTER E 
>>> WITH ACUTE (U+00E9)
>> 
>> Yes, it does.
>> 
>>> How can a “next gen” filesystem avoid using Unicode rules when handling 
>>> filenames?
>> 
>> Well, if I had designed it, it wouldn’t.  But I didn’t.
>> 
>> To be fair, I can see arguments in favour of the bunch of bytes approach; 
>> the existing approach has created a problem in HFS+, in that the 
>> normalisation is essentially fixed for all time, and doesn’t correspond to 
>> the current version of Unicode.  It’s actually worse than it might be, 
>> because (IIRC) they fixed the normalisation *before* Unicode adopted a 
>> stability policy for normalisation...
>> 
>> But if the filesystem (or kernel) isn’t doing it, then IMO the Cocoa 
>> frameworks certainly should.
> 
> Shouldn’t the VFS layer actually be doing this? It is part of its whole 
> raison d’être, no? Just have -[NSURL fileSystemRepresentation] normalize 
> things according to the correct Unicode rules, and let the VFS layer 
> translate that to HFS+’s normalization style when dealing with HFS+.

Yes, this.

Having the conversion only available up in the Cocoa layer is an incredibly 
poor choice. It effectively means nothing at the BSD layer will be able to 
properly normalize file names. Having it at the VFS layer is the most sane 
option, even with the problems that causes.
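
To make the "same name, different bytes" case from earlier in the thread 
concrete, here is a minimal sketch in Python. It is only an illustration, not 
how any Apple layer is implemented: raw_name and normalized_name are 
hypothetical helpers, and unicodedata applies the standard NFC/NFD forms, which 
are not the same thing as HFS+'s own fixed decomposition rules.

```python
import unicodedata

# The two spellings of "e with acute" discussed in the thread.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def raw_name(name: str) -> bytes:
    """Hypothetical: the bytes a 'bag of bytes' filesystem would store as-is."""
    return name.encode("utf-8")


def normalized_name(name: str, form: str = "NFC") -> bytes:
    """Hypothetical: normalize once, at a single layer, before storing/comparing."""
    return unicodedata.normalize(form, name).encode("utf-8")


# Without normalization the two names are different byte strings,
# so a byte-comparing filesystem treats them as two distinct files.
print(raw_name(composed) == raw_name(decomposed))

# With normalization applied at one layer, both spellings map to one name.
print(normalized_name(composed) == normalized_name(decomposed))
```

The point of the sketch is where the normalized_name step lives: if it only 
exists in Cocoa, anything calling open() directly at the BSD layer still sees 
the raw_name behaviour.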

-Ed


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

