On 21 Mar 2017, at 20:49, Quincey Morris <quinceymor...@rivergatesoftware.com> 
wrote:
> 
> On Mar 20, 2017, at 14:23 , davel...@mac.com wrote:
>> 
>> "iOS HFS Normalized UNICODE names , APFS now treats all file[ name]s as a 
>> bag of bytes on iOS . We are requesting that Applications developers call 
>> the correct Normalization routines to make sure the file name contains the 
>> correct representation."
> 
> I’ve been letting this simmer for a couple of days now, and I’ve come to the 
> conclusion that it’s — sincere apologies to the unnamed Apple engineer who 
> wrote it — as dumb as dirt.
> 
> — It’s not a “bag of bytes”, because bags of stuff are generally understood 
> as unordered sets, and I doubt that’s what’s intended. It has to be a 
> sequence of bytes.

In the context of filesystems (and specifically filenames), the phrases “bag of 
bytes” and “bunch of bytes” have a fairly specific meaning.  The point is that 
the filesystem doesn’t inspect the bytes it’s given, and doesn’t care what they 
represent (about the only exception is that it probably doesn’t support 
embedded NULs).  It isn’t suggesting that the names are treated as an unordered 
set of bytes (that’d just be silly).  It’s just expressing the fact that the 
filesystem doesn’t care what they are - it may compare them, and if it does so, 
it will use binary ordering (not some other collation sequence) and won’t worry 
about things like case or encoding at all.
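
To make that concrete, here is a minimal sketch (in Python, purely for 
illustration; the helper names are mine, not anything from APFS) of why a 
byte-for-byte filesystem sees the precomposed and decomposed spellings of the 
same name as two different names unless the caller normalizes first:

```python
import unicodedata

def name_bytes(name: str, form: str) -> bytes:
    """Return the UTF-8 bytes of `name` after normalizing it to `form`."""
    return unicodedata.normalize(form, name).encode("utf-8")

def same_name_bytewise(a: bytes, b: bytes) -> bool:
    """Compare two names the way a 'bag of bytes' filesystem would."""
    return a == b

# The same visible name, spelled two ways.
composed = name_bytes("caf\u00e9.txt", "NFC")    # 'é' as a single code point
decomposed = name_bytes("caf\u00e9.txt", "NFD")  # 'e' plus a combining acute

# Without normalization the two spellings are simply different byte strings.
print(same_name_bytewise(composed, decomposed))

# If the application normalizes before calling down, they match again.
renormalized = name_bytes(decomposed.decode("utf-8"), "NFC")
print(same_name_bytewise(composed, renormalized))
```

Which normalization form an application should pick is exactly the part the 
quoted statement leaves open; the sketch uses NFC only as an example.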

> — It’s not just a string, it has to be a string in a known encoding. 
> Otherwise, how could you ever mount an external drive on a different 
> computer? The encoding has to be pre-specified for APFS, or it has to be 
> stored in metadata on each volume.

Agreed, that’s where the “bunch of bytes” approach falls down.
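
A small sketch of the problem, assuming nothing about APFS itself (the byte 
string and the helper are made up for illustration): the same stored bytes 
turn into different names, or no name at all, depending on which encoding the 
reader guesses.

```python
from typing import Optional

# A hypothetical on-disk name: "caf", one byte 0xE9, then ".txt".
raw = b"caf\xe9.txt"

def decode_name(data: bytes, encoding: str) -> Optional[str]:
    """Decode a stored name, or return None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_latin1 = decode_name(raw, "latin-1")  # 0xE9 is 'é' in Latin-1
as_cp437 = decode_name(raw, "cp437")     # a different character in CP437
as_utf8 = decode_name(raw, "utf-8")      # not valid UTF-8 at all

for label, value in (("latin-1", as_latin1),
                     ("cp437", as_cp437),
                     ("utf-8", as_utf8)):
    print(label, repr(value))
```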

> — It’s not just going to be a string of known encoding, it’s going to be 
> Unicode. That’s going to be true even if the fact is specified in volume 
> metadata and it’s theoretically possible to create APFS volumes with 
> non-Unicode file names. Anything other than Unicode would, at this point, be 
> a crime against humanity.

If I’d designed APFS, it probably would use Unicode names (and it’d store the 
version of Unicode it used in the filesystem header, to avoid having to 
hard-code it).

But I didn’t design it - Dominic Giampaolo and his team did - and we still 
don’t have that much information about how APFS works.  I’m sure they had their 
reasons for whatever decision they’ve made here.

> Is *that* the bottom line? I doubt it. I don’t believe the above quoted 
> statement can be correct. I could believe that normalization is being moved 
> out of the file system code, but it would have to be moved to (e.g.) the 
> Cocoa frameworks, still “downstream” of the file-handling APIs. It can’t go 
> upstream of the public APIs without breaking an API contract that has existed 
> for the 16+ years since OS X 10.0.

This is a tricky area.  The problem with what we have at the moment 
(-fileSystemRepresentation) is that it *assumes* HFS+ semantics, i.e. it hands 
back a decomposed UTF-8 form of the name.  That isn’t always going to be 
correct for existing non-HFS+ filesystems, let alone in the future.  Of course, 
if you’re using the NSURL or NSString methods, rather than calling the BSD or C 
library APIs yourself, this is all hidden from you anyway.  You certainly 
shouldn’t, IMO, be required to do anything unusual at Cocoa level - the 
Foundation framework should just make this all work, much as it presently does 
for numerous other things.

It’s also complicated by the fact that, unlike on DOS or Windows, UNIX-like 
systems use a unified filesystem - that is, other filesystems are joined on at 
mount points.  Thus you could have a name like

  /Volumes/Foo/Bar/Baz/Blam

where (say) both Foo and Baz are mount points, and the rules about filenames 
could differ markedly, at least in principle; that is, /Volumes/Foo would have 
to conform to HFS+ (or APFS) rules, Bar/Baz to whatever rules govern the 
filesystem mounted at Foo, and Blam to whatever rules govern the filesystem 
mounted at Baz.  And remember, not every filesystem will be using a well-known 
encoding - macOS already has code to add and remove percent escapes (I kid you 
not) for this very reason.
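
The mount point argument can be sketched as follows.  This is only an 
illustration in Python: the function name and the hard-coded set of mount 
points are hypothetical, and a real implementation would have to ask the 
kernel where the mounts actually are.

```python
from typing import List, Set, Tuple

def governing_mounts(path: str, mount_points: Set[str]) -> List[Tuple[str, str]]:
    """Pair each component of an absolute path with the mount point of the
    filesystem whose directory actually stores that component's name."""
    result = []
    current = "/"   # names directly under the root live on the root filesystem
    prefix = ""
    for part in (p for p in path.split("/") if p):
        # The name `part` is an entry in a directory on `current`.
        result.append((part, current))
        prefix = prefix + "/" + part
        # Anything below a mount point is stored on the mounted filesystem.
        if prefix in mount_points:
            current = prefix
    return result

mounts = {"/Volumes/Foo", "/Volumes/Foo/Bar/Baz"}
for name, fs in governing_mounts("/Volumes/Foo/Bar/Baz/Blam", mounts):
    print(name, "is stored on the filesystem mounted at", fs)
```

So a single path can need three different sets of naming rules, one per 
filesystem it crosses.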

I’d like to hear what Dominic has to say (at least what he *can* say) about 
this, since he’s likely in a position to shed some light on it - or at least to 
take on board that we’re worrying about it.  At the very least it’d be nice to 
see some more detail about APFS published somewhere *soon*...

Kind regards,

Alastair.

--
http://alastairs-place.net

