Hi Amit / the group,

All in all, very useful information. Thanks a lot for explaining.
I was aware of the MAXNAMLEN limitation for the system call 
getdirentries, but I thought this was an API limitation, and that the 
file name mangling somehow happened automatically behind the curtains to 
satisfy the limitations of the APIs. I see now that it's not that simple.

One question though: The getattrlist(2) manpage states that when 
retrieving ATTR_CMN_NAME, the NAME_MAX limitation also applies (quote: 
"The attribute data length will not be greater than NAME_MAX + 1").
If this is true, then this can't be the API that Finder uses to get the 
full file name? Or is the documentation incorrect?
I assume that it is not possible to explicitly supply the ATTR_CMN_NAME 
attribute with the current version of MacFUSE?

I did note during testing, as you pointed out, that I can look up and 
perform open, read, write, ... operations using the unmangled form of 
the really long file names in HFS+ through the BSD APIs. So I support 
the idea that MacFUSE should not automatically fail lookups of file 
names that exceed MAXNAMLEN. If this could somehow affect the 
operation of other existing FUSE file systems (I can't immediately see 
how), then maybe it could be controlled using a mount-time option.

- Erik

Amit Singh wrote:
> The short answer is that this is something that the user-space file
> system would have to "take care of" itself.
>
> There's a fundamental limitation on file name length in the Mac OS X
> kernel's BSD layer. Look at getdirentries(2). The dirent structure has
> a 255 *byte* limit on the file name length. This limit can be seen
> codified as __DARWIN_MAXNAMLEN and MAXNAMLEN in <sys/dirent.h>, and
> also as NAME_MAX in <sys/syslimits.h>. What this means is that the
> kernel-level readdir() implementation of a file system (any file
> system--not just MacFUSE) _must_ return no more than 255 bytes for a
> file name.
>
> Now, as some might point out, HFS+ supports file names that are up to
> 255 *Unicode characters* long. (HFS+ uses an array of 255 uint16_t's
> for this purpose. Together with a uint16_t length field, the on-disk
> byte-space used by an HFS+ file name is up to 255*2 + 2 = 512 bytes.
> The in-flight UTF-8 length can be still longer.) So how do HFS+ file
> names that are longer than 255 bytes work? Well, when HFS+ encounters
> such file names, it mangles them before feeding them to getdirentries().
> The mangling scheme limits the name to 255 bytes by replacing the
> trailing part of the name with a hexadecimal representation of the
> file's Catalog Node ID. However, the scheme preserves any file
> extension, so it's not always the truly trailing part that it
> replaces. Of course, such a mangling scheme also needs to ensure that
> it returns valid UTF-8 (that is, it cuts off the original name at an
> appropriate character.)
>
> To see this scheme in action, create a really long file name and look
> at it in the Terminal (that is, through the BSD layer). For an
> original name like "中中中中...中中中.txt" you'll see something like
> "中中中...中中中#9C0282.txt", where the number of "中"s is smaller in the
> mangled name.
>
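To make the scheme above concrete, here is a minimal sketch of this kind of mangling in Python. It is not the HFS+ implementation: the helper name mangle_name, the 8-byte cut-off for what counts as an extension, and the exact "#" + hex ID format are assumptions modelled on the "#9C0282.txt" example.

```python
NAME_MAX = 255  # byte limit, as in <sys/syslimits.h>

def mangle_name(name, file_id, limit=NAME_MAX):
    """Shorten `name` to at most `limit` UTF-8 bytes (hypothetical helper).

    The tail of the name is replaced by '#' plus the file ID in hex,
    and a short extension, if any, is kept.
    """
    if len(name.encode("utf-8")) <= limit:
        return name  # short enough, no mangling needed

    stem, dot, ext = name.rpartition(".")
    # Assumption: only extensions of up to 8 bytes are preserved.
    if dot and stem and len(ext.encode("utf-8")) <= 8:
        ext_part = "." + ext
    else:
        stem, ext_part = name, ""

    suffix = "#%X" % file_id + ext_part
    budget = limit - len(suffix.encode("utf-8"))

    # Cut on code point boundaries so the result stays valid UTF-8.
    kept, used = [], 0
    for ch in stem:
        n = len(ch.encode("utf-8"))
        if used + n > budget:
            break
        kept.append(ch)
        used += n
    return "".join(kept) + suffix

mangled = mangle_name("中" * 255 + ".txt", 0x9C0282)
print(len(mangled.encode("utf-8")), mangled[-12:])
```

A user-space file system would need some stable per-file number to play the role of the Catalog Node ID here; whether one is available depends on the file system.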
> To make the whole thing work, the lookup code in HFS+ supports looking
> up mangled names if a file name indeed happens to seem like a mangled
> name and the original lookup fails. This isn't too hard in HFS+
> because Catalog search through the node ID is normal. The handlers for
> rename() etc. also need to take this into account. All in all, there's
> a whole bunch of extra stuff that HFS+ does for this.
>
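The fallback lookup described above could be sketched roughly like this in Python. The two dictionaries stand in for whatever index the file system keeps, and the regular expression for recognising a mangled name is an assumption based on the "#9C0282.txt" example, not the actual HFS+ rule.

```python
import re

# Assumed shape of a mangled name: '#' + uppercase hex ID, then an
# optional short extension, at the very end of the name.
MANGLED_RE = re.compile(r"#([0-9A-F]+)(\.[^.]{1,8})?$")

def lookup(name, by_name, by_id):
    """Return the file ID for `name`, or None if it cannot be found.

    by_name: dict mapping full (unmangled) names to file IDs
    by_id:   dict mapping file IDs to full names
    Both are hypothetical stand-ins for the file system's own index.
    """
    # 1. Try the name exactly as given.
    if name in by_name:
        return by_name[name]
    # 2. Only if that fails, and the name looks mangled, try by ID.
    m = MANGLED_RE.search(name)
    if m:
        file_id = int(m.group(1), 16)
        if file_id in by_id:
            # A stricter version would re-mangle by_id[file_id] and
            # compare it with `name` before accepting the match.
            return file_id
    return None

full = "中" * 255 + ".txt"
by_name = {full: 0x9C0282}
by_id = {0x9C0282: full}
print(lookup("中" * 81 + "#9C0282.txt", by_name, by_id) == 0x9C0282)
```

Trying the literal name first matters because a file may legitimately be called something that merely looks like a mangled name.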
> The limitation aside, you can actually get the complete, non-truncated
> name of such a file from a user-space program. (The Finder can show
> you the full name, for example.) For that, you'll have to go through
> the getattrlist(2) interface and retrieve the ATTR_CMN_NAME attribute.
>
> OK, so what about MacFUSE? Well, as I've said before, MacFUSE doesn't
> know, nor wants to know, any encoding at the kernel level. All file/
> folder names fed to MacFUSE must be no longer than 255 bytes. If you
> return a longer name in readdir(), MacFUSE will return EIO, which fails
> that readdir call. MacFUSE also fails a lookup if the requested name
> is longer than 255 bytes. (This latter behavior isn't strictly
> necessary actually. I could consider removing this check in the
> future, which will make calls other than readdir() work like they do
> with HFS+.)
>
> So, the gist is that the user-space file system needs to do its own
> mangling and return names that are limited to 255 bytes. Even if
> MacFUSE could automagically somehow do mangling, it would still be
> insufficient. That's because when given a mangled name by a program,
> MacFUSE would still need to ask the user-space file system to look it
> up. It's not quite the same situation with HFS+ because unlike
> MacFUSE, the kernel implementation of HFS+ knows "all" about the file
> system volume in question and its contents.
>
> As for fuse_fill_dir_t... it's not encountering any problem. It's not
> the one that's bailing out on a > 255 byte file name. As a file system
> writer (user-space or otherwise), there will be times when you just
> need to "know" some caveats and limitations.
>
> Amit
>
> On Jan 21, 8:26 pm, Erik Larsson <[email protected]> wrote:
>   
>> Hi,
>>
>> I've been running into an annoying problem with really long file names
>> returned by NTFS-3G. It seems that as soon as the length of any file
>> name string fed to fuse_fill_dir_t during a readdir operation exceeds
>> 255 bytes, the entire directory becomes unreadable (no entries show up
>> at all when trying to list it). The call to fuse_fill_dir_t still
>> returns 0, so from the implementor's point of view, no problem can be
>> detected.
>> You should be able to reproduce this easily by modifying the hello_path
>> variable in hellofs to get a sizeable file name.
>>
>> Maybe this is an example of undefined behaviour, and maybe 255 bytes is
>> some kind of limit for what MacFUSE/FUSE will handle, but it does
>> mean that a MacFUSE file system is not able to produce file names as
>> long as those that, for instance, the HFS+ driver and the built-in
>> NTFS driver can produce.
>>
>> An example: Consider a file name with 255 Chinese characters (say 255
>> repetitions of 中, zhong). This fits into both the NTFS and HFS+
>> structures which use 255 UTF-16 units internally for file name storage.
>> While the HFS+ driver allows the use of such a file name, NTFS-3G can't
>> do it since the file name expands to 765 bytes when encoded to UTF-8,
>> which makes the readdir operation fail as stated above.
>>
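The arithmetic in this example can be checked with a few lines of Python. The helper names are made up for illustration, and 255 is the NAME_MAX / MAXNAMLEN byte limit discussed in this thread.

```python
NAME_MAX = 255  # byte limit on a name returned through readdir

def utf16_units(name):
    """Number of UTF-16 code units, i.e. what NTFS and HFS+ count."""
    return len(name.encode("utf-16-le")) // 2

def utf8_len(name):
    """Number of bytes once the name is encoded as UTF-8."""
    return len(name.encode("utf-8"))

def fits_in_dirent(name):
    """True if the UTF-8 form fits within the 255-byte limit."""
    return utf8_len(name) <= NAME_MAX

name = "中" * 255
print(utf16_units(name), utf8_len(name), fits_in_dirent(name))
# prints: 255 765 False
```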
>> Is there a solution to this, or is it a known limitation?
>>
>> In the hypothetical worst case, a UTF-8 encoded file name returned by
>> NTFS-3G may be as large as 2295 bytes after NFD normalization (255
>> Korean characters, each decomposed into 3 jamos that take up 3 bytes
>> apiece in UTF-8 form), so I've been running into this problem not only
>> with test cases but with Korean file names of as few as 25-30 characters.
>>
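The worst-case figure can be reproduced with Python's standard unicodedata module. This is only a sketch of the arithmetic: the syllable 한 is picked because it decomposes into three jamos, whereas syllables without a final consonant decompose into two and so take 6 bytes rather than 9.

```python
import unicodedata

def nfd_utf8_len(name):
    """UTF-8 byte length of `name` after NFD normalization."""
    return len(unicodedata.normalize("NFD", name).encode("utf-8"))

syllable = "한"  # decomposes into 3 jamos, each 3 bytes in UTF-8

print(nfd_utf8_len(syllable))        # 9 bytes for a single character
print(nfd_utf8_len(syllable * 255))  # 2295 bytes in the worst case
print(nfd_utf8_len(syllable * 28))   # 252 bytes, still within 255
print(nfd_utf8_len(syllable * 29))   # 261 bytes, over the limit
```

So a name of 29 such syllables is already too long for readdir, which matches the 25-30 character range where the problem shows up in practice.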
>> In any case, I don't think fuse_fill_dir_t behaves correctly... if
>> it encounters any problem when fed a file name that is too large, there
>> should be a way of detecting the failure.
>>
>> Regards,
>>
>> - Erik

