On Oct 22, 2007, at 6:16 PM, Amit Singh wrote:

>
> On Oct 22, 5:26 pm, Ron Aldrich <[EMAIL PROTECTED]> wrote:
>
>> Since the backing store for the FS is an actual /dev/disk, I'd like
>> it to show the volume icon provided by the device driver, in the same
>> way that the HFS file system does.  Implementing the destroy method
>> in the FS to eject the physical disc should work, but I know that the
>> association between a file system and its device vnode is normally
>> maintained by the OS, and I'd like to come up with a way to use the
>> existing mechanism, if possible.
>
> Well, "normally" the file system isn't provided by a user-space
> program! The entire point of MacFUSE is to let a *user-space* program
> take care of the backing storage. Whatever existing mechanism(s)
> you're probably thinking of are not applicable here.
>
> As for the icon, I'd reiterate my suggestion: please look at the
> 'volicon' option. It's documented in the MacFUSE wiki.

That's probably the simplest method to use for now - I can look in
the IOKit registry for the appropriate icon and forward it to MacFUSE.
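
Something like this is what I have in mind - a sketch only (I'm
assuming, per the wiki, that volicon takes the path to an .icns file;
udf_oper and the icon path are placeholders):

> #include <fuse.h>
>
> static struct fuse_operations udf_oper;   /* filled in elsewhere */
>
> int main(int argc, char *argv[])
> {
>     struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
>
>     /* Icon pulled from the IOKit registry and written out as a
>        temporary .icns file before mounting (path is a placeholder). */
>     fuse_opt_add_arg(&args, "-ovolicon=/tmp/udf_volume.icns");
>
>     return fuse_main(args.argc, args.argv, &udf_oper, NULL);
> }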

>
>> I guess I have some research to do in order to determine how that
>> mechanism currently works.
>
> Yes. Also note that what applies to Linux (or Linux FUSE) doesn't
> necessarily apply to Mac OS X or MacFUSE. The user-space library
> (libfuse) is a port of the Linux version, but the MacFUSE kernel
> extension is completely different from the Linux version, so don't
> expect nuances, caveats, features, etc. to be identical across the
> two.
>
>>>> 2) The volume cannot be renamed.  Is there a way to support this?
>
>> Looks like I'm going to need to add that, then.  How averse
>> are you to adding Mac-specific APIs?
>
> I'm not keen on extending the API just yet. I have added numerous
> Mac-specific things in the past, but volume renaming will have to wait for
> now, since I have some higher priority things to take care of. As you
> probably realize, it's not just a matter of adding a field to a
> structure or adding a new function: if things have to be done in a
> foolproof, backwards compatible manner, many kernel/user components
> will have to be extended (and volume renaming isn't the only thing I
> want to add--there are other things too). It doesn't sound like this
> is a need of crisis proportions for you--if that's the case, I'd
> advise waiting a bit on this one.

While volume renaming isn't the most important feature on my plate,
I guarantee that my customers will expect it.  At any rate, though, it
can wait until the rest of the FS is stable.

It seems as though it might be worthwhile to develop a way to add
features to the MacFUSE API independently of the FUSE API.
Certainly, you wouldn't want to simply add an additional method to
the fuse_operations structure, as that would conflict with additions
made by the FUSE developers.

Perhaps a secondary initializer, intended to be called before
fuse_main, would work well, e.g.:

macfuse_init(&macfuse_operations, NULL);

where macfuse_operations would carry Mac-specific, optional methods.
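
A rough sketch of what I'm picturing (all of these names are
hypothetical - neither macfuse_operations nor macfuse_init exists
today):

> #include <fuse.h>
>
> /* Hypothetical Mac-specific extension table; NULL means "not
>    implemented", so existing file systems keep working unchanged. */
> struct macfuse_operations {
>     int (*setvolname)(const char *name);   /* volume renaming */
> };
>
> static int udf_setvolname(const char *name)
> {
>     /* ... write the new volume name into the UDF descriptors ... */
>     return 0;
> }
>
> static struct macfuse_operations macfuse_oper = {
>     .setvolname = udf_setvolname,
> };
>
> static struct fuse_operations udf_oper;   /* the normal FUSE table */
>
> int main(int argc, char *argv[])
> {
>     /* Register the extensions first, leaving fuse_operations (and
>        fuse_main itself) untouched. */
>     macfuse_init(&macfuse_oper, NULL);
>     return fuse_main(argc, argv, &udf_oper, NULL);
> }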

>
>> I was hoping that bmap might be a way to achieve some support for the
>> unified buffer cache, at a per file level.  Currently the best that
>> can be hoped for is that the UBC will associate all of the files on
>> the volume with a single vnode (the device's vnode).
>>
>> Mac OS X (as of 10.4) uses the "blockmap" function to allow files to be
>> memory mapped, and to support cluster mapped I/O.
>>
>> Within the blockmap function, a file system describes a file's
>> extents in terms of blocks, allowing the operating system to bypass
>> the "read" and "write" codepaths within the file system, and perform
>> the I/O itself using the unified buffer cache.  Of course, this only
>> works if the file's contents are stored on a block device, and are
>> block aligned.
>>
>> At any rate, I don't think that bmap is a good fit, as it seems to
>> have been written to provide a mapping between a single block in the
>> file and a single block on disk, rather than an extent within a file,
>> and an extent on disk.
>
> Are you expecting that MacFUSE simply pushes the entire *vfs*
> interface to user space? If so, that's not how it works! MacFUSE
> itself uses cluster I/O, but it doesn't just forward every vnode/vfs/
> cluster I/O operation to user space. That's not its goal--its goal is
> to provide a vastly simpler file system programming interface (than
> the vfs, say). The data that you provide from your user-space program
> to MacFUSE /does/ go through the UBC. In your user-space file system,
> you'll simply get read/write calls for file data. What exactly are you
> looking to do?

Just so you know where I'm coming from, my starting point is a
platform-independent UDF file system engine (UDFAPI), which has
already been ported to several different operating systems.

I have this UDFAPI working (for the most part) within a kernel-only
file system module, so I'm trying to replicate (as closely as
possible) the functionality of that file system.  (I'm porting it to
MacFUSE because I need to add a file-based cache to its I/O, in
order to support devices which do not support random-access writes.)

The I/O model of UDFAPI is rather plain - to read from a file, you
provide a buffer pointer, a size, and an offset.  This model is not a
good match for the UBC, which wants to create a buffer for each
individual block.

The read function in the kernel-only FS looks like this - the
important part is the call to cluster_read.

> int udf_file_node::read ( struct vnop_read_args* ap )
> {
>   unix_error result("udf_file_node::read", SHOW_TESTED, NODE_PTR_AND_NAME(this));
>
>   uio* the_uio = ap->a_uio;
>
>   if (file_is_cluster_mappable_)
>   {
>     off_t the_file_size;
>
>     RETURN_IF_UNIX_ERROR(this->get_data_size(&the_file_size));
>
>     // Let the cluster layer perform the read directly through the UBC.
>     LOG_UNIX_ERROR(cluster_read(vnode_ptr_,
>                                 the_uio,
>                                 (off_t)the_file_size,
>                                 0));
>
>     return result;
>   }
>   else
>   {
>     // int  the_ioflag = ap->a_ioflag;
>     // ucred* the_cred = ap->a_cred;
>
>     // !!! Check credentials.
>
>     // !!! If the I/O is to kernel memory, and there is only one buffer
>     //     in the uio, we can optimize away the kernel buffer.
>
>     // Create a kernel buffer for the read.
>
>     void* the_buffer = LOG_KERNEL_MALLOC(uio_resid(the_uio));
>     malloc_janitor the_buffer_janitor(the_buffer);
>     if (the_buffer == NULL)
>       return LOG_UNIX_ERROR(ENOMEM);
>
>     // Call UDFAPI to read the file into kernel memory.
>
>     udfUINT64 the_offset;
>     the_offset.low  = (udfUINT32) uio_offset(the_uio);
>     the_offset.high = (udfUINT32) (uio_offset(the_uio) >> 32);
>
>     udfUINT32 the_act_bytes = 0;
>
>     WITH_UDFFILENODE (the_UDFFileNode, this);
>     RETURN_IF_UNIX_ERROR(the_UDFFileNode.get_error());
>
>     udfERR the_udf_error = udfapi__read_file(the_UDFFileNode.get_UDFFileNode(),
>                                              the_offset,
>                                              (udfBYTE*)the_buffer,
>                                              uio_resid(the_uio),
>                                              &the_act_bytes);
>
>     if ((the_udf_error != ERR_NO_ERROR) && (the_udf_error != ERR_EOF))
>       return LOG_UDF_ERROR(the_udf_error);
>
>     // Call uiomove to 'copy' the data to user memory.
>
>     LOG_UNIX_ERROR(uiomove((char*)the_buffer, the_act_bytes, the_uio));
>
>     SAI_LOG(SHOW_TESTED, "  uio_offset -> 0x%08lX %08lX\n",
>             (UInt32) (uio_offset(the_uio) >> 32),
>             (UInt32) uio_offset(the_uio));
>     SAI_LOG(SHOW_TESTED, "  uio_resid  -> %08lX", (UInt32)uio_resid(the_uio));
>
>     return result;
>   }
> }
>
>

cluster_read is a KPI function which performs the read directly,
providing efficient support for the UBC - in my case bypassing
udfapi__read_file and all of its associated cruft
(LOG_KERNEL_MALLOC, uiomove, etc.).

In order for cluster_read to work properly, however, a function is
needed to provide the mapping between extents within a file and
extents on disk.  This is provided by implementing the vnode's
blockmap function.

> int udf_file_node::blockmap ( struct vnop_blockmap_args* ap )
>
> /*
>  *  Parameters:
>  *    vnode_t        ap->a_vp       input  - Pointer to the file's vnode.
>  *    off_t          ap->a_foffset  input  - Contiguous run's byte offset within the file.
>  *    size_t         ap->a_size     input  - Maximum size of run that we're interested in.
>  *    daddr64_t*     ap->a_bpn      output - Block number where the extent begins.
>  *    size_t*        ap->a_run      output - Size (in bytes) of the extent.
>  *    void*          ap->a_poff     output - Offset into the physical block (set to 0).
>  *    int            ap->a_flags    input  - Flags.
>  *    vfs_context_t  ap->a_context  input  - Context.
>  */
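
For concreteness, the body boils down to something like this
(simplified - extent_t, find_extent() and block_size_ stand in for
UDFAPI's real extent bookkeeping):

> int udf_file_node::blockmap ( struct vnop_blockmap_args* ap )
> {
>   extent_t e;   // { off_t file_offset; daddr64_t start_block; off_t length; }
>
>   // Find the on-disk extent containing the requested file offset.
>   if (find_extent(ap->a_foffset, &e) != 0)
>     return ERANGE;
>
>   // Translate the file offset into a device block number...
>   *ap->a_bpn = e.start_block + (ap->a_foffset - e.file_offset) / block_size_;
>
>   // ...and report how much of the extent remains contiguous from here.
>   if (ap->a_run != NULL)
>   {
>     off_t remaining = e.file_offset + e.length - ap->a_foffset;
>     *ap->a_run = (size_t)((off_t)ap->a_size < remaining ? (off_t)ap->a_size
>                                                         : remaining);
>   }
>
>   if (ap->a_poff != NULL)
>     *(int*)ap->a_poff = 0;   // I/O is block aligned
>
>   return 0;
> }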

This also assumes that the kernel filesystem has a vnode through  
which it can directly access the disk, which might not be possible  
given that the disk must be opened for writing by the userland  
application.

It may be that none of this is actually necessary, or even desired,
with MacFUSE - given that you're already supporting the UBC for file I/O.

The main concern, then, would be avoiding actual memory-copy
operations when getting the data from the userland application to the
UBC - and given that uiomove is written to avoid copying memory
whenever possible (uiomove is some very cool code), that has probably
already been addressed as well.
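
If so, the whole kernel read path above should collapse into the
stock libfuse read callback - something like this sketch (lookup_node()
is a hypothetical path-to-node helper; everything else follows the
udfapi__read_file signature shown earlier):

> #include <fuse.h>
> #include <errno.h>
> /* ... plus the UDFAPI headers ... */
>
> static int udf_read(const char *path, char *buf, size_t size,
>                     off_t offset, struct fuse_file_info *fi)
> {
>     udfUINT64 the_offset;
>     the_offset.low  = (udfUINT32) offset;
>     the_offset.high = (udfUINT32) (offset >> 32);
>
>     udfUINT32 act_bytes = 0;
>
>     /* MacFUSE has already taken the request through the UBC in the
>        kernel; user space just fills `buf` starting at `offset`. */
>     udfERR err = udfapi__read_file(lookup_node(path),
>                                    the_offset,
>                                    (udfBYTE*)buf,
>                                    (udfUINT32)size,
>                                    &act_bytes);
>
>     if ((err != ERR_NO_ERROR) && (err != ERR_EOF))
>         return -EIO;
>
>     return (int)act_bytes;
> }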


>> As I recall, there is some connection between resource forks and
>> extended attributes under 10.5?  Does that imply that the size
>> parameter in setxattr can be very large?
>
> Attributes such as "com.apple.ResourceFork" and "com.apple.FinderInfo"
> are mapped to resource forks and Finder info. There are others (for
> ACLs and such). Since xattrs can't be seeked into (like files), the
> size can't be arbitrarily large. I'm still toying with the size and
> when I release the next major version of MacFUSE (very soon), I might
> go with something like 4MB for the size limit. That is, you'd be
> limited to 4MB per xattr. This is much larger than the limit HFS+ has
> (~ 4KB), but HFS+ resource forks don't go through this path, so they
> can be arbitrarily large. If you want larger than 4MB resource forks
> in MacFUSE, you'll have to resort to using AppleDouble files for them.
>

It sounds like I will need to implement resource forks using virtual
AppleDouble files then - at least at some point.  UDF supports the
concept of "named forks", one of which is intended to be used for
HFS-style resource forks.  I'd guess my best bet will be to emulate
AppleDouble files using named forks.
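
Synthesizing a virtual ._foo file would then mostly be a matter of
prepending the fixed AppleDouble header to the named fork's data -
per RFC 1740, roughly:

> #include <stdint.h>
> #include <string.h>
> #include <arpa/inet.h>   /* htonl/htons - AppleDouble fields are big-endian */
>
> #define AD_HEADER_SIZE 38   /* 26-byte fixed header + one 12-byte entry */
>
> /* Build an AppleDouble header with a single entry (ID 2 = resource
>    fork) whose payload immediately follows the header. */
> static size_t make_appledouble_header(uint8_t *buf, uint32_t rsrc_len)
> {
>     uint32_t magic   = htonl(0x00051607);    /* AppleDouble magic */
>     uint32_t version = htonl(0x00020000);    /* version 2 */
>     uint16_t nent    = htons(1);
>     uint32_t id      = htonl(2);             /* entry ID 2: resource fork */
>     uint32_t off     = htonl(AD_HEADER_SIZE);
>     uint32_t len     = htonl(rsrc_len);
>
>     memcpy(buf,      &magic,   4);
>     memcpy(buf + 4,  &version, 4);
>     memset(buf + 8,  0,       16);           /* filler */
>     memcpy(buf + 24, &nent,    2);
>     memcpy(buf + 26, &id,      4);
>     memcpy(buf + 30, &off,     4);
>     memcpy(buf + 34, &len,     4);
>     return AD_HEADER_SIZE;
> }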

Assuming that I will be implementing xattrs as well, should I
generate an error when com.apple.ResourceFork is encountered?  Is
there a flag somewhere that indicates that I prefer the AppleDouble
interface?  Or does the presence of a ._* file in the directory
preclude use of the com.apple.ResourceFork xattr?
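
Put differently: is declining the attribute enough to make the kernel
fall back to the AppleDouble file?  Something like this sketch against
the stock libfuse prototype (whether ENOATTR is the right errno here
is exactly what I'm asking):

> #include <string.h>
> #include <errno.h>
> #include <sys/xattr.h>   /* for ENOATTR on Mac OS X, if I recall correctly */
>
> static int udf_getxattr(const char *path, const char *name,
>                         char *value, size_t size)
> {
>     if (strcmp(name, "com.apple.ResourceFork") == 0)
>         return -ENOATTR;   /* decline; serve the fork via ._* instead */
>
>     /* ... handle other attributes normally ... */
>     return -ENOATTR;
> }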

Thanks again for your time,

Ron Aldrich
Software Architects, Inc.

