Re: libcephfs create file with layout and replication

Josh Durgin Sat, 17 Nov 2012 13:35:20 -0800

On 11/17/2012 12:13 PM, Noah Watkins wrote:

The Hadoop VFS layer assumes that block size and replication can be
set on a per-file basis, which is important to users for file
layout/workload optimizations.


The libcephfs interface doesn't make this entirely easy. Here is one
approach, but it isn't thread safe as the default values are global
variables in the client.

   orig_obj_size = ceph_get_default_object_size() //save
   set_default_object_size(new size)
   open(path, O_CREAT)
   set_default_object_size(new size) //reset

Something more convenient might be:

   ceph_open_layout(path, flags, mode, layout, replication)


I think this makes the most sense, since changing the layout of a
file after it's been created can't happen, and this interface
makes that the most clear. It also avoids maintaining extra state
in libcephfs between calls.

Since replication count is a per-pool setting, I think the hadoop
bindings would have to translate from a vfs request to a pool
with the requested replication level. So something like this,
where layout is a struct containing stripe unit, stripe count,
and object size (the subset of struct ceph_file_layout related to
objects that's useful currently):

    ceph_open_layout(path, flags, mode, layout, pool_name)

BTW, for anyone interested, there's a nice description of
the layout parameters here:

http://ceph.com/docs/master/dev/file-striping/

where layout and replication are used with O_CREAT | O_EXCL, or and
interface for setting these values explicitly on newly created files:

   ceph_open(path, O_CREAT|O_EXCL)
   ceph_set_layout(path, layout, replication)

where ceph_set_layout would succeed ostensibly on zero-length files.

Any thoughts on how to handle this?

Thanks,
Noah


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: libcephfs create file with layout and replication

Reply via email to