Hi Jiangang,

On Wed, 11 Dec 2013, Duan, Jiangang wrote:
> Sage,
> 
> I have some questions regarding the key/value backend work.
> 
> What is the motivation to work on this? (or what is the problem we want to 
> solve?)
> 1) to use the new interface so that we can bypass the OS layers and get 
> lower latency?

That is one part.  The current strategy of layering on top of a file 
system and using a write-ahead journal makes sense given the existing 
linux fs building blocks, but is far from an optimal solution for many 
workloads.  A k/v interface based on something like leveldb probably performs 
much better for many small-object use-cases.  Also, a k/v backend can take 
advantage of emerging non-block storage interfaces like NVMKV, Kinetic, 
new libraries like rocksdb, etc.

> 2) or to leverage some new primitive, e.g. atomic write, to simplify 
> the code?

That too.  Basically, we are currently doing a lot of work to get what we 
need out of posix, and are paying the price.

> There are several different possibilities to use future NVM technology - 
> NVM.FILE, NVM.BLOCK, PM.XXX 
> http://snia.org/sites/default/files/NVMProgrammingModel_v1r10DRAFT.pdf 
> Even for openNVM, there are usage models other than k/v.
> 
> Do you have any typical usage model for this? 

I wasn't familiar with these; thanks for the reference!  Of these, 
NVM.FILE seems the most interesting (it maps most closely to an object).  
I am predisposed to skepticism when it comes to these sorts of 
standards/API docs that precede an actual implementation, but it is 
encouraging to see some effort here towards a common interface.

In the end, we want to support generic Ceph workloads.  These range from 
rbd block and file type workloads (objects are stripes of files, with 
random bytes rewritten) to omap type workloads (like rgw bucket indices 
that are purely key/value).
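
To make the two ends of that range concrete, here is a minimal sketch of how 
both could sit on one key/value store.  It is an illustration only, not the 
blueprint's design or any existing Ceph code: the class name, the key layout, 
and the tiny stripe size are all assumptions, and a plain dict stands in for 
leveldb or a similar store.  Object data is split into fixed-size stripes so 
that a random rewrite only touches the stripes it overlaps; omap entries are 
stored directly as keys.

```python
STRIPE_SIZE = 4  # hypothetical value, kept tiny so the striping is visible


class KVObjectStore:
    """Toy object store on a dict; the key layout is an assumption."""

    def __init__(self, stripe_size=STRIPE_SIZE):
        self.kv = {}  # stand-in for a real k/v store such as leveldb
        self.stripe_size = stripe_size

    def _data_key(self, oid, index):
        # One key per fixed-size stripe of the object's byte data.
        return ("data", oid, index)

    def write(self, oid, offset, data):
        """Write bytes at offset, rewriting only the stripes touched."""
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.stripe_size)
            n = min(self.stripe_size - within, len(data) - pos)
            key = self._data_key(oid, index)
            old = self.kv.get(key, b"")
            # Zero-pad a short stripe so the new bytes land at 'within'.
            buf = bytearray(old.ljust(within + n, b"\0"))
            buf[within:within + n] = data[pos:pos + n]
            self.kv[key] = bytes(buf)
            pos += n

    def read(self, oid, offset, length):
        """Read length bytes; holes and unwritten ranges read as zeros."""
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.stripe_size)
            n = min(self.stripe_size - within, length - pos)
            stripe = self.kv.get(self._data_key(oid, index), b"")
            out += stripe[within:within + n].ljust(n, b"\0")
            pos += n
        return bytes(out)

    def omap_set(self, oid, key, value):
        # omap-style data is already key/value, so it maps directly.
        self.kv[("omap", oid, key)] = value

    def omap_get(self, oid, key):
        return self.kv.get(("omap", oid, key))
```

A real backend would also need object sizes, truncation, and atomic 
multi-key updates, none of which this sketch attempts.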

I think the first wins would be:

1- a backend that more efficiently handles rgw bucket index workloads
2- a backend that is more efficient for rgw in general (i.e., immutable 
objects)
3- a backend that can handle more general purpose workloads (like rbd and 
cephfs)

and separately,

4- a backend that lets you plug in a next-gen backend beneath it, like 
NVMKV and speedy flash.

sage



> 
> -jiangang
> 
> ===================
> 
> From: Sage Weil <sage <at> inktank.com>
> Subject: new ceph-osd key/value backend
> Newsgroups: gmane.comp.file-systems.ceph.devel
> Date: 2013-11-09 10:09:52 GMT
> I've written up a blueprint with a rough sketch of how to take advantage 
> of alternative storage interfaces.  I am very happy to see that several of 
> them have emerged over the past year or two:
> 
>  - fusionio's NVMKV is a key/value interface for their flash products
>  - seagate's kinetic is a key/value interface for their new ethernet-based 
> drive
> 
> Also, leveldb is pretty great for many workloads when run on a 
> traditional disk/fs.
> 
> The good news is a lot of the existing work that went into supporting omap 
> looks to be reusable here.  Some new functionality and refactoring is 
> needed, though, particularly when it comes to storing object data (the 
> file-like bag of bytes portion) as key/value pairs.
> 
> The blueprint is here:
> 
>   
> http://wiki.ceph.com/01Planning/02Blueprints/Firefly/osd%3A_new_key%2F%2Fvalue_backend
> 