Re: [ceph-users] Note about rbd_aio_write usage

Piotr Dałek Thu, 06 Jul 2017 08:50:53 -0700

On 17-07-06 04:40 PM, Jason Dillaman wrote:

> On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek <piotr.da...@corp.ovh.com> wrote:
>
>> So I really see two problems here: lack of API docs and
>> backwards-incompatible change in API behavior.


> Docs are always in need of update, so any pull requests would be
> greatly appreciated.
>
> However, I disagree that the behavior has substantively changed -- it
> was always possible for pre-Luminous to (sometimes) copy the buffer
> before the "rbd_aio_write" method completed.

But that copy was buried somewhere deep in the librbd internals and,
looking at the Jewel version, most would assume that the buffer is not
really copied and that the user is responsible for keeping it intact
until the write is complete. An API user doesn't really care about
what's going on internally; that is beyond their control.

> With Luminous, this
> behavior is more consistent -- but in a future release memory may be
> zero-copied. If your application can properly conform to the
> (unwritten) contract that the buffers should remain unchanged, there
> would be no need for the application to pre-copy the buffers.

So far I am forced to do a copy anyway (see below). The question is
whether it's me doing it, or librbd. It doesn't make sense to have both
do the same -- especially if it's going to handle tens of terabytes of
data. For 10TB of data that could mean at least 83 886 080 memory
allocations, releases and copies plus 2 684 354 560 page faults
(assuming 4KB pages) -- and these are best-case numbers assuming a
128KB I/O size. What I understand you expect from me is to at least
double the number of memory copies and push not "just" 20TB over the
memory bus (reading 10TB from one buffer and writing these 10TB to
another), but 40TB.

In other words, if I wrote my code based on how Jewel librbd works,
there would be no real issue, apart from the fact that my program would
suddenly consume more memory and burn more CPU cycles once librbd is
upgraded to Luminous which, considering the amount of data, would be a
noticeable change.
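The figures quoted above can be reproduced with a short sketch. It
assumes binary units (10TB = 10 * 2^40 bytes, 128KB = 128 * 1024 bytes,
4KB = 4096 bytes) and treats every touched page as one page fault; the
helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Binary units, as assumed in the lead-in.
constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t TiB = KiB * KiB * KiB * KiB;

// Number of I/Os, and therefore of allocate/copy/release cycles,
// needed to move `total` bytes in chunks of `io_size` bytes.
constexpr std::uint64_t io_count(std::uint64_t total, std::uint64_t io_size) {
    return total / io_size;
}

// Number of pages touched when `total` bytes are written into freshly
// allocated memory (assumption: one page fault per touched page).
constexpr std::uint64_t page_count(std::uint64_t total, std::uint64_t page_size) {
    return total / page_size;
}

// Bytes crossing the memory bus: each copy pass reads `total` bytes
// from one buffer and writes `total` bytes to another.
constexpr std::uint64_t bus_bytes(std::uint64_t total, std::uint64_t passes) {
    return 2 * total * passes;
}
```

With these inputs, one copy pass corresponds to the 20TB figure and two
passes (one in the application, one in librbd) to the 40TB figure.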

> If the libfuse implementation requires that the memory is not-in-use
> by the time you return control to it (i.e. it's a synchronous API and
> you are using async methods), you will always need to copy it.

Yes, libfuse expects that once I leave the entry point, it is free to do
anything it wishes with the previously provided buffers -- and that's
what it actually does.
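The hazard can be illustrated with a toy model. This is only a sketch:
ToyAsyncWriter and write_then_reuse are hypothetical names, the class is
a stand-in for "some asynchronous write queue" and is neither librbd nor
libfuse, and the overwrite with 'B' stands in for the caller reusing its
buffer after the entry point returns.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an asynchronous write queue; NOT librbd.
class ToyAsyncWriter {
public:
    // Zero-copy submit: only the pointer is kept, so the caller must
    // leave the buffer untouched until flush() has run.
    void submit_borrowed(const char* buf, std::size_t len) {
        borrowed_.emplace_back(buf, len);
    }

    // Copying submit: the data is duplicated, so the caller may reuse
    // the buffer as soon as this call returns.
    void submit_copied(const char* buf, std::size_t len) {
        copied_.emplace_back(buf, len);
    }

    // Simulates the I/O completing later; returns what was "written".
    std::string flush() {
        std::string out;
        for (const auto& b : borrowed_) out.append(b.first, b.second);
        for (const auto& c : copied_) out.append(c);
        borrowed_.clear();
        copied_.clear();
        return out;
    }

private:
    std::vector<std::pair<const char*, std::size_t>> borrowed_;
    std::vector<std::string> copied_;
};

// Hypothetical scenario: the buffer is reused before the write completes.
std::string write_then_reuse(bool copy) {
    char buf[4] = {'A', 'A', 'A', 'A'};
    ToyAsyncWriter writer;
    if (copy) {
        writer.submit_copied(buf, sizeof buf);
    } else {
        writer.submit_borrowed(buf, sizeof buf);
    }
    std::memset(buf, 'B', sizeof buf);  // caller reuses its buffer
    return writer.flush();              // the write "completes" only now
}
```

In this model the copied submit still writes the original bytes, while
the borrowed submit writes whatever the buffer holds at completion time.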


> The C++
> API allows you to control the copying since you need to pass
> "bufferlist"s to the API methods and since they utilize a reference
> counter, there is no internal copying within librbd / librados.

How about a hybrid solution? Keep the old rbd_aio_write contract (don't
copy the buffer, on the assumption that it won't change) and, instead of
constructing a bufferlist containing a bufferptr to copied data,
construct a bufferlist containing a bufferptr made with
create_static(user_buffer)?
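The difference between the two approaches can be sketched with a toy
reference-counted buffer. ToyBufferPtr, copy_from and wrap_static are
hypothetical names for this illustration only; they are not Ceph's
bufferptr or create_static and make no claim about how those are
implemented.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Toy analog of a reference-counted buffer handle; NOT Ceph's bufferptr.
class ToyBufferPtr {
public:
    // Copying path: allocate private storage and duplicate the data.
    static ToyBufferPtr copy_from(const char* src, std::size_t len) {
        std::shared_ptr<char> owned(new char[len],
                                    std::default_delete<char[]>());
        std::memcpy(owned.get(), src, len);
        return ToyBufferPtr(std::move(owned), len);
    }

    // Static path: wrap the caller's memory without allocating or
    // copying. The no-op deleter means the caller keeps ownership and
    // must keep the memory valid and unchanged while it is in use.
    static ToyBufferPtr wrap_static(char* src, std::size_t len) {
        std::shared_ptr<char> borrowed(src, [](char*) {});
        return ToyBufferPtr(std::move(borrowed), len);
    }

    const char* data() const { return data_.get(); }
    std::size_t length() const { return len_; }

private:
    ToyBufferPtr(std::shared_ptr<char> data, std::size_t len)
        : data_(std::move(data)), len_(len) {}

    std::shared_ptr<char> data_;
    std::size_t len_;
};
```

The static variant avoids the allocation and the memcpy, at the price of
the contract discussed above: the wrapped memory must not change until
the last reference to it is dropped.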



--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
