On Mon, Oct 17, 2016 at 3:34 AM, James Norman <ja...@storagemadeeasy.com> wrote:
> Hi Gregory,
> Many thanks for your reply. I couldn't spot any resources that describe/show
> how you can successfully write / append to an EC pool with the librados API
> on those links. Do you know of any such examples or resources? Or is it just
> simply not possible?
If it's not in there I guess it's all "spoken" knowledge and you'll
have to dig through ceph-devel archives (probably for emails from
Sam). I'm not on the RADOS team, but the concept you need:
*) objects in EC pools can only be appended or truncated+recreated
*) because otherwise you'd need round-trip read-modify-write operations
*) so all operations must be in the block size you specify (or maybe
it's implicit based on stripe size and EC n count?) at pool create
*) including the appends.
I'm afraid that's about all the info I've got on it though.
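For what it's worth, here is a minimal sketch of the buffering idea in the bullets above: stage incoming bytes and only issue appends whose length is a multiple of the pool's required alignment. Everything named here (aligned_appender, append_fn, record_append) is hypothetical and not part of librados. The callback stands in for a wrapper around rados_append(). I believe librados.h offers rados_ioctx_pool_requires_alignment2() and rados_ioctx_pool_required_alignment2() for querying the alignment, but check the header of your version. The sketch also assumes the OSD only requires the append *offset* to be aligned, so a short final tail is accepted but nothing can be appended to the object after it; I have not verified that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: a real client would wrap rados_append() here.
 * Returns 0 on success or a negative errno value on failure. */
typedef int (*append_fn)(void *ctx, const char *buf, size_t len);

/* Hypothetical helper that turns arbitrary-sized writes into aligned appends. */
typedef struct {
    char     *buf;        /* staging buffer of `alignment` bytes */
    size_t    alignment;  /* required append alignment reported by the pool */
    size_t    used;       /* bytes currently staged */
    append_fn append;
    void     *ctx;
} aligned_appender;

int appender_init(aligned_appender *a, size_t alignment, append_fn fn, void *ctx)
{
    assert(alignment > 0);
    a->buf = malloc(alignment);
    if (a->buf == NULL)
        return -ENOMEM;
    a->alignment = alignment;
    a->used = 0;
    a->append = fn;
    a->ctx = ctx;
    return 0;
}

int appender_write(aligned_appender *a, const char *data, size_t len)
{
    while (len > 0) {
        if (a->used == 0 && len >= a->alignment) {
            /* Nothing staged: pass the largest aligned prefix straight through. */
            size_t n = len - (len % a->alignment);
            int r = a->append(a->ctx, data, n);
            if (r < 0)
                return r;
            data += n;
            len -= n;
            continue;
        }
        /* Otherwise top up the staging buffer and flush it once it is full. */
        size_t room = a->alignment - a->used;
        size_t n = len < room ? len : room;
        memcpy(a->buf + a->used, data, n);
        a->used += n;
        data += n;
        len -= n;
        if (a->used == a->alignment) {
            int r = a->append(a->ctx, a->buf, a->used);
            if (r < 0)
                return r;
            a->used = 0;
        }
    }
    return 0;
}

/* Flush the (possibly short) tail and release the buffer.
 * ASSUMPTION: a short final append is allowed, but no append may follow it. */
int appender_finish(aligned_appender *a)
{
    int r = 0;
    if (a->used > 0)
        r = a->append(a->ctx, a->buf, a->used);
    free(a->buf);
    a->buf = NULL;
    a->used = 0;
    return r;
}

/* Illustration-only stand-in for rados_append: records each append length. */
typedef struct {
    size_t lens[16];
    size_t calls;
    size_t total;
} append_record;

int record_append(void *ctx, const char *buf, size_t len)
{
    append_record *rec = ctx;
    (void)buf;
    if (rec->calls < 16)
        rec->lens[rec->calls] = len;
    rec->calls++;
    rec->total += len;
    return 0;
}
```

With an alignment of 4096, feeding this 1000 bytes and then 10000 bytes should produce two 4096-byte appends during the writes and one 2808-byte append from appender_finish(). Memory use stays bounded by the alignment rather than by the file size, which is the point if you can't hold the whole file for a rados_write_full.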
> Best regards,
> James Norman
> On 6 Oct 2016, at 19:17, Gregory Farnum <gfar...@redhat.com> wrote:
> > On Thu, Oct 6, 2016 at 4:08 AM, James Norman <ja...@storagemadeeasy.com> wrote:
> > Hi there,
> > I am developing a web application that supports browsing, uploading,
> > downloading, and moving files in a Ceph RADOS pool. Internally to write objects we
> > use rados_append, as it's often too memory intensive for us to have the full
> > file in memory to do a rados_write_full.
> > We do not control our customers' Ceph installations, such as whether they
> > use replicated pools, EC pools etc. We've found that when dealing with an EC
> > pool, our rados_append calls return error code 95 and message "Operation not
> > I've had several discussions with members in the IRC chatroom regarding
> > this, and the general consensus I've got is:
> > 1) Use write alignment.
> > 2) Put a replicated pool in front of the EC pool
> > 3) EC pools have a limited feature set
> > Regarding point 1), are there any actual code examples for how you would
> > handle this in the context of rados_append? I have struggled to find even
> > one. This seems to me something that should be handled by either the API
> > libraries, or Ceph itself, not the client trying to write some data.
> librados requires a fair bit of knowledge from the user applications,
> yes. One thing you mention that sounds concerning is that you can't
> hold the objects in-memory. RADOS is not comfortable with very large
> objects and you'll find that things like backfill might not perform as
> you expect. (At this point everything will *probably* function, but it
> may be so slow as to make no difference to you when it hits that
> situation.) Certainly if your objects do not all fit neatly into
> buckets of a particular size and you have some that are very large,
> you will have a very non-uniform balance.
> But, if you want to learn about EC pools there is some documentation
> at http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/
> (or in ceph.git/doc/dev/osd_internals/erasure_coding) from when they
> were being created.
> > Regarding point 2) This seems to be a workaround, and generally not
> > something we want to recommend to our customers. Is it detrimental to use an
> > EC pool without a replicated pool? What are the performance costs of doing
> Yeah, don't do that. Cache pools are really tricky to use properly and
> turned out not to perform very well.
> > Regarding point 3) Can you point me towards resources that describe what
> > features / abilities you lose by adopting an EC pool?
> Same as above links, apparently. But really, you can read from and
> append to them. There are no object classes, no arbitrary overwrites,
> no omaps.
ceph-users mailing list