George,

I don't think you needed to change your default CRUSH rule at all, since
all the OSDs are on the same machine.

It sounds to me like you are conflating replication with striping. Ceph
clients write an object to a pool, the pool maps the object to a placement
group, and CRUSH maps the placement group to the OSDs where the object
gets stored. A placement group ID is a combination of the pool number
(not its name) and a hash of the object name. See
http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/#monitoring-placement-group-states
for a brief description of placement group IDs about halfway through that
section.
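
If you want to see this mapping for a concrete object, the ceph CLI can
report the placement group and OSDs for a given pool and object name. For
example (the pool and object names here are just illustrative, and the
exact output format may vary by version):

    # Show which placement group and OSDs the object 'myobject' in the
    # 'data' pool maps to. The pgid in the output is pool number + hash.
    ceph osd map data myobject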

When you create a pool, you can set the number of replicas with the "size"
setting. The documentation suggests modifying the default number of
placement groups; the replica count ("size") already defaults to 2.
http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
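
For example, to create a pool and set its replica count explicitly (the
pool name and placement group count below are just illustrative):

    # Create a pool named 'mypool' with 128 placement groups.
    ceph osd pool create mypool 128
    # Set the number of replicas, then read the setting back.
    ceph osd pool set mypool size 2
    ceph osd pool get mypool size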

Ceph clients write an object to the primary OSD, and that OSD creates the
replicas, computing where each replica should be stored.
http://ceph.com/docs/master/architecture/#how-ceph-scales

However, the object store DOES NOT break up the object and store the
pieces across all OSDs. Striping happens on the client side: Ceph clients
have the ability to stripe data across multiple objects.
http://ceph.com/docs/master/architecture/#how-ceph-clients-stripe-data
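
For CephFS, striping parameters can be set per file (before any data is
written to it) with the cephfs tool. A rough sketch, with illustrative
paths and values (note the object size must be a multiple of the stripe
unit):

    # Stripe the (still empty) file across 8 objects at a time with a
    # 1 MB stripe unit and 4 MB objects.
    # -u = stripe unit (bytes), -c = stripe count, -s = object size (bytes).
    cephfs /mnt/ceph/myfile set_layout -u 1048576 -c 8 -s 4194304
    # Read the layout back.
    cephfs /mnt/ceph/myfile show_layout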

On Mon, Apr 22, 2013 at 3:43 AM, George Shuklin <[email protected]> wrote:

> I'm still lost in the documentation.
>
> Let's assume I have 8 OSDs on a single server (osd.[0-7]). I use CephFS
> and want redundancy 2 (meaning each piece of data is on two OSDs) and
> striping of the file across all OSDs (to get some performance on writes).
>
> My expectation: 8x speed on reading, 4x speed on writing (compared to a
> single drive). [I put aside some overhead]
>
> I'm checking the performance of random writes and reads to a single file
> on a mounted CephFS (fio, iodepth=32, blocksize=4k) and I'm getting nice
> read performance (1000 IOPS = 125x8, as expected) but just 30 IOPS on
> writes, less than half of a single drive's performance.
>
> I want to understand what I'm doing wrong.
>
> My settings (the same for all OSDs, but with a different disk name for
> each):
>
> [osd.1]
>         host = testserver
>         devs = /dev/sdb
>         osd mkfs type = xfs
>
> I tried to change the CRUSH map: "step choose firstn 2 type osd" (for the
> 'data' rule, compared to the default), but it had no effect.
>
> I think I'm making some huge mistake here... I need to say 'no more than
> two copies of data' and 'block size = 4k when striping'.
>
> Please help.
>
> Thanks.
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
[email protected]
(415) 425-9599
http://inktank.com