On Wed, 4 May 2011, Zenon Panoussis wrote:
> On 05/04/2011 08:21 PM, Sage Weil wrote:
>
> >> does "min_size 2, max_size 2" mean that I want "2 copies of the data on
> >> each
> >> host" or "2 copies of the data in total in the entire cluster"?
>
> > Neither, actually. It means that this rule will be used when we ask crush
> > for ruleset 0 and 2 replicas. If you change a pg to have 3x replication,
> > ceph will ask for ruleset 0 and 3 replicas, and this rule won't be used.
>
> In other words, the total number of replicas in the cluster is determined on
> the PG level? But then how do I control which PGs are physically stored where?
>
> > You probably want min_size 1 and max_size 10.
>
> Taking what you just wrote together with a re-reading of the wiki, I must
> admit that I still don't quite grasp it. The wiki says
>
> That is, when placing object replicas, we start at the root hierarchy, and
> choose N items of type 'device'. ('0' means to grab however many replicas.
> The rules are written to be general for some range of N, 1-10 in this case.)
>
> What I make out of all this is that
>
> rule data {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take root
>     step choose firstn 0 type device
>     step emit
> }
>
> means that IF the PGs are set to create anything between 1 and 10 replicas,
> then the replicas should be placed on devices, using an unlimited number of
> devices.
>
> Is that correct?
>
> My problem really is how to configure ceph to put exactly 1 replica of the
> data (and metadata) on each and every one of some kind of target. For
> example, if I have 10 racks, I want exactly 1 copy of the data in each rack,
> no more, no less (and I don't care which host in that rack the data lands
> on). If I have 10 hosts, I want exactly 1 copy of the data on each host (and
> I don't care which OSD on that host the data lands on). If I only have 10
> OSDs, I want exactly 1 copy of the data on each and every OSD.
>
> Assuming that the number of targets is fixed and known, what is the way to
> do this?
Yes. So the rule you have is right (at least up to 10 nodes). Then you
need to set the pg_size (aka replication level) for the pools you care
about. For 4x, that's

    ceph osd pool set data size 4

You can see the current sizes with

    ceph osd dump -o - | grep pool

and look at the pg_size attribute.
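
As for getting exactly one copy per rack (or per host): that is controlled by
the choose steps in the rule, not by the pool size alone. Here is a sketch,
assuming your CRUSH map declares buckets of type 'rack' under 'root' (the
rule name, ruleset number, and bucket names are illustrative, adjust them to
your map):

    rule one_per_rack {
        ruleset 3            # illustrative ruleset id
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type rack
        step choose firstn 1 type device
        step emit
    }

The first choose step picks as many distinct racks as there are replicas, and
the second picks one device inside each of those racks, so no rack ends up
with more than one copy. Set the pool's size to the number of racks and point
the pool's crush_ruleset at this rule. For one copy per host, substitute
'host' for 'rack'.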
> And going back to PGs, if "ceph osd dump -o -|grep pg_size" says
>
> pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash rjenkins
> pg_num 128 pgp_num 128 lpg_num 2 lpgp_num 2 last_change 66 owner 0)
>
> and "ceph -w" says
>
> pg v319405: 528 pgs: 528 active+clean; 22702 MB data, 77093 MB used, 346 GB / 446 GB avail
>
> how do the 128 PGs of "ceph osd dump" relate to the 528 PGs of "ceph -w"?
There are several different pools, each sliced into many pgs.
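You can list them all with

    ceph osd dump -o - | grep pg_pool

Each pg_pool line carries its own pg_num (plus a handful of localized pgs per
OSD, the lpg_num), and the sum across all pools is the total that ceph -w
reports. As a rough illustration: if there are four pools at 128 pgs each,
that accounts for 512, with the localized pgs making up the remaining 16.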
> As an aside, I think that, to a certain extent, improving the
> documentation could contribute more to the code base than improving the
> actual code. You guys spend a lot of time answering the kind of
> questions that I've been posing (and thank you for doing so), while at
> the same time missing out on the debugging help you could be getting
> instead if your user base could move past its trivial problems. If I
> were your scrum master, I'd dedicate an entire sprint to the wiki alone.
The replication is covered by
http://ceph.newdream.net/wiki/Adjusting_replication_level
Any specific suggestions on how that should be improved?
sage