Vadim,

I think the issue is probably this: we've made many of the defaults more
realistic for what would actually get put into production. You are working
with a 1-node cluster, whereas our quick start guides now reflect a 3-node
cluster. In your case, osd crush chooseleaf type is set to 1 by default,
which means a PG mapped to one OSD looks to place its replica on an OSD on
a different host. That's why you see "active," but you don't see "clean."

Take a look at the quick start from v0.69:
http://ceph.com/docs/v0.69/start/quick-ceph-deploy/  It's out of date now,
but it gives you a specific instruction to add osd crush chooseleaf type =
0 to your Ceph configuration file. I recall having to adjust the CRUSH map
after this change, but it's been a while since I've done that. Anyway, the
reason your PGs aren't peering is very likely that CRUSH is looking for
OSDs on other hosts, not on the same host.
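
For reference, here's a rough sketch of both approaches (the file names are
just placeholders). In ceph.conf, before the OSDs are created:

    [global]
    osd crush chooseleaf type = 0

On a cluster that is already running, you can edit the rule in the CRUSH
map directly instead:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit the rule so it reads: step chooseleaf firstn 0 type osd
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new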

John




On Fri, Jun 6, 2014 at 3:49 AM, Vadim Kimlaychuk <vadim.kimlayc...@elion.ee>
wrote:

>  I have only one ruleset, number 0, and all pools use it. My crushmap is
> very simple:
>
> --------------------------------------------------------------------------
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host storage {
>         id -2           # do not change unnecessarily
>         # weight 1.580
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 1.000
>         item osd.1 weight 0.750
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 1.580
>         alg straw
>         hash 0  # rjenkins1
>         item storage weight 1.750
> }
>
> # rules
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type osd
>         step emit
> }
>
> # end crush map
>
> --------------------------------------------------------------------------
>
> I expect that Ceph will create replicas between the 2 OSDs until the second
> OSD becomes full. After that, all new objects added to the storage should be
> „degraded“ because there is no space to create a second replica.
>
> But I get degraded objects immediately. This is weird.
>
>
>
> Vadim.
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Vincenzo Pii
> Sent: Friday, June 06, 2014 12:34 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Hard drives of different sizes.
>
>
>
> Hi Vadim,
>
>
>
> Is every pool also using your custom crush_ruleset ("step chooseleaf
> firstn 0 type osd")?
>
> Otherwise Ceph will use the default rule to replicate data on separate
> hosts, which, in your case of a single host, cannot work.
>
>
>
> You can check it with
>
>     ceph osd dump --format=json-pretty
>
> and, if needed, apply the rule with
>
>     ceph osd pool set <pool_name> crush_ruleset <rulesetId>
>
> You can look up your custom ruleset id with
>
>     ceph osd crush dump
>
>
>
> Hope this helps!
>
>
>
> Regards,
>
> Vincenzo.
>
>
>
> 2014-06-06 8:24 GMT+02:00 Vadim Kimlaychuk <vadim.kimlayc...@elion.ee>:
>
> Michael, indeed I had pool size = 3; I changed it to 2. After that I
> recompiled the CRUSH map to reflect the different sizes of the hard drives
> and set the weights to 1.0 for the 1 TB drive and 0.75 for the 750 GB one.
>
> Now all my PGs are at status "active". It should be „active+clean“,
> shouldn't it?
>
> I put an object into the cluster and now have:
>
>
>
>      health HEALTH_WARN 192 pgs stuck unclean; recovery 1/2 objects degraded (50.000%)
>      monmap e1: 1 mons at {storage=172.16.3.2:6789/0}, election epoch 2, quorum 0 storage
>      osdmap e19: 2 osds: 2 up, 2 in
>       pgmap v42: 192 pgs, 3 pools, 414 bytes data, 1 objects
>             75584 kB used, 1619 GB / 1619 GB avail
>             1/2 objects degraded (50.000%)
>                  192 active
>
>
>
> Does that mean the object is stored in the wrong place? Or is the setup
> still incomplete?
>
>
>
> Thanks.
>
>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jeremy Hanmer
> Sent: Thursday, June 05, 2014 9:41 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Hard drives of different sizes.
>
>
>
> You'll also want to change the crush weights of your OSDs to reflect the
> different sizes so that the smaller disks don't get filled up prematurely.
> See "weighting bucket items" here:
>
> http://ceph.com/docs/master/rados/operations/crush-map/
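>
> As a rough sketch (assuming osd.0 is the 1 TB drive and osd.1 the 750 GB
> one, with 1 TB taken as weight 1.0), the weights can also be set from the
> CLI without recompiling the map:
>
>     ceph osd crush reweight osd.0 1.0
>     ceph osd crush reweight osd.1 0.75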
>
>
>
> On Thu, Jun 5, 2014 at 10:14 AM, Michael <mich...@onlinefusion.co.uk>
> wrote:
>
> > ceph osd dump | grep size
>
> >
>
> > Check that all pools are size 2, min size 2 or 1.
>
> >
>
> > If not, you can change them on the fly with:
>
> > ceph osd pool set <poolname> size|min_size <value>
>
> >
>
> > See docs http://ceph.com/docs/master/rados/operations/pools/ for
>
> > alterations to pool attributes.
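>
> > For instance (a sketch; pool names on your cluster may differ, and rbd
> > is just one of the default pools):
> >
> >     ceph osd pool set rbd size 2
> >     ceph osd pool set rbd min_size 1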
>
> >
>
> > -Michael
>
> >
>
> >
>
> > On 05/06/2014 17:29, Vadim Kimlaychuk wrote:
>
> >>
>
> >> ____________________________
>
> >>
>
> >> I have
>
> >>   osd pool default size = 2
>
> >> at my ceph.conf. Shouldn't it tell Ceph to use 2 OSDs? Or is it set
>
> >> somewhere in the CRUSH map?
>
> >>
>
> >> Vadim
>
> >> ____________
>
> >> From: Christian Balzer [ch...@gol.com]
>
> >> Sent: Thursday, June 05, 2014 18:26
>
> >> To: Vadim Kimlaychuk
>
> >> Cc: ceph-users@lists.ceph.com
>
> >> Subject: Re: [ceph-users] Hard drives of different sizes.
>
> >>
>
> >> Hello,
>
> >>
>
> >> On Thu, 5 Jun 2014 14:11:47 +0000 Vadim Kimlaychuk wrote:
>
> >>
>
> >>> Hello,
>
> >>>
>
> >>>              Probably this is an anti-pattern, but I need an answer on
>
> >>> how this will or will not work. Input:
>
> >>>              I have a single host for tests with Ceph 0.80.1 and 2 OSDs:
>
> >>>              OSD.0 – 1000 GB
>
> >>>              OSD.1 – 750 GB
>
> >>>
>
> >>>              Recompiled the CRUSH map to set „step chooseleaf firstn 0
>
> >>> type osd“
>
> >>>
>
> >> You got it half right.
>
> >>
>
> >> Version 0.8x, aka Firefly, has a default replication of 3, so you would
>
> >> need at least 3 OSDs.
>
> >>
>
> >> Christian
>
> >>>
>
> >>>              I expect that part of the PGs will have status
>
> >>> „active+clean“ (covering ~750 GB) and another part will have
>
> >>> „active+degraded“ (covering ~250 GB), because there is not
>
> >>> enough space to replicate data on the second OSD.
>
> >>>
>
> >>>              Instead, I have ALL PGs „active+degraded“
>
> >>>
>
> >>> Output:
>
> >>>       health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>
> >>>       monmap e1: 1 mons at {storage=172.16.3.2:6789/0}, election epoch 2, quorum 0 storage
>
> >>>       osdmap e15: 2 osds: 2 up, 2 in
>
> >>>        pgmap v29: 192 pgs, 3 pools, 0 bytes data, 0 objects
>
> >>>              71496 kB used, 1619 GB / 1619 GB avail
>
> >>>                   192 active+degraded
>
> >>>
>
> >>>              What is the logic behind this? Can I use hard drives of
>
> >>> different sizes successfully? If yes, how?
>
> >>>
>
> >>> Thank you for the explanation,
>
> >>>
>
> >>> Vadim
>
> >>>
>
> >>
>
> >> --
>
> >> Christian Balzer        Network/Systems Engineer
>
> >> ch...@gol.com           Global OnLine Japan/Fusion Communications
>
> >> http://www.gol.com/
>
>
>
>
>
>
>
> --
>
> Vincenzo Pii
>
> Researcher, InIT Cloud Computing Lab
> Zurich University of Applied Sciences (ZHAW)
> http://www.cloudcomp.ch/
>
>
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
