Please disregard my last email. I followed the recommendation for the tunables, but missed the note that the kernel version should be 3.5 or later in order to support them. I reverted them back to the legacy ones and everything is back online.
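For the record, here is roughly how I reverted to the legacy map (just a sketch of the usual getcrushmap/crushtool round trip; the file names are only examples):

    ceph osd getcrushmap -o crush.bin       # dump the current crush map
    crushtool -d crush.bin -o crush.txt     # decompile it to text
    # edit crush.txt and drop the "tunable ..." lines, then recompile
    # (crushtool may want --enable-unsafe-tunables if any tunable lines stay in)
    crushtool -c crush.txt -o crush.legacy
    ceph osd setcrushmap -i crush.legacy    # inject the legacy map back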
2013/1/10 Roman Hlynovskiy <[email protected]>:
> Hello again!
>
> I left the system in a working state overnight and got it in a weird
> state this morning:
>
> chef@ceph-node02:/var/log/ceph$ ceph -s
>    health HEALTH_OK
>    monmap e4: 3 mons at {a=192.168.7.11:6789/0,b=192.168.7.12:6789/0,c=192.168.7.13:6789/0}, election epoch 254, quorum 0,1,2 a,b,c
>    osdmap e348: 3 osds: 3 up, 3 in
>    pgmap v114606: 384 pgs: 384 active+clean; 161 GB data, 326 GB used, 429 GB / 755 GB avail
>    mdsmap e4623: 1/1/1 up {0=b=up:active}, 1 up:standby
>
> so, it looks ok at first glance, however I am not able
> to mount ceph from any of the nodes:
> be01:~# mount /var/www/jroger.org/data
> mount: 192.168.7.11:/: can't read superblock
>
> on the nodes which had ceph mounted yesterday I am able to look
> through the filesystem, but any kind of data read causes the client to
> hang.
>
> I made a trace on the active mds with debug ms/mds = 20
> (http://wh.of.kz/ceph_logs.tar.gz)
> Could you please help to identify what's going on?
>
> 2013/1/9 Roman Hlynovskiy <[email protected]>:
>>>> How many pgs do you have? ('ceph osd dump | grep ^pool').
>>>
>>> I believe this is it. 384 PGs, but three pools of which only one (or maybe
>>> a second one, sort of) is in use. Automatically setting the right PG counts
>>> is coming some day, but until then being able to set up pools of the right
>>> size is a big gotcha. :(
>>> Depending on how mutable the data is, recreate with larger PG counts on the
>>> pools in use. Otherwise we can do something more detailed.
>>> -Greg
>>
>> hm... what would be the recommended PG count per pool?
>>
>> chef@cephgw:~$ ceph osd lspools
>> 0 data,1 metadata,2 rbd,
>> chef@cephgw:~$ ceph osd pool get data pg_num
>> PG_NUM: 128
>> chef@cephgw:~$ ceph osd pool get metadata pg_num
>> PG_NUM: 128
>> chef@cephgw:~$ ceph osd pool get rbd pg_num
>> PG_NUM: 128
>>
>> according to
>> http://ceph.com/docs/master/rados/operations/placement-groups/
>>
>>               (OSDs * 100)
>>   Total PGs = ------------
>>                 Replicas
>>
>> I have 3 OSDs and 2 replicas for each object, which gives a recommended PG count of 150
>>
>> will it make much difference to set 150 instead of 128, or should I
>> base it on different values?
>>
>> btw, just one more off-topic question:
>>
>> chef@ceph-node03:~$ ceph pg dump | egrep -v '^(0\.|1\.|2\.)' | column -t
>> dumped all in format plain
>> version            113906
>> last_osdmap_epoch  323
>> last_pg_scan       1
>> full_ratio         0.95
>> nearfull_ratio     0.85
>> pg_stat  objects  mip  degr  unf  bytes         log       disklog  state  state_stamp  v  reported  up  acting  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
>> pool 0   74748    0    0     0    286157692336  17668034  17668034
>> pool 1   618      0    0     0    131846062     6414518   6414518
>> pool 2   0        0    0     0    0             0         0
>> sum      75366    0    0     0    286289538398  24082552  24082552
>> osdstat  kbused     kbavail    kb         hb in  hb out
>> 0        157999220  106227596  264226816  [1,2]  []
>> 1        185604948  78621868   264226816  [0,2]  []
>> 2        219475396  44751420   264226816  [0,1]  []
>> sum      563079564  229600884  792680448
>>
>> pool 0 (data) is used for data storage
>> pool 1 (metadata) is used for metadata storage
>>
>> what is pool 2 (rbd) for? looks like it's absolutely empty.
>>
>>
>>>
>>>>
>>>> You might also adjust the crush tunables, see
>>>>
>>>> http://ceph.com/docs/master/rados/operations/crush-map/?highlight=tunable#tunables
>>>>
>>>> sage
>>>>
>>
>> Thanks for the link, Sage. I set the tunable values according to the doc.
>> Btw, the online document is missing the magical param for the crushmap
>> which allows those scary_tunables )
>>
>>
>>
>> --
>> ...WBR, Roman Hlynovskiy
>
>
>
> --
> ...WBR, Roman Hlynovskiy

--
...WBR, Roman Hlynovskiy
