Thanks for your advice.
I will try to reweight the OSDs in my cluster.

Why is Ceph so sensitive to an unbalanced PG distribution under high load? The 
`ceph osd df` output is at https://pastebin.com/ur4Q9jsA and the 
`ceph osd perf` output is at https://pastebin.com/87DitPhV

No OSD has a much higher PG count than the others. Under a light write test 
load everything seems fine, but under a heavy write load some of the OSDs with 
more PGs show an fs_apply_latency 3-10 times higher than the others.

My guess is that the heavily loaded OSDs slow the whole cluster down (I have 
only one pool, spread across all OSDs) to the rate they can handle, so the 
other OSDs see a lower load and keep a good latency.

Is this expected under high load (that is, does it indicate the load is too 
high for the current cluster to handle)?

How does Luminous solve the uneven PG distribution problem? I read that there 
is a pg-upmap exception table in the osdmap in Luminous 12.2.x. It is said that 
using it makes it possible to achieve a perfect PG distribution among OSDs.
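
To make the idea of an exception table concrete, here is a toy sketch in Python. It is not the Luminous balancer's algorithm; the function name `build_upmap_exceptions`, the PG ids and the OSD layout are all hypothetical. It only illustrates how a per-PG list of (from_osd, to_osd) overrides can even out PG counts without changing any weights.

```python
from collections import Counter

def build_upmap_exceptions(pg_to_osds, osd_ids, max_spread=1):
    """Greedy toy balancer: returns {pgid: [(from_osd, to_osd), ...]}."""
    mapping = {pg: list(osds) for pg, osds in pg_to_osds.items()}
    exceptions = {}
    while True:
        # Count how many PG copies each OSD currently holds.
        counts = Counter({osd: 0 for osd in osd_ids})
        for osds in mapping.values():
            counts.update(osds)
        fullest = max(osd_ids, key=lambda o: counts[o])
        emptiest = min(osd_ids, key=lambda o: counts[o])
        if counts[fullest] - counts[emptiest] <= max_spread:
            return exceptions
        # Pick a PG on the fullest OSD that does not already use the emptiest OSD.
        candidates = [pg for pg, osds in sorted(mapping.items())
                      if fullest in osds and emptiest not in osds]
        if not candidates:
            return exceptions  # no legal move left in this toy model
        pg = candidates[0]
        mapping[pg][mapping[pg].index(fullest)] = emptiest
        exceptions.setdefault(pg, []).append((fullest, emptiest))

# Hypothetical layout: 6 PGs with size=2 on 4 OSDs, initially 5/4/2/1 copies.
pgs = {"1.0": [0, 1], "1.1": [0, 2], "1.2": [0, 1],
       "1.3": [0, 3], "1.4": [1, 2], "1.5": [0, 1]}
table = build_upmap_exceptions(pgs, osd_ids=[0, 1, 2, 3])
print(table)
```

In this made-up layout three overrides across two PGs are enough to leave every OSD with exactly 3 PG copies, which is the sense in which an exception table can reach an even distribution.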

2018-03-09 

shadow_lin 



From: David Turner <drakonst...@gmail.com>
Sent: 2018-03-09 06:45
Subject: Re: [ceph-users] Uneven pg distribution cause high fs_apply_latency on osds 
with more pgs
To: "shadow_lin"<shadow_...@163.com>
Cc: "ceph-users"<ceph-users@lists.ceph.com>

PGs being unevenly distributed is a common occurrence in Ceph.  Luminous 
started making some steps towards correcting this, but you're in Jewel.  There 
are a lot of threads in the ML archives about fixing PG distribution.  
Generally every method comes down to increasing the weight on OSDs with too few 
PGs and decreasing the weight on the OSDs with too many PGs.  There are a lot 
of schools of thought on the best way to implement this in your environment 
which has everything to do with your client IO patterns and workloads.  Looking 
into `ceph osd reweight-by-pg` might be a good place for you to start as you 
are only looking at 1 pool in your cluster.  If you have more pools, you 
generally need `ceph osd reweight-by-utilization`.
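
As a rough illustration of the "lower the weight on OSDs with too many PGs" idea, here is a minimal Python sketch. It is not the algorithm Ceph uses internally; the function name `propose_reweights`, the `threshold` and `max_change` parameters and the sample counts are assumptions made up for this example, and every OSD is assumed to start at an override weight of 1.0.

```python
def propose_reweights(pg_counts, threshold=1.10, max_change=0.05):
    """Return {osd_id: new_weight} for OSDs holding more than threshold * mean PGs.

    All OSDs are assumed to start at override weight 1.0 (an assumption of
    this sketch, not something read from a real cluster).
    """
    mean = sum(pg_counts.values()) / len(pg_counts)
    proposals = {}
    for osd_id, count in sorted(pg_counts.items()):
        if count <= mean * threshold:
            continue  # close enough to the mean; leave it alone
        target = mean / count  # weight that would scale this OSD down to the mean
        new_weight = max(target, 1.0 - max_change)  # limit the size of one step
        proposals[osd_id] = round(new_weight, 2)
    return proposals

# Hypothetical four-OSD sample, with counts in the range reported in this thread.
sample = {0: 104, 1: 88, 2: 80, 3: 58}
print(propose_reweights(sample))
```

Capping each step keeps the amount of data movement per round small; the proposed values are the kind of number one would then pass to `ceph osd reweight` and re-check after the cluster settles.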


On Wed, Mar 7, 2018 at 8:19 AM shadow_lin <shadow_...@163.com> wrote:

Hi list,
The Ceph version is Jewel 10.2.10 and all OSDs use filestore.
The cluster has 96 OSDs and 1 pool with size=2 replication and 4096 PGs (based 
on the PG calculation method from the Ceph docs, targeting 100 PGs per OSD).
The OSD with the most PGs has 104, and 6 OSDs have more than 100 PGs.
Most OSDs have around 70-99 PGs.
The OSD with the fewest PGs has 58.
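
For reference, the figures above can be turned into a mean and a deviation from it. The short Python sketch below uses only the numbers quoted in this message; the variable and function names are made up for the example.

```python
# Figures taken from the message above.
num_osds = 96
pg_num = 4096
replica_size = 2

# Every PG is stored on `replica_size` OSDs, so this is the total number of PG copies.
pg_copies = pg_num * replica_size
mean_pgs_per_osd = pg_copies / num_osds

def deviation_pct(pg_count):
    """Percent difference between one OSD's PG count and the cluster mean."""
    return (pg_count - mean_pgs_per_osd) / mean_pgs_per_osd * 100

print(f"mean PGs per OSD: {mean_pgs_per_osd:.1f}")
print(f"fullest OSD (104 PGs): {deviation_pct(104):+.1f}%")
print(f"emptiest OSD (58 PGs): {deviation_pct(58):+.1f}%")
```

The mean is about 85.3 PGs per OSD, so the fullest OSD sits roughly 22% above it and the emptiest roughly 32% below it.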

During the write test some of the OSDs have a very high fs_apply_latency, 
around 1000ms-4000ms, while the normal ones are around 100-600ms. The OSDs with 
high latency are always the ones with more PGs.

iostat on the high-latency OSDs shows the HDDs at about 95%-96% %util, while 
the normal ones are at 40%-60%.

I think the cause is that OSDs with more PGs have to handle more write 
requests. Is this right?
But although the PG distribution is uneven, the variation is not that large. 
How can performance be so sensitive to it?
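
One possible way to reason about this sensitivity is basic queueing theory. The Python sketch below uses the textbook M/M/1 result that mean time in the system is the service time divided by (1 - utilization). Treating an HDD-backed filestore OSD as an M/M/1 queue is an assumption of this example, not something measured on this cluster, and `mm1_latency_factor` is a made-up helper name.

```python
def mm1_latency_factor(utilization):
    """Mean time in system divided by bare service time for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)

# %util values reported by iostat in the message above.
for util in (0.40, 0.60, 0.95, 0.96):
    factor = mm1_latency_factor(util)
    print(f"util {util:.0%}: latency is about {factor:.1f}x the service time")
```

Under this simplified model a disk at 95% utilization waits about 20 service times per request, against about 2.5 at 60%, so a moderate difference in load can plausibly turn into a several-fold difference in latency once a disk is close to saturation.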

Is there anything I can do to improve the performance and reduce the latency?

How can I make the PG distribution more even?

Thanks


2018-03-07



shadowlin

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com