Thanks for the reply Gregory. Sorry if this is in the wrong direction or
something; maybe I do not understand.
To test uploads I use bash time with either python-swiftclient or
boto's key.set_contents_from_filename against the radosgw. I was
unaware that radosgw had any kind of throttle settings in its
configuration (I can't seem to find any either). As for RBD mounts, I
test by creating a 1TB mount and writing a file to it through time+cp
or dd. Not the most accurate test, but I think it should be good
enough as a quick functionality check, so for writes it's more about
functionality than performance. I would expect even a basic
functionality test to yield more than 8 MB/s, though. (I've put the
exact commands I'm running below the quoted thread for reference.)

As for checking admin sockets: I have, actually. I set the third
gateway's debug_civetweb to 10 and debug_rgw to 5, but I still do not
see anything that stands out; the log snippet I pasted has these
values set. I did the same for an OSD that is marked as slow (1112),
but all I can see in its log are ticks and heartbeat responses,
nothing that shows any issues. Finally, I did it for the primary
monitor node with debug_mon set to 5 (http://pastebin.com/hhnaFac1).
I do not see anything there that stands out as a failure (like a
fault or timeout error).

What kind of throttler limits do you mean? I don't see any mention of
rgw throttler limits in the ceph.com docs or the admin socket, just
OSD/filesystem throttles like the inode/flusher limits. Do you mean
those? I have not touched these limits on this cluster yet; do you
think it would help?

On 12/18/2014 10:24 PM, Gregory Farnum wrote:
> What kind of uploads are you performing? How are you testing?
> Have you looked at the admin sockets on any daemons yet? Examining the
> OSDs to see if they're behaving differently on the different requests
> is one angle of attack. The other is to look into whether the RGW
> daemons are hitting throttler limits or something that the RBD
> clients aren't.
> -Greg
>
> On Thu, Dec 18, 2014 at 7:35 PM Sean Sullivan <[email protected]> wrote:
>
> Hello Yall!
>
> I can't figure out why my gateways are performing so poorly and I am
> not sure where to start looking. My RBD mounts seem to be performing
> fine (over 300 MB/s) while uploading a 5G file to Swift/S3 takes
> 2m32s (32 MB/s, I believe). If we try a 1G file it's closer to
> 8 MB/s. Testing with nuttcp shows that I can transfer from a client
> with a 10G interface to any node on the ceph cluster at the full 10G,
> and ceph can transfer close to 20G between itself. I am not really
> sure where to start looking, as outside of another issue which I will
> mention below I am clueless.
>
> I have a weird setup:
>
> [osd nodes]
> 60 x 4TB 7200 RPM SATA drives
> 12 x 400GB S3700 SSD drives
> 3 x SAS2308 PCI-Express Fusion-MPT cards (drives are split evenly
> across the 3 cards)
> 512 GB of RAM
> 2 x CPU E5-2670 v2 @ 2.50GHz
> 2 x 10G interfaces LACP bonded for cluster traffic
> 2 x 10G interfaces LACP bonded for public traffic (so a total of
> 4 10G ports)
>
> [monitor nodes and gateway nodes]
> 4 x 300G 15000 RPM SAS drives in RAID 10
> 1 x SAS 2208
> 64G of RAM
> 2 x CPU E5-2630 v2
> 2 x 10G interfaces LACP bonded for public traffic (total of 2 10G
> ports)
>
> Here is a pastebin dump of my details. I am running ceph giant 0.87
> (c51c8f9d80fa4e0168aa52685b8de40e42758578) and kernel
> 3.13.0-40-generic across the entire cluster.
>
> http://pastebin.com/XQ7USGUz -- ceph health detail
> http://pastebin.com/8DCzrnq1 -- /etc/ceph/ceph.conf
> http://pastebin.com/BC3gzWhT -- ceph osd tree
> http://pastebin.com/eRyY4H4c -- /var/log/radosgw/client.radosgw.rgw03.log
> http://paste.ubuntu.com/9565385/ -- crushmap (pastebin wouldn't let me)
>
> We ran into a few issues with density (conntrack limits, pid limit,
> and number of open files), all of which I adjusted by bumping the
> ulimits in /etc/security/limits.d/ceph.conf or sysctl. I am no longer
> seeing any signs of these limits being hit, so I have not included my
> limits or sysctl conf. If you would like these as well, let me know
> and I can include them.
>
> One of the issues I am seeing is that OSDs have started to flop / be
> marked as slow. The cluster was HEALTH_OK with all of the disks added
> for over 3 weeks before this behaviour started. RBD transfers seem to
> be fine for the most part, which makes me think that this has little
> bearing on the gateway issue, but it may be related. Rebooting the
> OSD seems to fix this issue.
>
> I would like to figure out the root cause of both of these issues and
> post the results back here if possible (perhaps it can help other
> people). I am really looking for a place to start, as the gateway
> just logs that it is posting data, and all of the logs (outside of
> the monitors reporting down OSDs) seem to show a fully functioning
> cluster.
>
> Please help. I am in the #ceph room on OFTC every day as 'seapasulli'
> as well.
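
For reference, here is the upload test I am running, more or less. This
uses the swift CLI from python-swiftclient; the endpoint, account, and
key below are placeholders for our real credentials, and 7480 stands in
for whatever port civetweb is bound to on the gateway:

    # create a 5 GiB test file (zeros are fine; radosgw does not compress)
    dd if=/dev/zero of=5G.bin bs=1M count=5120

    # time the upload through the gateway (swift v1 auth, placeholder creds)
    time swift -A http://rgw03:7480/auth/v1.0 -U testacct:testuser \
        -K secretkey upload testcontainer 5G.bin

The boto test is the same idea, just timing
key.set_contents_from_filename() against the S3 endpoint instead.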
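
The RBD-side test is roughly the following. I realize plain time+cp can
be flattered by the page cache, so adding oflag=direct (or
conv=fdatasync) should make it a fairer comparison against the gateway
path:

    # write 5 GiB to the mounted 1TB rbd image, bypassing the page
    # cache, so the figure is comparable to the 5G gateway upload
    dd if=/dev/zero of=/mnt/rbd/ddtest bs=4M count=1280 oflag=direct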
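
If the throttler limits you mean are the ones exposed as perf counters,
this is what I was planning to check next; the socket paths below are
the defaults, so they may need adjusting:

    # pretty-print the gateway's counters and pull out the throttle
    # sections; a 'val' sitting at 'max' (or a growing 'wait' count)
    # would suggest a saturated throttle
    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.rgw03.asok \
        perf dump | python -m json.tool | grep -A 10 throttle

    # on the slow OSD, dump in-flight and slowest recent ops with
    # per-stage timestamps to see where requests are stalling
    ceph --admin-daemon /var/run/ceph/ceph-osd.1112.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.1112.asok dump_historic_ops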
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
