No, I'm not using RDMA in this configuration, since this will eventually be
deployed to production on 10G Ethernet (yes, RDMA is faster).  I would
prefer Ceph because it has a storage driver built into OpenNebula, which my
company is using, and, as you mentioned, it works with individual drives.

I'm not sure what the problem is, but it appears to me that one of the hosts
may be holding up the rest.  With Ceph, if one host performs much worse than
the others, could that potentially slow the whole cluster down to this
level?


On Thu, Apr 11, 2013 at 7:42 AM, Mark Nelson <[email protected]> wrote:

> With GlusterFS are you using the native RDMA support?
>
> Ceph and Gluster tend to prefer pretty different disk setups too.  Afaik
> RH still recommends RAID6 behind each brick, while we do better with
> individual disks behind each OSD.  You might want to watch the OSD admin
> socket and see if operations are backing up on any specific OSDs.
>
> Mark
>
>
> On 04/09/2013 12:54 PM, Ziemowit Pierzycki wrote:
>
>> Neither made a difference.  I also have a glusterFS cluster with two
>> nodes in replicating mode residing on 1TB drives:
>>
>> [root@triton speed]# dd conv=fdatasync if=/dev/zero
>> of=/mnt/speed/test.out bs=512k count=10000
>> 10000+0 records in
>> 10000+0 records out
>> 5242880000 bytes (5.2 GB) copied, 43.573 s, 120 MB/s
>>
>> ... and Ceph:
>>
>> [root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out
>> bs=512k count=10000
>> 10000+0 records in
>> 10000+0 records out
>> 5242880000 bytes (5.2 GB) copied, 366.911 s, 14.3 MB/s
>>
>>
>> On Mon, Apr 8, 2013 at 4:29 PM, Mark Nelson <[email protected]> wrote:
>>
>>     On 04/08/2013 04:12 PM, Ziemowit Pierzycki wrote:
>>
>>         There is one SSD in each node.  IPoIB performance is about 7 gbps
>>         between each host.  CephFS is mounted via kernel client.  Ceph
>>         version
>>         is ceph-0.56.3-1.  I have a 1GB journal on the same drive as the
>>         OSD but
>>         on a separate file system split via LVM.
>>
>>         Here is output of another test with fdatasync:
>>
>>         [root@triton temp]# dd conv=fdatasync if=/dev/zero
>>         of=/mnt/temp/test.out
>>         bs=512k count=10000
>>         10000+0 records in
>>         10000+0 records out
>>         5242880000 bytes (5.2 GB) copied, 359.307 s, 14.6 MB/s
>>         [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k
>>         count=10000
>>         10000+0 records in
>>         10000+0 records out
>>         5242880000 bytes (5.2 GB) copied, 14.0521 s, 373 MB/s
>>
>>
>>     Definitely seems off!  How many SSDs are involved and how fast are
>>     they each?  The MTU idea might have merit, but I honestly don't know
>>     enough about how well IPoIB handles giant MTUs like that.  One thing
>>     I have noticed on other IPoIB setups is that TCP autotuning can
>>     cause a ton of problems.  You may want to try disabling it on all of
>>     the hosts involved:
>>
>>     echo 0 | tee /proc/sys/net/ipv4/tcp_moderate_rcvbuf
>>
>>
>>     If that doesn't work, maybe try setting MTU to 9000 or 1500 if
>> possible.
>>
>>     Mark
>>
>>
>>
>>
>>         The network traffic appears to match the transfer speeds shown
>>         here too.
>>            Writing is very slow.
>>
>>
>>         On Mon, Apr 8, 2013 at 3:04 PM, Mark Nelson
>>         <[email protected]> wrote:
>>
>>              Hi,
>>
>>              How many drives?  Have you tested your IPoIB performance
>>         with iperf?
>>                Is this CephFS with the kernel client?  What version of
>>         Ceph?  How
>>              are your journals configured? etc.  It's tough to make any
>>              recommendations without knowing more about what you are
>> doing.
>>
>>              Also, please use conv=fdatasync when doing buffered IO
>>         writes with dd.
>>
>>              Thanks,
>>              Mark
>>
>>
>>              On 04/08/2013 03:00 PM, Ziemowit Pierzycki wrote:
>>
>>                  Hi,
>>
>>                  The first test was writing a 500 MB file and was clocked
>>                  at 1.2 GB/s.  The second test was writing a 5000 MB file
>>                  at 17 MB/s.  The third test was reading the file at
>>                  ~400 MB/s.
>>
>>
>>                  On Mon, Apr 8, 2013 at 2:56 PM, Gregory Farnum
>>                  <[email protected]> wrote:
>>
>>                       More details, please. You ran the same test twice
>> and
>>                  performance went
>>                       up from 17.5MB/s to 394MB/s? How many drives in
>>         each node,
>>                  and of what
>>                       kind?
>>                       -Greg
>>                       Software Engineer #42 @ http://inktank.com |
>>         http://ceph.com
>>
>>
>>                       On Mon, Apr 8, 2013 at 12:38 PM, Ziemowit Pierzycki
>>                       <[email protected]> wrote:
>>                        > Hi,
>>                        >
>>                        > I have a 3 node SSD-backed cluster connected over
>>                  infiniband (16K
>>                       MTU) and
>>                        > here is the performance I am seeing:
>>                        >
>>                        > [root@triton temp]# !dd
>>                        > dd if=/dev/zero of=/mnt/temp/test.out bs=512k
>>         count=1000
>>                        > 1000+0 records in
>>                        > 1000+0 records out
>>                        > 524288000 bytes (524 MB) copied, 0.436249 s,
>>         1.2 GB/s
>>                        > [root@triton temp]# dd if=/dev/zero
>>                  of=/mnt/temp/test.out bs=512k
>>                        > count=10000
>>                        > 10000+0 records in
>>                        > 10000+0 records out
>>                        > 5242880000 bytes (5.2 GB) copied, 299.077 s,
>>         17.5 MB/s
>>                        > [root@triton temp]# dd if=/mnt/temp/test.out
>>                  of=/dev/null bs=512k
>>                        > count=10000
>>                        > 10000+0 records in
>>                        > 10000+0 records out
>>                        > 5242880000 bytes (5.2 GB) copied, 13.3015 s,
>>         394 MB/s
>>                        >
>>                        > Does that look right?  How do I check this is
>>         not a network
>>                       problem, because
>>                        > I remember seeing a kernel issue related to
>>         large MTU.
>>                        >
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
