Hi Nick,

On Thu, Jul 21, 2016 at 8:33 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>> -----Original Message-----
>> From: w...@globe.de [mailto:w...@globe.de]
>> Sent: 21 July 2016 13:23
>> To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
>>
>> Okay and what is your plan now to speed up ?
>
> Now I have come up with a lower latency hardware design, there is not much 
> further improvement until persistent RBD caching is implemented, as you will 
> be moving the SSD/NVME closer to the client. But I'm happy with what I can 
> achieve at the moment. You could also experiment with bcache on the RBD.

Reviving this thread, would you be willing to share the details of the
low latency hardware design?  Are you optimizing for NFS or iSCSI?

Thank you,
Alex

>
>>
>> Would it help to put in multiple P3700 per OSD Node to improve performance 
>> for a single Thread (example Storage VMotion) ?
>
> Most likely not, it's all the other parts of the puzzle which are causing the 
> latency. ESXi was designed for storage arrays that service IO's in 100us-1ms 
> range, Ceph is probably about 10x slower than this, hence the problem. 
> Disable the BBWC on a RAID controller or SAN and you will the same behaviour.
>
>>
>> Regards
>>
>>
>> Am 21.07.16 um 14:17 schrieb Nick Fisk:
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> >> Of w...@globe.de
>> >> Sent: 21 July 2016 13:04
>> >> To: n...@fisk.me.uk; 'Horace Ng' <hor...@hkisl.net>
>> >> Cc: ceph-users@lists.ceph.com
>> >> Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
>> >>
>> >> Hi,
>> >>
>> >> hmm i think 200 MByte/s is really bad. Is your Cluster in production 
>> >> right now?
>> > It's just been built, not running yet.
>> >
>> >> So if you start a storage migration you get only 200 MByte/s right?
>> > I wish. My current cluster (not this new one) would storage migrate at
>> > ~10-15MB/s. Serial latency is the problem, without being able to
>> > buffer, ESXi waits on an ack for each IO before sending the next. Also it 
>> > submits the migrations in 64kb chunks, unless you get VAAI
>> working. I think esxi will try and do them in parallel, which will help as 
>> well.
>> >
>> >> I think it would be awesome if you get 1000 MByte/s
>> >>
>> >> Where is the Bottleneck?
>> > Latency serialisation, without a buffer, you can't drive the devices
>> > to 100%. With buffered IO (or high queue depths) I can max out the 
>> > journals.
>> >
>> >> A FIO Test from Sebastien Han give us 400 MByte/s raw performance from 
>> >> the P3700.
>> >>
>> >> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your
>> >> -ssd-is-suitable-as-a-journal-device/
>> >>
>> >> How could it be that the rbd client performance is 50% slower?
>> >>
>> >> Regards
>> >>
>> >>
>> >> Am 21.07.16 um 12:15 schrieb Nick Fisk:
>> >>> I've had a lot of pain with this, smaller block sizes are even worse.
>> >>> You want to try and minimize latency at every point as there is no
>> >>> buffering happening in the iSCSI stack. This means:-
>> >>>
>> >>> 1. Fast journals (NVME or NVRAM)
>> >>> 2. 10GB or better networking
>> >>> 3. Fast CPU's (Ghz)
>> >>> 4. Fix CPU c-state's to C1
>> >>> 5. Fix CPU's Freq to max
>> >>>
>> >>> Also I can't be sure, but I think there is a metadata update
>> >>> happening with VMFS, particularly if you are using thin VMDK's, this
>> >>> can also be a major bottleneck. For my use case, I've switched over to 
>> >>> NFS as it has given much more performance at scale and
>> less headache.
>> >>>
>> >>> For the RADOS Run, here you go (400GB P3700):
>> >>>
>> >>> Total time run:         60.026491
>> >>> Total writes made:      3104
>> >>> Write size:             4194304
>> >>> Object size:            4194304
>> >>> Bandwidth (MB/sec):     206.842
>> >>> Stddev Bandwidth:       8.10412
>> >>> Max bandwidth (MB/sec): 224
>> >>> Min bandwidth (MB/sec): 180
>> >>> Average IOPS:           51
>> >>> Stddev IOPS:            2
>> >>> Max IOPS:               56
>> >>> Min IOPS:               45
>> >>> Average Latency(s):     0.0193366
>> >>> Stddev Latency(s):      0.00148039
>> >>> Max latency(s):         0.0377946
>> >>> Min latency(s):         0.015909
>> >>>
>> >>> Nick
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>> >>>> Behalf Of Horace
>> >>>> Sent: 21 July 2016 10:26
>> >>>> To: w...@globe.de
>> >>>> Cc: ceph-users@lists.ceph.com
>> >>>> Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> Same here, I've read some blog saying that vmware will frequently
>> >>>> verify the locking on VMFS over iSCSI, hence it will have much slower 
>> >>>> performance than NFS (with different locking mechanism).
>> >>>>
>> >>>> Regards,
>> >>>> Horace Ng
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: w...@globe.de
>> >>>> To: ceph-users@lists.ceph.com
>> >>>> Sent: Thursday, July 21, 2016 5:11:21 PM
>> >>>> Subject: [ceph-users] Ceph + VMware + Single Thread Performance
>> >>>>
>> >>>> Hi everyone,
>> >>>>
>> >>>> we see at our cluster relatively slow Single Thread Performance on the 
>> >>>> iscsi Nodes.
>> >>>>
>> >>>>
>> >>>> Our setup:
>> >>>>
>> >>>> 3 Racks:
>> >>>>
>> >>>> 18x Data Nodes, 3 Mon Nodes, 3 iscsi Gateway Nodes with tgt (rbd cache 
>> >>>> off).
>> >>>>
>> >>>> 2x Samsung SM863 Enterprise SSD for Journal (3 OSD per SSD) and 6x
>> >>>> WD Red 1TB per Data Node as OSD.
>> >>>>
>> >>>> Replication = 3
>> >>>>
>> >>>> chooseleaf = 3 type Rack in the crush map
>> >>>>
>> >>>>
>> >>>> We get only ca. 90 MByte/s on the iscsi Gateway Servers with:
>> >>>>
>> >>>> rados bench -p rbd 60 write -b 4M -t 1
>> >>>>
>> >>>>
>> >>>> If we test with:
>> >>>>
>> >>>> rados bench -p rbd 60 write -b 4M -t 32
>> >>>>
>> >>>> we get ca. 600 - 700 MByte/s
>> >>>>
>> >>>>
>> >>>> We plan to replace the Samsung SSD with Intel DC P3700 PCIe NVM'e
>> >>>> for the Journal to get better Single Thread Performance.
>> >>>>
>> >>>> Is anyone of you out there who has an Intel P3700 for Journal an
>> >>>> can give me back test results with:
>> >>>>
>> >>>>
>> >>>> rados bench -p rbd 60 write -b 4M -t 1
>> >>>>
>> >>>>
>> >>>> Thank you very much !!
>> >>>>
>> >>>> Kind Regards !!
>> >>>>
>> >>>> _______________________________________________
>> >>>> ceph-users mailing list
>> >>>> ceph-users@lists.ceph.com
>> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>> _______________________________________________
>> >>>> ceph-users mailing list
>> >>>> ceph-users@lists.ceph.com
>> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to