I'm using this setup with ESXi 5.1 and I get very good performance. I
suspect you have other issues. Reliability is another story (see Nick's
posts on tgt and HA to get an idea of the awful problems you can have), but
for my test labs the risk is acceptable.
One change I found helpful is to run tgtd with 128 I/O threads. I'm running
Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and changed the line
that read:
exec tgtd
to
exec tgtd --nr_iothreads=128
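tgt has to be restarted for the new thread count to take effect. A minimal
check afterwards, assuming the stock Ubuntu 14.04 Upstart job (note that
restarting tgt drops any active iSCSI sessions):

sudo service tgt restart
ps -ef | grep '[t]gtd'   # should show tgtd running with --nr_iothreads=128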
If you're not concerned with reliability, you can enhance throughput even
more by enabling rbd client write-back cache in your tgt VM's ceph.conf
file (you'll need to restart tgtd for this to take effect):
[client]
rbd_cache = true
rbd_cache_size = 67108864 # (64MB)
rbd_cache_max_dirty = 50331648 # (48MB)
rbd_cache_target_dirty = 33554432 # (32MB)
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = false
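If you want to confirm that the tgt client actually picked these up, one
option is the Ceph admin socket, assuming you have one enabled for the
client (the socket path below is a guess; match it to the admin socket
setting in your ceph.conf):

ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config show | grep rbd_cache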
Here's a sample targets.conf:
<target iqn.2014-04.tgt.Charter>
    initiator-address ALL
    scsi_sn Charter
    #vendor_id CEPH
    #controller_tid 1
    write-cache on
    read-cache on
    driver iscsi
    bs-type rbd
    <backing-store charter/vmguest>
        lun 5
        scsi_id cfe1000c4a71e700506357
    </backing-store>
    <backing-store charter/voting>
        lun 6
        scsi_id cfe1000c4a71e700507157
    </backing-store>
    <backing-store charter/oradata>
        lun 7
        scsi_id cfe1000c4a71e70050da7a
    </backing-store>
    <backing-store charter/oraback>
        lun 8
        scsi_id cfe1000c4a71e70050bac0
    </backing-store>
</target>
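After reloading tgt with this file, the target and its LUNs can be
sanity-checked from the target side with the standard tgt tools:

tgt-admin --show
tgtadm --lld iscsi --mode target --op show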
I don't have fio numbers handy, but I have some Oracle CALIBRATE_IO output.
We're running Oracle RAC database servers in Linux VMs on ESXi 5.1, which
use iSCSI to connect to the tgt service. I only have a single connection
set up in ESXi for each LUN. I tested multipathing with two tgt VMs
presenting identical LUNs/RBD disks, but found that there wasn't a
significant performance gain, even with round-robin path selection in
VMware.
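For anyone repeating the multipath test, round-robin path selection can be
set per device from the ESXi shell; a rough sketch (the naa identifier is a
placeholder for your LUN's device ID):

esxcli storage nmp device list
esxcli storage nmp device set --device naa.<your_device_id> --psp VMW_PSP_RR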
These tests were run from two RAC VMs, each on a different host, with both
hosts connected to the same tgt instance. The way we have Oracle
configured, it would have been using two of the LUNs heavily during this
CALIBRATE_IO test.
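For reference, the tables below are just Oracle's I/O calibration view;
after DBMS_RESOURCE_MANAGER.CALIBRATE_IO completes, something like this run
as SYSDBA reproduces the layout (DISKS is my alias for NUM_PHYSICAL_DISKS;
your NLS date format may differ):

sqlplus -s / as sysdba <<'EOF'
SELECT start_time, end_time, max_iops, max_mbps, max_pmbps,
       latency, num_physical_disks AS disks
  FROM dba_rsrc_io_calibrate;
EOF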
This output is with 128 threads in tgtd and rbd client cache enabled:
START_TIME            END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
--------------------  --------------------  --------  --------  ---------  -------  -----
28-JUN-2016 15:10:50  28-JUN-2016 15:20:04     14153       658        412       14     75
This output is with the same configuration, but with rbd client cache
disabled:
START_TIME            END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
--------------------  --------------------  --------  --------  ---------  -------  -----
28-JUN-2016 22:44:29  28-JUN-2016 22:49:05      7449       161        219       20     75
This output is from a directly connected EMC VNX5100 FC SAN with 25 disks
using dual 8Gb FC links on a different lab system:
START_TIME            END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
--------------------  --------------------  --------  --------  ---------  -------  -----
28-JUN-2016 22:11:25  28-JUN-2016 22:18:48      6487       299        224       19     75
One of our goals for our Ceph cluster is to replace the EMC SANs. We've
accomplished that performance-wise; the next step is to get a plausible
iSCSI HA solution working. I'm very interested in what Mike Christie is
putting together, and I'm in the process of vetting the SUSE solution now.
BTW - The tests were run when we had 75 OSDs, all 7200 RPM 2TB HDDs, across
9 OSD hosts. We have no SSD journals; instead, all the disks are set up as
single-disk RAID1 disk groups with write-back cache backed by a BBU. All
OSD hosts have 40Gb networking and the ESXi hosts have 10Gb.
Jake
On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic <[email protected]>
wrote:
> Hi Mike,
>
> I was trying:
>
> https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
>
> ONE target, exported from different OSD servers directly, to multiple
> VMware ESXi servers.
>
> A config looked like:
>
> #cat iqn.ceph-cluster_netzlaboranten-storage.conf
>
> <target iqn.ceph-cluster:vmware-storage>
> driver iscsi
> bs-type rbd
> backing-store rbd/vmware-storage
> initiator-address 10.0.0.9
> initiator-address 10.0.0.10
> incominguser vmwaren-storage RPb18P0xAqkAw4M1
> </target>
>
>
> We had 4 OSD servers, and every one of them had this config running.
> We had 2 VMware servers (ESXi).
>
> So we had 4 paths to this vmware-storage RBD object.
>
> In the end, VMware had 8 paths: 4 paths directly connected to the
> specific VMware server, plus 4 paths that server saw via the other
> VMware server.
>
> There were very big problems with performance, I am talking about < 10
> MB/s, so the customer was not able to use it; good old NFS is serving
> instead.
>
> At that time we used Ceph Hammer, and I think the customer was using
> ESXi 5.5, or maybe ESXi 6; the testing was sometime last year.
>
> --------------------
>
> We will make a new attempt now with Ceph Jewel and ESXi 6, and this
> time we will manage the VMware servers ourselves.
>
> As soon as this issue,
>
> "ceph mon Segmentation fault after set crush_ruleset ceph 10.2.2"
>
> which I already mailed to the list, is fixed, we can start testing.
>
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:[email protected]
>
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402, district court of Hanau
> Managing director: Oliver Dzombic
>
> Tax no.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> Am 11.07.2016 um 17:45 schrieb Mike Christie:
> > On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
> >> Hi,
> >>
> >> does anyone have experience with a smart way to connect VMware with Ceph?
> >>
> >> iSCSI multipath did not really work well.
> >
> > Are you trying to export rbd images from multiple iscsi targets at the
> > same time or just one target?
> >
> > For the HA/multiple target setup, I am working on this for Red Hat. We
> > plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something already as
> > someone mentioned.
> >
> > We just got a large chunk of code in the upstream kernel (it is in the
> > block layer maintainer's tree for the next kernel) so it should be
> > simple to add COMPARE_AND_WRITE support now. We should be posting krbd
> > exclusive lock support in the next couple weeks.
> >
> >
> >> NFS could be an option, but I think that's just too many layers in
> >> between to get usable performance.
> >>
> >> Systems like ScaleIO have developed a VMware addon to talk to them.
> >>
> >> Is there something similar out there for ceph ?
> >>
> >> What are you using ?
> >>
> >> Thank you !
> >>
> >
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com