On Saturday, July 16, 2016, Oliver Dzombic <[email protected]> wrote:

> Hi Jake,
>
> thank you very much, both were needed: the MTU fix and VAAI deactivated (I
> hope that won't interfere with vMotion or other features).
>
> I have now changed the MTU of the vmkernel port and the vSwitch. That solved the problem.


Try turning VAAI back on at some point.
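To sanity-check the new MTU end to end, a quick sketch (the 9000-byte MTU and the target IP are assumptions, adjust for your setup): compute the largest ICMP payload that fits in one jumbo frame, then ping the iSCSI target with the don't-fragment bit set from the ESXi shell.

```shell
MTU=9000                   # assumed jumbo-frame MTU on vmkernel and vSwitch
PAYLOAD=$((MTU - 20 - 8))  # minus IPv4 header (20 B) and ICMP header (8 B)
echo "$PAYLOAD"            # prints 8972
# From the ESXi shell, verify no fragmentation at that size:
#   vmkping -d -s $PAYLOAD <iscsi-target-ip>
```

If the vmkping fails while a default-size ping works, one of the two MTU settings is still wrong.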


>
> So I could make an ext4 filesystem and mount it.
>
> Running
>
> dd if=/dev/zero of=/mnt/8G_test bs=4k count=2M conv=fdatasync
>
> Something is strange to me:
>
> The network shows a straight 1 Gbit (the maximum of the connection) of iSCSI bandwidth.
>
> But inside the VM I can only see 40-50 MB/s.
>
> I mean the replication size is 2. So it would be easy to say 1/2 of 1 Gbit =
> 500 Mbit ≈ 62 MB/s, which is roughly the 40-50 MB/s I see.
>
> But shouldn't this reduction happen inside the Ceph cluster, which runs
> on a 10G network?
>
> I mean the data hit the Ceph iSCSI server at 1 Gbit. From there it is
> written to RBD internally by tgt, and duplicated (over the cluster
> network, which is 10G) before the ACK is sent back to iSCSI. So the
> cluster replicates internally via 10G, and my expected bandwidth inside
> the VM should be higher than half of the maximum speed.
>
> Is this a wrong understanding of the mechanism ?


The delay is most likely just having to wait for both replica disks to
actually complete the write before the ACK comes back.
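Back-of-the-envelope numbers (assuming the 1 Gbit/s client link and size=2 replication) show why neither bandwidth theory quite fits the observed 40-50 MB/s:

```shell
LINK_MBIT=1000                # client-facing iSCSI link, Mbit/s
echo $((LINK_MBIT / 8))       # prints 125 -- MB/s line rate of the 1G link
echo $((LINK_MBIT / 8 / 2))   # prints 62 -- MB/s if replication shared that link
# Replication actually rides the 10G cluster network, so bandwidth alone
# does not explain 40-50 MB/s; per-write latency (waiting for both
# replica acks before the iSCSI ACK) does.
```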


>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:[email protected] <javascript:;>
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 16.07.2016 um 02:18 schrieb Jake Young:
> > I had some odd issues like that due to MTU mismatch.
> >
> > Keep in mind that the vSwitch and vmkernel port have independent MTU
> > settings.  Verify you can ping with large size packets without
> > fragmentation between your host and iscsi target.
> >
> > If that's not it, you can try to disable VAAI options to see if one of
> > them is causing issues. I haven't used ESXi 6.0 yet.
> >
> > Jake
> >
> >
> > On Friday, July 15, 2016, Oliver Dzombic <[email protected]> wrote:
> >
> >     Hi,
> >
> >     i am currently trying out the stuff.
> >
> >     My tgt config:
> >
> >     # cat tgtd.conf
> >     # The default config file
> >     include /etc/tgt/targets.conf
> >
> >     # Config files from other packages etc.
> >     include /etc/tgt/conf.d/*.conf
> >
> >     nr_iothreads=128
> >
> >
> >     -----
> >
> >     # cat iqn.2016-07.tgt.esxi-test.conf
> >     <target iqn.2016-07.tgt.esxi-test>
> >       initiator-address ALL
> >       scsi_sn esxi-test
> >       #vendor_id CEPH
> >       #controller_tid 1
> >       write-cache on
> >       read-cache on
> >       driver iscsi
> >       bs-type rbd
> >       <backing-store vmware1/esxi-test>
> >       lun 1
> >       scsi_id cf10000c4a71e700506357
> >       </backing-store>
> >       </target>
> >
> >
> >     --------------
> >
> >
> >     If I create a VM inside ESXi 6 and try to format the virtual HDD, I
> >     see in the logs:
> >
> >     sd:2:0:0:0: [sda] CDB:
> >     Write(10): 2a 00 0f 86 a8 80 00 01 40 00
> >     mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880068aa5e00)
> >     mptscsih: ioc0: attempting task abort! ( sc=ffff880068aa4a80)
> >
> >     This is with the LSI HDD emulation. With the VMware paravirtual
> >     adapter everything just freezes.
> >
> >     Any idea about this issue?
> >
> >
> >
> >     Am 11.07.2016 um 22:24 schrieb Jake Young:
> >     > I'm using this setup with ESXi 5.1 and I get very good
> performance.  I
> >     > suspect you have other issues.  Reliability is another story (see
> >     Nick's
> >     > posts on tgt and HA to get an idea of the awful problems you can
> >     have),
> >     > but for my test labs the risk is acceptable.
> >     >
> >     >
> >     > One change I found helpful is to run tgtd with 128 threads. I'm
> >     > running Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and
> >     > changed the line that read:
> >     >
> >     > exec tgtd
> >     >
> >     > to
> >     >
> >     > exec tgtd --nr_iothreads=128
> >     >
> >     >
> >     > If you're not concerned with reliability, you can enhance
> throughput
> >     > even more by enabling rbd client write-back cache in your tgt VM's
> >     > ceph.conf file (you'll need to restart tgtd for this to take
> effect):
> >     >
> >     > [client]
> >     > rbd_cache = true
> >     > rbd_cache_size = 67108864 # (64MB)
> >     > rbd_cache_max_dirty = 50331648 # (48MB)
> >     > rbd_cache_target_dirty = 33554432 # (32MB)
> >     > rbd_cache_max_dirty_age = 2
> >     > rbd_cache_writethrough_until_flush = false
> >     >
> >     >
> >     >
> >     >
> >     > Here's a sample targets.conf:
> >     >
> >     >   <target iqn.2014-04.tgt.Charter>
> >     >   initiator-address ALL
> >     >   scsi_sn Charter
> >     >   #vendor_id CEPH
> >     >   #controller_tid 1
> >     >   write-cache on
> >     >   read-cache on
> >     >   driver iscsi
> >     >   bs-type rbd
> >     >   <backing-store charter/vmguest>
> >     >   lun 5
> >     >   scsi_id cfe1000c4a71e700506357
> >     >   </backing-store>
> >     >   <backing-store charter/voting>
> >     >   lun 6
> >     >   scsi_id cfe1000c4a71e700507157
> >     >   </backing-store>
> >     >   <backing-store charter/oradata>
> >     >   lun 7
> >     >   scsi_id cfe1000c4a71e70050da7a
> >     >   </backing-store>
> >     >   <backing-store charter/oraback>
> >     >   lun 8
> >     >   scsi_id cfe1000c4a71e70050bac0
> >     >   </backing-store>
> >     >   </target>
> >     >
> >     >
> >     >
> >     > I don't have FIO numbers handy, but I have some oracle calibrate io
> >     > output.
> >     >
> >     > We're running Oracle RAC database servers in linux VMs on ESXi 5.1,
> >     > which use iSCSI to connect to the tgt service.  I only have a
> single
> >     > connection setup in ESXi for each LUN.  I tested using
> >     multipathing and
> >     > two tgt VMs presenting identical LUNs/RBD disks, but found that
> there
> >     > wasn't a significant performance gain by doing this, even with
> >     > round-robin path selecting in VMware.
> >     >
> >     >
> >     > These tests were run from two RAC VMs, each on a different host,
> with
> >     > both hosts connected to the same tgt instance.  The way we have
> oracle
> >     > configured, it would have been using two of the LUNs heavily
> >     during this
> >     > calibrate IO test.
> >     >
> >     >
> >     > This output is with 128 threads in tgtd and rbd client cache
> >     > enabled:
> >     >
> >     > START_TIME           END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
> >     > -------------------- -------------------- --------- --------- ---------- -------- ------
> >     > 28-JUN-016 15:10:50  28-JUN-016 15:20:04      14153       658        412       14     75
> >     >
> >     >
> >     > This output is with the same configuration, but with rbd client
> >     > cache disabled:
> >     >
> >     > START_TIME           END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
> >     > -------------------- -------------------- --------- --------- ---------- -------- ------
> >     > 28-JUN-016 22:44:29  28-JUN-016 22:49:05       7449       161        219       20     75
> >     >
> >     > This output is from a directly connected EMC VNX5100 FC SAN with 25
> >     > disks using dual 8Gb FC links on a different lab system:
> >     >
> >     > START_TIME           END_TIME              MAX_IOPS  MAX_MBPS  MAX_PMBPS  LATENCY  DISKS
> >     > -------------------- -------------------- --------- --------- ---------- -------- ------
> >     > 28-JUN-016 22:11:25  28-JUN-016 22:18:48       6487       299        224       19     75
> >     >
> >     >
> >     > One of our goals for our Ceph cluster is to replace the EMC SANs.
> >     > We've accomplished this performance-wise; the next step is to get a
> >     > plausible iSCSI HA solution working. I'm very interested in what
> >     > Mike Christie is putting together. I'm in the process of vetting
> >     > the SUSE solution now.
> >     >
> >     > BTW - The tests were run when we had 75 OSDs, all 7200 RPM 2 TB
> >     > HDs, across 9 OSD hosts. We have no SSD journals; instead all the
> >     > disks are set up as single-disk RAID1 disk groups with WB cache and
> >     > BBU. All OSD hosts have 40Gb networking and the ESXi hosts have 10G.
> >     >
> >     > Jake
> >     >
> >     >
> >     > On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic
> >     > <[email protected]> wrote:
> >     >
> >     >     Hi Mike,
> >     >
> >     >     i was trying:
> >     >
> >     >     https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
> >     >
> >     >     ONE target, exported from several OSD servers directly, to
> >     >     multiple VMware ESXi servers.
> >     >
> >     >     A config looked like:
> >     >
> >     >     #cat iqn.ceph-cluster_netzlaboranten-storage.conf
> >     >
> >     >     <target iqn.ceph-cluster:vmware-storage>
> >     >     driver iscsi
> >     >     bs-type rbd
> >     >     backing-store rbd/vmware-storage
> >     >     initiator-address 10.0.0.9
> >     >     initiator-address 10.0.0.10
> >     >     incominguser vmwaren-storage RPb18P0xAqkAw4M1
> >     >     </target>
> >     >
> >     >
> >     >     We had 4 OSD servers. Everyone had this config running.
> >     >     We had 2 vmware servers ( esxi ).
> >     >
> >     >     So we had 4 paths to this vmware-storage RBD object.
> >     >
> >     >     In the very end, VMware had 8 paths: 4 paths directly connected
> >     >     to the specific VMware server, plus 4 paths that this VMware
> >     >     server saw via the other VMware server.
> >     >
> >     >     There were very big problems with performance, I am talking
> >     >     about < 10 MB/s. The customer was not able to use it, so good
> >     >     old NFS is serving instead.
> >     >
> >     >     At that time we used Ceph Hammer, and I think the customer was
> >     >     using ESXi 5.5, or maybe ESXi 6; the testing was sometime last
> >     >     year.
> >     >
> >     >     --------------------
> >     >
> >     >     We will now make a new attempt with Ceph Jewel and ESXi 6, and
> >     >     this time we will manage the VMware servers ourselves.
> >     >
> >     >     As soon as this issue
> >     >
> >     >     "ceph mon Segmentation fault after set crush_ruleset ceph 10.2.2"
> >     >
> >     >     which I already mailed to the list, is solved, we can start the
> >     >     testing.
> >     >
> >     >
> >     >
> >     >
> >     >     Am 11.07.2016 um 17:45 schrieb Mike Christie:
> >     >     > On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
> >     >     >> Hi,
> >     >     >>
> >     >     >> does anyone have experience with a smart way to connect
> >     >     >> VMware with Ceph?
> >     >     >>
> >     >     >> iSCSI multipath did not really work well.
> >     >     >
> >     >     > Are you trying to export rbd images from multiple iscsi
> >     targets at the
> >     >     > same time or just one target?
> >     >     >
> >     >     > For the HA/multiple target setup, I am working on this for
> >     Red Hat. We
> >     >     > plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something
> >     >     already as
> >     >     > someone mentioned.
> >     >     >
> >     >     > We just got a large chunk of code in the upstream kernel (it
> >     is in the
> >     >     > block layer maintainer's tree for the next kernel) so it
> >     should be
> >     >     > simple to add COMPARE_AND_WRITE support now. We should be
> >     posting krbd
> >     >     > exclusive lock support in the next couple weeks.
> >     >     >
> >     >     >
> >     >     >> NFS could work, but I think that is just too many layers in
> >     >     >> between to get usable performance.
> >     >     >>
> >     >     >> Systems like ScaleIO have developed a VMware add-on to talk
> >     >     >> to it.
> >     >     >>
> >     >     >> Is there something similar out there for ceph ?
> >     >     >>
> >     >     >> What are you using ?
> >     >     >>
> >     >     >> Thank you !
> >     >     >>
> >     >     >
> >     >
> >     >
> >
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
