Hi Nick, yeah, I understand the point and the message, I won't do it :-)

I was just asking myself recently: how do I test whether the cache is enabled or not? What I found requires a client to be connected to an RBD device, but we don't have that. Is there any way to ask the Ceph servers whether the cache is enabled? It is disabled by config, but then again, by config the default size and min_size of newly created pools are also different from what Ceph really does, so I don't fully trust the config file alone.
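The closest thing I have found so far is asking a running librbd client over its admin socket, which again needs a connected client. Roughly like this (assuming an admin socket is configured in the [client] section of ceph.conf; the socket name below is only a placeholder):

  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config get rbd_cache

Otherwise something like

  ceph --show-config | grep rbd_cache

only shows what a client on that host would parse out of ceph.conf plus the built-in defaults, not what any live client is actually doing, which is exactly the problem.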
--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:[email protected]

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 15.07.2016 at 09:32, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:[email protected]] On Behalf Of Oliver Dzombic
>> Sent: 12 July 2016 20:59
>> To: [email protected]
>> Subject: Re: [ceph-users] ceph + vmware
>>
>> Hi Jack,
>>
>> thank you!
>>
>> What does reliability have to do with rbd_cache = true?
>>
>> I mean, aside from the fact that if a host powers down, the in-flight data is lost.
>
> Not reliability, but consistency. As you have touched on, the cache is in volatile memory and you have told tgt that your cache is non-volatile. Now if you have a crash/power outage etc., then all the data in the cache will be lost. This will likely leave your RBD full of holes or out-of-date data.
>
> If you plan to run HA, this is even more important, as you could do a write on one iSCSI target and read the data from another before the cache has flushed. Again corruption, especially if the initiator is doing round robin over the paths.
>
> Also, when you run HA there is the chance that TGT will fail over to the other node because of some timeout you normally wouldn't notice; this will also likely cause serious corruption.
>
>> Are there any special limitations / issues with rbd_cache = true and iscsi tgt?
>
> I just wouldn't do it.
>
> You can almost guarantee data corruption if you do. When librbd gets persistent cache to SSD, this will probably be safe, and as long as you can present the cache device to both nodes (e.g. dual-path SAS), HA should be safe as well.
>
>> On 11.07.2016 at 22:24, Jake Young wrote:
>>> I'm using this setup with ESXi 5.1 and I get very good performance. I suspect you have other issues. Reliability is another story (see Nick's posts on tgt and HA to get an idea of the awful problems you can have), but for my test labs the risk is acceptable.
>>>
>>> One change I found helpful is to run tgtd with 128 threads. I'm running Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and changed the line that read:
>>>
>>> exec tgtd
>>>
>>> to
>>>
>>> exec tgtd --nr_iothreads=128
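>>>
>>> (A quick way to confirm the change actually took effect after restarting tgtd, assuming a single tgtd process, is to look at its thread count in /proc; it should jump up noticeably:)
>>>
>>> grep Threads /proc/$(pidof tgtd)/status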
>>>
>>> If you're not concerned with reliability, you can enhance throughput even more by enabling the rbd client write-back cache in your tgt VM's ceph.conf file (you'll need to restart tgtd for this to take effect):
>>>
>>> [client]
>>> rbd_cache = true
>>> rbd_cache_size = 67108864                   # (64MB)
>>> rbd_cache_max_dirty = 50331648              # (48MB)
>>> rbd_cache_target_dirty = 33554432           # (32MB)
>>> rbd_cache_max_dirty_age = 2
>>> rbd_cache_writethrough_until_flush = false
>>>
>>> Here's a sample targets.conf:
>>>
>>> <target iqn.2014-04.tgt.Charter>
>>>     initiator-address ALL
>>>     scsi_sn Charter
>>>     #vendor_id CEPH
>>>     #controller_tid 1
>>>     write-cache on
>>>     read-cache on
>>>     driver iscsi
>>>     bs-type rbd
>>>     <backing-store charter/vmguest>
>>>         lun 5
>>>         scsi_id cfe1000c4a71e700506357
>>>     </backing-store>
>>>     <backing-store charter/voting>
>>>         lun 6
>>>         scsi_id cfe1000c4a71e700507157
>>>     </backing-store>
>>>     <backing-store charter/oradata>
>>>         lun 7
>>>         scsi_id cfe1000c4a71e70050da7a
>>>     </backing-store>
>>>     <backing-store charter/oraback>
>>>         lun 8
>>>         scsi_id cfe1000c4a71e70050bac0
>>>     </backing-store>
>>> </target>
>>>
>>> I don't have FIO numbers handy, but I have some Oracle "calibrate IO" output.
>>>
>>> We're running Oracle RAC database servers in Linux VMs on ESXi 5.1, which use iSCSI to connect to the tgt service. I only have a single connection set up in ESXi for each LUN. I tested using multipathing and two tgt VMs presenting identical LUNs/RBD disks, but found that there wasn't a significant performance gain by doing this, even with round-robin path selection in VMware.
>>>
>>> These tests were run from two RAC VMs, each on a different host, with both hosts connected to the same tgt instance. The way we have Oracle configured, it would have been using two of the LUNs heavily during this calibrate IO test.
>>>
>>> This output is with 128 threads in tgtd and rbd client cache enabled:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 15:10:50  28-JUN-016 15:20:04  14153      658        412        14         75
>>>
>>> This output is with the same configuration, but with rbd client cache disabled:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 22:44:29  28-JUN-016 22:49:05  7449       161        219        20         75
>>>
>>> This output is from a directly connected EMC VNX5100 FC SAN with 25 disks using dual 8Gb FC links on a different lab system:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 22:11:25  28-JUN-016 22:18:48  6487       299        224        19         75
>>>
>>> One of our goals for our Ceph cluster is to replace the EMC SANs. We've accomplished this performance-wise; the next step is to get a plausible iSCSI HA solution working. I'm very interested in what Mike Christie is putting together. I'm in the process of vetting the SUSE solution now.
>>>
>>> BTW - the tests were run when we had 75 OSDs, which are all 7200RPM 2TB HDs, across 9 OSD hosts. We have no SSD journals; instead we have all the disks set up as single-disk RAID1 disk groups with WB cache with BBU. All OSD hosts have 40Gb networking and the ESXi hosts have 10G.
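>>>
>>> (If anyone wants to produce roughly comparable fio numbers against an iSCSI-attached LUN, a generic random-write job along these lines would be a starting point. The device name and parameters below are only illustrative, and it writes to the raw device, so point it at a scratch LUN:)
>>>
>>> fio --name=iscsi-randwrite --filename=/dev/sdX --direct=1 \
>>>     --ioengine=libaio --rw=randwrite --bs=8k --iodepth=32 \
>>>     --numjobs=4 --runtime=60 --time_based --group_reporting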
>>>
>>> Jake
>>>
>>> On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic <[email protected]> wrote:
>>>
>>>     Hi Mike,
>>>
>>>     I was trying:
>>>
>>>     https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
>>>
>>>     ONE target, exported from different OSD servers directly, to multiple VMware ESXi servers.
>>>
>>>     A config looked like:
>>>
>>>     #cat iqn.ceph-cluster_netzlaboranten-storage.conf
>>>
>>>     <target iqn.ceph-cluster:vmware-storage>
>>>         driver iscsi
>>>         bs-type rbd
>>>         backing-store rbd/vmware-storage
>>>         initiator-address 10.0.0.9
>>>         initiator-address 10.0.0.10
>>>         incominguser vmwaren-storage RPb18P0xAqkAw4M1
>>>     </target>
>>>
>>>     We had 4 OSD servers; every one of them had this config running. We had 2 VMware ( ESXi ) servers.
>>>
>>>     So we had 4 paths to this vmware-storage RBD image.
>>>
>>>     VMware, in the very end, saw 8 paths ( 4 paths directly connected to the specific VMware server, plus 4 paths that this VMware server saw via the other VMware server ).
>>>
>>>     There were very big performance problems; I am talking about < 10 MB/s. The customer was not able to use it, so good old NFS is serving instead.
>>>
>>>     At that time we used Ceph Hammer, and I think the customer was on ESXi 5.5, or maybe ESXi 6; the testing was somewhere last year.
>>>
>>>     --------------------
>>>
>>>     We will make a new attempt now with Ceph Jewel and ESXi 6, and this time we will manage the VMware servers ourselves.
>>>
>>>     As soon as the "ceph mon segmentation fault after set crush_ruleset, Ceph 10.2.2" issue, which I already mailed to the list, is fixed, we can start the testing.
>>>
>>>     On 11.07.2016 at 17:45, Mike Christie wrote:
>>>     > On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
>>>     >> Hi,
>>>     >>
>>>     >> does anyone have experience with how to connect VMware to Ceph in a smart way?
>>>     >>
>>>     >> iSCSI multipath did not really work well.
>>>     >
>>>     > Are you trying to export rbd images from multiple iscsi targets at the same time, or just one target?
>>>     >
>>>     > For the HA/multiple target setup, I am working on this for Red Hat. We plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something already, as someone mentioned.
>>>     >
>>>     > We just got a large chunk of code into the upstream kernel (it is in the block layer maintainer's tree for the next kernel), so it should be simple to add COMPARE_AND_WRITE support now. We should be posting krbd exclusive lock support in the next couple of weeks.
>>>     >
>>>     >> NFS could be an option, but I think that's just too many layers in between to get usable performance.
>>>     >>
>>>     >> Systems like ScaleIO have developed a VMware addon to talk to it.
>>>     >>
>>>     >> Is there something similar out there for Ceph?
>>>     >>
>>>     >> What are you using?
>>>     >>
>>>     >> Thank you!
