Hi Nick, yeah, I understand the point and the message, I won't do it :-)

I was just asking myself recently: how do I test whether the cache is enabled or not? What I found requires a client to be connected to an RBD device, but we don't have that. Is there any way to ask the Ceph servers whether the cache is enabled? It is disabled by config, but then again, by config the default size and min_size of newly created pools are also different from what Ceph really does, so I don't fully trust the config file alone.
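The closest thing I have found so far is asking a running librbd client over its admin socket, which again needs a connected client. Roughly like this (assuming an admin socket is configured in the [client] section of ceph.conf; the socket name below is only a placeholder):

  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config get rbd_cache

Otherwise something like

  ceph --show-config | grep rbd_cache

only shows what a client on that host would parse out of ceph.conf plus the built-in defaults, not what any live client is actually doing, which is exactly the problem.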
--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:[email protected]

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 15.07.2016 at 09:32, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:[email protected]] On Behalf Of Oliver Dzombic
>> Sent: 12 July 2016 20:59
>> To: [email protected]
>> Subject: Re: [ceph-users] ceph + vmware
>>
>> Hi Jack,
>>
>> thank you!
>>
>> What does reliability have to do with rbd_cache = true?
>>
>> I mean, aside from the fact that if a host powers down, the in-flight data is lost.
>
> Not reliability, but consistency. As you have touched on, the cache is in volatile memory and you have told tgt that your cache is non-volatile. Now if you have a crash/power outage etc., then all the data in the cache will be lost. This will likely leave your RBD full of holes or out-of-date data.
>
> If you plan to run HA, this is even more important, as you could do a write on one iSCSI target and read the data from another before the cache has flushed. Again corruption, especially if the initiator is doing round robin over the paths.
>
> Also, when you run HA there is the chance that TGT will fail over to the other node because of some timeout you normally wouldn't notice; this will also likely cause serious corruption.
>
>> Are there any special limitations / issues with rbd_cache = true and iscsi tgt?
>
> I just wouldn't do it.
>
> You can almost guarantee data corruption if you do. When librbd gets persistent cache to SSD, this will probably be safe, and as long as you can present the cache device to both nodes (e.g. dual-path SAS), HA should be safe as well.
>
>> On 11.07.2016 at 22:24, Jake Young wrote:
>>> I'm using this setup with ESXi 5.1 and I get very good performance. I suspect you have other issues. Reliability is another story (see Nick's posts on tgt and HA to get an idea of the awful problems you can have), but for my test labs the risk is acceptable.
>>>
>>> One change I found helpful is to run tgtd with 128 threads. I'm running Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and changed the line that read:
>>>
>>> exec tgtd
>>>
>>> to
>>>
>>> exec tgtd --nr_iothreads=128
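>>>
>>> (A quick way to confirm the change actually took effect after restarting tgtd, assuming a single tgtd process, is to look at its thread count in /proc; it should jump up noticeably:)
>>>
>>> grep Threads /proc/$(pidof tgtd)/status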
>>>
>>> If you're not concerned with reliability, you can enhance throughput even more by enabling the rbd client write-back cache in your tgt VM's ceph.conf file (you'll need to restart tgtd for this to take effect):
>>>
>>> [client]
>>> rbd_cache = true
>>> rbd_cache_size = 67108864                   # (64MB)
>>> rbd_cache_max_dirty = 50331648              # (48MB)
>>> rbd_cache_target_dirty = 33554432           # (32MB)
>>> rbd_cache_max_dirty_age = 2
>>> rbd_cache_writethrough_until_flush = false
>>>
>>> Here's a sample targets.conf:
>>>
>>> <target iqn.2014-04.tgt.Charter>
>>>     initiator-address ALL
>>>     scsi_sn Charter
>>>     #vendor_id CEPH
>>>     #controller_tid 1
>>>     write-cache on
>>>     read-cache on
>>>     driver iscsi
>>>     bs-type rbd
>>>     <backing-store charter/vmguest>
>>>         lun 5
>>>         scsi_id cfe1000c4a71e700506357
>>>     </backing-store>
>>>     <backing-store charter/voting>
>>>         lun 6
>>>         scsi_id cfe1000c4a71e700507157
>>>     </backing-store>
>>>     <backing-store charter/oradata>
>>>         lun 7
>>>         scsi_id cfe1000c4a71e70050da7a
>>>     </backing-store>
>>>     <backing-store charter/oraback>
>>>         lun 8
>>>         scsi_id cfe1000c4a71e70050bac0
>>>     </backing-store>
>>> </target>
>>>
>>> I don't have FIO numbers handy, but I have some Oracle "calibrate IO" output.
>>>
>>> We're running Oracle RAC database servers in Linux VMs on ESXi 5.1, which use iSCSI to connect to the tgt service. I only have a single connection set up in ESXi for each LUN. I tested using multipathing and two tgt VMs presenting identical LUNs/RBD disks, but found that there wasn't a significant performance gain by doing this, even with round-robin path selection in VMware.
>>>
>>> These tests were run from two RAC VMs, each on a different host, with both hosts connected to the same tgt instance. The way we have Oracle configured, it would have been using two of the LUNs heavily during this calibrate IO test.
>>>
>>> This output is with 128 threads in tgtd and rbd client cache enabled:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 15:10:50  28-JUN-016 15:20:04  14153      658        412        14         75
>>>
>>> This output is with the same configuration, but with rbd client cache disabled:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 22:44:29  28-JUN-016 22:49:05  7449       161        219        20         75
>>>
>>> This output is from a directly connected EMC VNX5100 FC SAN with 25 disks using dual 8Gb FC links on a different lab system:
>>>
>>> START_TIME           END_TIME             MAX_IOPS   MAX_MBPS   MAX_PMBPS  LATENCY    DISKS
>>> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
>>> 28-JUN-016 22:11:25  28-JUN-016 22:18:48  6487       299        224        19         75
>>>
>>> One of our goals for our Ceph cluster is to replace the EMC SANs. We've accomplished this performance-wise; the next step is to get a plausible iSCSI HA solution working. I'm very interested in what Mike Christie is putting together. I'm in the process of vetting the SUSE solution now.
>>>
>>> BTW - the tests were run when we had 75 OSDs, which are all 7200RPM 2TB HDs, across 9 OSD hosts. We have no SSD journals; instead we have all the disks set up as single-disk RAID1 disk groups with WB cache with BBU. All OSD hosts have 40Gb networking and the ESXi hosts have 10G.
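>>>
>>> (If anyone wants to produce roughly comparable fio numbers against an iSCSI-attached LUN, a generic random-write job along these lines would be a starting point. The device name and parameters below are only illustrative, and it writes to the raw device, so point it at a scratch LUN:)
>>>
>>> fio --name=iscsi-randwrite --filename=/dev/sdX --direct=1 \
>>>     --ioengine=libaio --rw=randwrite --bs=8k --iodepth=32 \
>>>     --numjobs=4 --runtime=60 --time_based --group_reporting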
>>>
>>> Jake
>>>
>>> On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic <[email protected]> wrote:
>>>
>>>     Hi Mike,
>>>
>>>     I was trying:
>>>
>>>     https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
>>>
>>>     ONE target, exported from different OSD servers directly, to multiple VMware ESXi servers.
>>>
>>>     A config looked like:
>>>
>>>     #cat iqn.ceph-cluster_netzlaboranten-storage.conf
>>>
>>>     <target iqn.ceph-cluster:vmware-storage>
>>>         driver iscsi
>>>         bs-type rbd
>>>         backing-store rbd/vmware-storage
>>>         initiator-address 10.0.0.9
>>>         initiator-address 10.0.0.10
>>>         incominguser vmwaren-storage RPb18P0xAqkAw4M1
>>>     </target>
>>>
>>>     We had 4 OSD servers; every one of them had this config running. We had 2 VMware ( ESXi ) servers.
>>>
>>>     So we had 4 paths to this vmware-storage RBD image.
>>>
>>>     VMware, in the very end, saw 8 paths ( 4 paths directly connected to the specific VMware server, plus 4 paths that this VMware server saw via the other VMware server ).
>>>
>>>     There were very big performance problems; I am talking about < 10 MB/s. The customer was not able to use it, so good old NFS is serving instead.
>>>
>>>     At that time we used Ceph Hammer, and I think the customer was on ESXi 5.5, or maybe ESXi 6; the testing was somewhere last year.
>>>
>>>     --------------------
>>>
>>>     We will make a new attempt now with Ceph Jewel and ESXi 6, and this time we will manage the VMware servers ourselves.
>>>
>>>     As soon as the "ceph mon segmentation fault after set crush_ruleset, Ceph 10.2.2" issue, which I already mailed to the list, is fixed, we can start the testing.
>>>
>>>     On 11.07.2016 at 17:45, Mike Christie wrote:
>>>     > On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
>>>     >> Hi,
>>>     >>
>>>     >> does anyone have experience with how to connect VMware to Ceph in a smart way?
>>>     >>
>>>     >> iSCSI multipath did not really work well.
>>>     >
>>>     > Are you trying to export rbd images from multiple iscsi targets at the same time, or just one target?
>>>     >
>>>     > For the HA/multiple target setup, I am working on this for Red Hat. We plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something already, as someone mentioned.
>>>     >
>>>     > We just got a large chunk of code into the upstream kernel (it is in the block layer maintainer's tree for the next kernel), so it should be simple to add COMPARE_AND_WRITE support now. We should be posting krbd exclusive lock support in the next couple of weeks.
>>>     >
>>>     >> NFS could be an option, but I think that's just too many layers in between to get usable performance.
>>>     >>
>>>     >> Systems like ScaleIO have developed a VMware addon to talk to it.
>>>     >>
>>>     >> Is there something similar out there for Ceph?
>>>     >>
>>>     >> What are you using?
>>>     >>
>>>     >> Thank you!
