Re: [storage-discuss] iSCSI and low performance
Hello, I'm having some issues with iSCSI target performance. Recently I made a 1TB ZVOL and mounted it on Windows 7 Ultimate (NTFS) with Microsoft's iSCSI initiator, but the performance, in layman's terms, just sucks. The system is SunOS 5.11 snv_123 i86pc on an Athlon64 2800+, 4x500GB SATA2 drives (WD Caviar Green) in raidz, 3GB RAM, onboard nvidia sata and gigabit ethernet. The amount of memory is rather low for zfs. So, two questions: 1) Why does the shareiscsi=on option create the target with rdsk if it is much slower? 2) Any suggestions for improving performance? Thanks in advance, Hernan

shareiscsi is outdated; you should use the much more advanced and improved COMSTAR. Don't forget to check whether writeback is enabled on the LUN - writethrough mode is usually too slow.

-- Roman
ro...@naumenko.ca
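A quick way to check and flip the writeback setting under COMSTAR - a sketch, with a placeholder GUID. The wcd property means "write cache disabled", so wcd=false turns writeback on; note that writeback on a LUN trades power-loss safety for speed:

# stmfadm list-lu -v                 (look for the "Writeback Cache" line)
# stmfadm modify-lu -p wcd=false 600144F000000000000000000000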
Re: [storage-discuss] iSCSI and low performance
Thanks. How much memory should I have? This machine won't take more than 3GB; the best I can get is one that takes up to 8GB. Anyway, I'm the only user - is it really necessary?

It depends on your needs. If you are ok with the current performance, you can go ahead, I guess. Reads would do better, since there would be more space for cache.

Tried COMSTAR as you suggested; got the same speed and behavior as iscsitadm with dsk instead of rdsk. This is what writes look like:

[r...@solaris:~]# zpool iostat 3
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
tera       2.13T  1.49T     30    263   238K  2.25M
tera       2.13T  1.49T      0  6.42K      0  51.0M
tera       2.13T  1.49T      0      0      0      0
tera       2.13T  1.49T      0      0  2.64K      0
tera       2.13T  1.49T      0  8.13K      0  64.6M
tera       2.13T  1.49T      0      0  2.64K      0
tera       2.13T  1.49T      0      0  2.65K      0
tera       2.13T  1.49T      0  8.23K      0  65.4M

If I disable writeback, it looks like this:

tera       2.13T  1.49T      0     44      0  2.89M
tera       2.13T  1.49T      0     35  2.66K  2.29M
tera       2.13T  1.49T      0     46      0  3.01M
tera       2.13T  1.49T      0     22      0  1.48M
tera       2.13T  1.49T      0     47      0  3.06M
tera       2.13T  1.49T      0     22      0  1.45M
tera       2.13T  1.49T      0     40      0  2.60M

So why is it so bursty? ZIL maybe? Do I have a bottleneck somewhere?

It's correct behavior. I think this is about the limit for your current configuration. There is a thread in the zfs forum about this burstiness (http://www.opensolaris.org/jive/thread.jspa?messageID=446100&tstart=0#446100). You can decrease the timeout for writes, but I doubt it will help. Probably the array is not able to do better (check the maximum throughput with the dd command). You can also check iscsi speed by using this script (iscsii.d) from http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_iSCSI - basically, you'll see the same speed as in the Windows task manager's Network tab. You can definitely increase array performance by configuring it as raid10.

-- Roman
ro...@naumenko.ca
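To get that dd baseline, something like this is a reasonable sketch (path and sizes are placeholders; note that with compression=on, writing from /dev/zero will overstate the real throughput):

# dd if=/dev/zero of=/tera/ddtest bs=1048576 count=4096
# zpool iostat tera 3        (in another terminal, to watch the write rate)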
Re: [storage-discuss] Even with host groups, other initiators still see the target
If I remember correctly, an initiator that is not in a view can still establish a session, but it doesn't get access to the LUNs. In particular, you can log in to a target from Windows and you will see the session opened in Solaris, but the Windows box won't see any iscsi drive. There is a bug related to your question: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6878539

-- Roman Naumenko
ro...@naumenko.ca
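For reference, a minimal view that restricts a LUN to a single initiator looks roughly like this (group name, initiator IQN and GUID are placeholders):

# stmfadm create-hg win-hg
# stmfadm add-hg-member -g win-hg iqn.1991-05.com.microsoft:winbox
# stmfadm add-view -h win-hg -n 0 600144F000000000000000000000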
Re: [storage-discuss] failover avs?
Hi Greg, Your setup is not very clear from the description... But you should probably try snapshots for data synchronization. The iscsi targets have to be recreated on the second server in any case.

-- Roman Naumenko
ro...@naumenko.ca
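A minimal snapshot-based sync between two boxes might look like this (pool, dataset and host names are made up):

# zfs snapshot tank/vol@sync1
# zfs send tank/vol@sync1 | ssh server2 zfs recv tank/vol
(later, send only the changes)
# zfs snapshot tank/vol@sync2
# zfs send -i tank/vol@sync1 tank/vol@sync2 | ssh server2 zfs recv -F tank/vol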
[storage-discuss] (warning) packet length greater than MTU in buffer offset 8320: length=8320
I have a problem on a storage server related to the network. This is on OpenSolaris 121b; on the other side is a Windows file server accessing it through iscsi. When a user is hammering the storage server by copying files back and forth, other users connected to the Windows file server experience slowness.

r...@torgenzsan.local:/export/home/roman/zfs/iscsi_watch# dladm show-ether
LINK      PTYPE    STATE  AUTO  SPEED-DUPLEX  PAUSE
e1000g0   current  up     yes   1G-f          bi
e1000g1   current  up     yes   1G-f          bi
e1000g2   current  up     yes   1G-f          bi

# dladm show-link
LINK      CLASS  MTU   STATE  OVER
e1000g0   phys   1500  up     --
e1000g1   phys   1500  up     --
e1000g2   phys   1500  up     --

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 10.24.1.101 netmask ffffff00 broadcast 10.24.1.255
        ether 0:15:17:87:c9:bc
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.24.254.101 netmask ffffff00 broadcast 10.24.254.255
        ether 0:15:17:87:c9:bd
e1000g2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.21.3.101 netmask ffff0000 broadcast 10.21.255.255
        ether 0:11:a:54:6e:66

# snoop -r -o 20112009.cap -d e1000g0
29135 (warning) packet length greater than MTU in buffer offset 0: length=8320
29136 (warning) packet length greater than MTU in buffer offset 8320: length=8320
29137 (warning) packet length greater than MTU in buffer offset 16640: length=8320
29138 (warning) packet length greater than MTU in buffer offset 24960: length=8320
29139 (warning) packet length greater than MTU in buffer offset 33280: length=8320
29141 (warning) packet length greater than MTU in buffer offset 41688: length=8320
29142 (warning) packet length greater than MTU in buffer offset 50008: length=8320
29160 (warning) packet length greater than MTU in buffer offset 0: length=8320
29163 (warning) packet length greater than MTU in buffer offset 8496: length=8320
29180 (warning) packet length greater than MTU in buffer offset 18304: length=8320
29181 (warning) packet length greater than MTU in buffer offset 26624: length=8320
29182 (warning) packet length greater than MTU in buffer offset 34944: length=8320
29183 (warning) packet length greater than MTU in buffer offset 43264: length=8320
29184 (warning) packet length greater than MTU in buffer offset 51584: length=8320

And so on... In Wireshark, packet sizes go up to 56741 (49927, 41687, 33495, 16697, 9382). Among the bigger packets, 8294 bytes is the most common size. What I'm thinking is that the Intel I/OAT engine may be helping to screw things up - e1000g0 is the Intel one. (I remember there was a command to check the BIOS setting.) If anybody is willing to take a look at the trace file, please let me know at ro...@frontline.ca

-- Roman
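A side note, offered as an assumption worth checking: with large segment offload enabled, snoop captures the pre-segmentation buffers the stack hands to the NIC, so these over-MTU warnings don't necessarily mean oversized frames ever hit the wire - the hardware slices them into MTU-sized frames afterwards. To keep the traces manageable while checking, you can capture with a small snaplen:

# snoop -r -d e1000g0 -s 128 -o trace.cap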
[storage-discuss] COMSTAR: LUN became unregistered after power loss
This is opensolaris 118b. We had a power loss on the server. After it booted, one of the LUNs was in the unregistered state. Is this a bug?

# stmfadm list-lu -v
LU Name: 600144F0EE6002004AA868040001
    Operational Status: unregistered
    Provider Name     : unregistered
    Alias             : -
    View Entry Count  : 1

The views were also a mess, since the numbering of the LUNs shifted. When I created the views I used -n X in the stmfadm add-view command. Could that affect how targets are mapped to LUNs?

zsan.local:~# stmfadm list-view -l 600144f0ee6002004aa86b0a0002
View Entry: 0
    Host group   : zsan-hg
    Target group : tg-tor-fs
    LUN          : 2

zsan.local:~# stmfadm list-view -l 600144f0ee6002004b0027ba0001
View Entry: 0
    Host group   : zsan-hg
    Target group : tg-mtl-fs
    LUN          : 3

zsan.local:~# stmfadm list-view -l 600144f0ee6002004ab7e2810004
View Entry: 0
    Host group   : zsan-hg
    Target group : tg-mon-exch-01
    LUN          : 2

Two views have the same LUN number, 2, although the GUIDs are different. Is this a correct configuration?

Thank you,
Roman Naumenko
ro...@frontline.ca
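If the backing zvol is intact, re-registering the LU should be possible with import-lu - a sketch, with a placeholder path taken from a list-lu listing (I haven't verified this on 118b):

# stmfadm import-lu /dev/zvol/rdsk/pool/volume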
Re: [storage-discuss] COMSTAR: LUN became unregistered after power loss
Ok, thanks - the targets' behavior is clear now. Nothing unexpected happened in terms of connectivity. Still, what could make a LUN become unregistered? Is this because of an error at the zfs level (no errors on the zpool, though), or did something fail in COMSTAR?

-- Roman Naumenko
ro...@frontline.ca
[storage-discuss] [sun.com] broken link
You guys have a broken link; people might be interested in X25 :) The page is http://www.sun.com/software/x25/ The link is: http://store.sun.com/CMTemplate/CEServlet?process=SunStorecmdViewProduct_CPcatid=95146

-- Roman
Re: [storage-discuss] Comstar - how to migrate to the new server?
Jim Dunham wrote, On 20.10.2009 09:26:
> Do you have a backup policy, other than replicating the data from server1 to server2?

Of course, we have this data in other places. Windows replicates it by its own means, plus tape backups. I wouldn't touch it, though, if not for the performance issues. I haven't tried disabling AVS yet. But raid10 is definitely the best option.

> Are server1 and server2 identical systems, such that if you configured server2's ZFS storage pool as raid10, then using ZFS (or AVS) replicated the data from server1 to server2, could the identities of server1 and server2 be swapped?

They are identical in terms of underlying storage (same controller, the same drives). The server configuration is slightly different.

> My concern is that in any type of data conversion like this, you always want two or more locations where valid data exists.

When swapping drives, only one location is present. That's the problem, yes.

-- Roman
[storage-discuss] Comstar - how to migrate to the new server?
Hello list, Can somebody advise me how this can be done in the easiest way:

server1: 8 drives in raid6
server2: 8 drives replicated by AVS from server1

There are comstar targets on server1. Now I want to have raid10 instead of raid6 on server1. The way I see it can be done: break AVS, configure raid10 on server2, copy incremental snapshots, and then pause the applications on the initiators. Reboot server1 while swapping the drives between them. The question now is how to restore (if it's even possible) the comstar stuff: LUNs, views. It would be nice to have server1 come up after the reboot with everything available as before. Is this possible, or does comstar have to be reconfigured?

-- Roman
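One idea, offered as an untested sketch: COMSTAR keeps its LU and view definitions in the SMF repository, so exporting the stmf service configuration before the swap and re-importing it afterwards may preserve them (I haven't verified this on 122):

# svccfg export system/stmf > /backup/stmf-config.xml
(after the drive swap, on the rebuilt server)
# svccfg import /backup/stmf-config.xml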
Re: [storage-discuss] Comstar - how to migrate to the new server?
Hi Jim, AVS is configured for async replication. Basically, I would like to replace AVS with snapshot transfers. The reason is performance. So, after I stop replication and configure raid10 on the second server, AVS is going to be disabled. Both servers are at version 122 (which should be updated to 124 due to the zfs bug). There is plenty of space left on server1, so raid6 to raid10 should be ok:

# zpool list
stor  10.9T  3.66T  7.21T  33%  ONLINE  -

# zpool status
  pool: stor
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        stor        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
        spares
          c4t1d0    AVAIL

errors: No known data errors

# zfs list
stor             6.09T  1.91T  49.4K  /stor
stor/testsnap    46.0K  1.91T  46.0K  /stor/testsnap
stor/tor         6.09T  1.91T  44.9K  /stor/tor
stor/tor/node1    600G  1.91T   600G  -
stor/tor/node2    600G  1.91T   600G  -
stor/tor/fs      4.43T  4.90T  1.35T  -
stor/tor/dc       500G  2.28T   128G  -

-- Roman

Jim Dunham wrote, On 19.10.2009 14:07:
> As one that has detailed knowledge of both AVS and COMSTAR, you have left out too many details to offer up a possible solution that will work. What is needed is Solaris version numbers, volume management information, on-disk formatting, partitioning and filesystems in use, plus AVS and COMSTAR configuration information.
>
> Also, changing the underlying volume format from raid6 to raid10 is likely to result in fewer available disk blocks, so a block-to-block copy, replication or snapshot from a raid6 to a raid10 volume is likely to fail due to insufficient storage.
>
> Jim Dunham
> Engineering Manager, Core I/O - COMSTAR, Open Storage Systems Group
Re: [storage-discuss] sata controller support with 1.5TB disks
> Before I order the controller and disks, I just want to make sure that Opensolaris will be able to see and use these 1.5TB disks. I don't want to find out that the controller works, but it can't see disks this large. Can anyone enlighten me please?

Cheap models from Adaptec don't support large drives; I ran into this issue. LSI is a good choice - probably everything from them on PCI-e supports large drives.

-- Roman
Re: [storage-discuss] sata controller support with 1.5TB disks
> I am using a Dell SAS 6i card with the Samsung 1.5Tb drives, which is based on an LSI design. I would recommend the SAS 6i as it is significantly cheaper than LSI's normal retail channel cards.

Doesn't Dell screw them up? I don't mean completely, just a little - a broken driver here, unsupported hardware there?

-- Roman
[storage-discuss] IOPS number - how much your storage delivers?
I just wonder how many IOPS a typical 2-processor box can deliver with 1 RAID controller and 1 LSI HBA controller with JBODs connected. Right now dd does 2000-4000 IOPS on an 8-disk raid10 array connected internally (not jbod). How do I translate an IOPS number into initiator usage? I mean, how many servers can be connected over 1G NICs to such storage? Or how many Exchange servers can utilize such storage, accessing it through a 1G nic over iscsi? Just trying to plan storage and performance capacity ahead.

-- Roman
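A back-of-envelope estimate, not a measurement: a 7200rpm SATA drive does on the order of 75-100 random IOPS, so an 8-disk raid10 yields very roughly 300-800 random IOPS depending on the read/write mix; the 2000-4000 figure from dd is sequential (and partly cached), so the two aren't comparable. Dividing the random-IOPS budget by a per-application estimate (Exchange sizing guidance of that era assumed very roughly 0.5-1 IOPS per active mailbox) gives a first approximation of how many initiators the box can carry - the disks will usually saturate long before the 1G NICs do.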
[storage-discuss] COMSTAR - relationship between LUNs and targets
Stupid situation... Usually I create 1 lun + 1 target at a time. But this time I decided to create a few LUNs first. Then I typed:

# itadm create-target

It created a target, but then I realized I can't find which LUNs it belongs to. Does anybody know how to find the LUN-to-target relationship in COMSTAR?

-- Roman
Re: [storage-discuss] COMSTAR - relationship between LUNs and targets
I see the relation now: only after a view is created will stmfadm list-lu -v show that the lun has a view. Very inconvenient. Comstar developers, could you comment on this? Maybe I'm missing something?

-- Roman
Re: [storage-discuss] COMSTAR - relationship between LUNs and targets
tim szeto wrote, On 09.10.2009 13:51:
> Roman Naumenko wrote:
>> I see the relation now: only after a view is created will stmfadm list-lu -v show that the lun has a view. Very inconvenient. Comstar developers, could you comment on this? Maybe I'm missing something?
> Take a look at stmfadm(1M), it shows the usage of 'view'.

There is no view yet, and I can't configure a view without knowing which target each volume belongs to.

-- Roman
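For anyone hitting the same confusion: COMSTAR records the LUN-target relationship only in views, via target groups, so the mapping has to be created explicitly. A sketch with made-up names (note the target must be offline while being added to a group):

# stmfadm create-tg tg-demo
# stmfadm offline-target iqn.1986-03.com.sun:02:demo
# stmfadm add-tg-member -g tg-demo iqn.1986-03.com.sun:02:demo
# stmfadm online-target iqn.1986-03.com.sun:02:demo
# stmfadm add-view -t tg-demo -n 0 600144F000000000000000000000
# stmfadm list-view -l 600144F000000000000000000000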
[storage-discuss] Migration from Linux iscsitarget to Comstar, how?
We're about to start transferring data from Linux storage boxes to OpenSolaris storage. The Linux servers provide targets that are used mainly by Windows servers. The storage appliance is Openfiler (Linux based). Media storage on Linux is a hardware raid5, 8 drives. There is one big 5TB GPT partition divided into volumes by LVM (openfiler uses LVM to manage sizes and so on). Is there an easy way to transfer the data from the old volumes to comstar targets? Obviously this can be done by mounting the two targets and copying the data on a client, which is not efficient. The volumes can be mounted on opensolaris over iscsi, no problems here, but how can the data be transferred from the old volume to zfs?

-- Roman
Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?
>> Is there an easy way to transfer the data from the old volumes to comstar targets? Obviously this can be done by mounting the two targets and copying the data on a client, which is not efficient.
> I'm not sure you have an easy way to do this. The problem as I see it is the LVM'd volumes. You have a couple of ways of doing this, but by far the easiest way is to mount the new LUN on the server and copy the data. It will involve the least downtime and ensures that your Comstar targets are already set up correctly.

Yep, seems like this is the easiest way to do the switchover.

-- Roman
Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?
> The other way is to mount the old LUN on the OpenSolaris server and dd the old data directly to the ZVOL. This avoids the copy-off/copy-on over the network and could speed things up, depending on the size of the volume relative to the amount of data being copied; you can also choose your block size, which can speed things up. -Ross

I wonder how mounting a target from a Linux server works on Opensolaris? Reliable? Any issues? Is any interaction possible with comstar already enabled?

-- Roman
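A sketch of that dd approach from the OpenSolaris side (addresses and device names are placeholders; the Solaris initiator and the COMSTAR target service can run on the same box):

# iscsiadm add discovery-address 10.0.0.5:3260
# iscsiadm modify discovery -t enable
(find the new disk with format, then copy raw)
# dd if=/dev/rdsk/cXtYd0p0 of=/dev/zvol/rdsk/tank/vol bs=1048576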
Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?
> http://blogs.sun.com/eschrock/entry/shadow_migration Not sure if it's relevant to your specific setup, but it's still worth a look. Regards, Andrey

Thanks, I checked it quickly. Unfortunately, it's about migration using nfs - no iscsi option.

-- Roman
[storage-discuss] [snapshots] takes minutes for zfs to list snapshots
This is on opensolaris 118.

# time zfs list
real    0m0.015s
user    0m0.005s
sys     0m0.010s

# time zfs list -t snapshot
real    0m19.441s
user    0m0.020s
sys     0m0.041s

# time zfs list -t snapshot | wc -l
122
real    0m0.045s
user    0m0.018s
sys     0m0.030s

Hm, then it started to list very quickly again. What makes the listing take so long - 20 seconds for a hundred snapshots? Could making a snapshot delay the listing? (I have the autosnapshot service taking snapshots every 30 minutes.)

-- Roman
Re: [storage-discuss] setting up comstar
And here goes mine as well: http://opensolaris.org/jive/thread.jspa?threadID=111540&tstart=0 Enjoy!

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
> Chenbro makes a JBOD kit. Or at least *made* one - a number of sites are showing it out of stock or even discontinued. http://usa.chenbro.com/corporatesite/products_detail.php?sku=76 UEK-12803 looks like the part number for you. Who knows if the mounting options are compatible with the Supermicro chassis, but it looks like it should at least fit in the motherboard I/O window, and it comes with SAS expanders based around the LSI SASX28.

It doesn't make sense to use such external expanders, since chassis with and without an expander differ only in having a small LSI chip on the backplane ($20?). As far as I can see, Supermicro chassis with a SAS expander on the backplane are just a little more expensive than those without.

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
Chris Du wrote:
> JBOD Kit - used for cascading purposes:
> CSE-PTJBOD-CB1 - Power Control Card
> CBL-0166L - SAS 836EL2/EL1 BP External Cascading Cable
> CBL-0167L - SAS 836EL1 BP 1-Port Internal Cascading Cable
> Yours is not the E1 model, which uses a SAS expander chip on the backplane. The Power Control Card should be universal.

Without the SAS expander chip this kit is useless, as I understand it. What are the other solutions for JBOD expansion, apart from chassis with SAS chips on the backplane like you mentioned? I see that Areca sells something (the ARC-8020 expander module). Well, I suspect the only solution here is a specialized jbod chassis, right?

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
> On Tue, Sep 22, 2009 at 15:15, Chris Du dilid...@gmail.com wrote:
>> I thought the QT model has 16 SATA/SAS ports on the backplane and only the E1/E2 models support JBOD mode.
> Sorry, my mistake. I have the E1 model, and thought that Roman had mentioned he had that as well. I see now in the first post that he in fact specified the no-expander version of the case. I'll try to read more carefully in the future.

That's ok; I myself thought that it had one.

> The JBOD power control board will suffice to turn the box on, but you will need to do one of three things: 1) find an external SAS expander card like Chenbro makes (but good luck finding them :( ) 2) exchange your case for the SAS expander version 3) get lots of SAS controllers and connect disks one to a port.
> Will

Yes, I've already figured that out; the available options are not very nice. Shit, this is a conspiracy - why does nobody make those freaking sas expanders? I know why: a card would cost $100, a JBOD chassis costs $1500.

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
> It may be cheaper and easier to just replace the backplane if the case is already bought.

Is it an easy procedure? I doubt it.

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
Chris Du wrote:
> You need the JBOD kit. It's basically a power card and a SAS cascading cable.

Hi Chris, Do you have any particular one in mind?

-- Roman
Re: [storage-discuss] convert common chassis to JBOD?
Ok, I see. We've ordered those cards already. But thanks anyway.

-- Roman

Chris Du wrote, On 21.09.2009 17:25:
> Yours is different from mine. I have the E1 model, which uses a SAS expander chip on the backplane. However, the power card should be the same. http://www.supermicro.com/products/chassis/3U/836/SC836E1-R800.cfm Look at the optional parts at the bottom; there is a JBOD kit that includes the power control card.
Re: [storage-discuss] convert common chassis to JBOD?
Yes, that's what I needed. Thank you very much!

-- Roman
Re: [storage-discuss] Making zfs storage HA clustered
> From: Roman Naumenko ro...@frontline.ca
>> Is there any way to build HA storage using common components like JBOD enclosures, lsi hba, cheap sata drives?
> Yes. http://www.sun.com/storage/disk_systems/unified_storage/index.jsp

Yes, that's the right option when going cheap :)

-- Roman
Re: [storage-discuss] Making zfs storage HA clustered
> [Added ha-clusters-discuss] Have you looked at what Open HA Cluster (OHAC) provides? http://opensolaris.org/os/community/ha-clusters/ There is an HA-ZFS agent for OHAC and, more recently, support for shared-nothing storage with COMSTAR. Augustus.

I can't find this. Is there any documentation, or is the implementation process described anywhere? Or is it just a development version?

-- Roman
Re: [storage-discuss] Making zfs storage HA clustered
Augustus, thank you for the information. Ok, so it is called Shared Nothing Storage, to make a storage server HA. And it should be built with MPxIO, as I've learned from your blog. Is there any way to build HA storage using common components like JBOD enclosures, lsi hba, cheap sata drives?

-- Roman Naumenko
ro...@frontline.ca
Re: [storage-discuss] Making zfs HA
I've heard about Nexenta. But we've just started to move away from another Linux-based appliance, so I don't feel like starting with another Linux once again. Hopefully, sometime we'll just order proper hardware from SUN to provide clustering or whatever is needed. Anyway, thanks for the suggestions. Is it possible to use the code of the plugins you've mentioned in Opensolaris? In any case, it's still not a true clustering solution, but it sounds promising.

-- Roman
[storage-discuss] [HBA card] is needed
Hi guys, It's kind of an emergency: a customer wants a fast, expandable array, but I don't have any HBA to connect jbods. I've decided to use the lsi sas 3801e card. http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3801e/index.html There is positive feedback for this card when used with JBOD enclosures. However, our sales department can't find it fast enough (it's back-ordered everywhere). Can anybody advise something similar? I know many vendors use the lsi chipset, but for example Dell cards have some firmware issues when installed on Solaris (or not?). Thanks in advance.

-- Roman
Re: [storage-discuss] [HBA card] is needed
> I have two of these sitting on my desk right now - ordered them with overnight delivery last week, no problem. What country are you in? Regards, Tim Creswick

Hi Tim, I'm in Canada. I wonder what the price for this card was? And it seems we've found something - the guys from pc-pitstop.com are promising to ship what we need promptly.

-- Roman
ro...@frontline.ca
Re: [storage-discuss] Making zfs storage HA clustered
I wonder what an opensolaris guru could advise?

-- Roman
[storage-discuss] Making zfs HA
Hello list, What are the options for building clustered storage using opensolaris? I'm interested in HA solutions. I've tried only one option - AVS replication. Unfortunately, AVS configuration is too complicated. Groups, bitmaps, queues, rpc timeouts, slicing - it's just a nightmare to make it work and to support it in production when there are more than a couple of pools. And it's probably going to be slow if there are jbods attached to a storage controller, or it will require half of the Ethernet ports to replicate more or less reliably. A 10GigE interface for it - did anybody try that? Another option I'm looking into is sending snapshots. But regardless of what we've heard from sun, it's going to slow down zfs operations. Creating a snapshot is not quick on a loaded pool, especially if the storage controller manages many pools. So it's probably delays of dozens of minutes in transferring snapshots over to a standby server.

-- Roman
Re: [storage-discuss] jbos enclosures: is online expansion possible (without rebooting)?
> On Tue, Sep 8, 2009 at 10:43 AM, Roman Naumenko ro...@frontline.ca wrote:
>> Thanks, Ross. Just to clarify: connecting the second enclosure doesn't require the first to be turned off? My understanding is that expanding the pool can be done completely without service interruption?
> You can hot-add storage enclosures that support hot-adding if the controllers support this as well; the MD1000s from Dell support this, as do the LSI 1068- and 1078-based controllers.

Excellent! Just another step toward a 7410 under 10 grand :)

-- Roman
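For the record, once the new enclosure's disks show up, growing the pool is a single online command - a sketch with invented device names (keep in mind that a raidz/raidz2 vdev, once added, cannot be removed):

# zpool add tank raidz2 c8t0d0 c8t1d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0
# zpool status tank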
[storage-discuss] [arc_summary] free memory is 2.6G of total 16G
I have a question about the stats output from arc_summary. Is Free Memory supposed to be at this level? It's almost 16% of all memory. Maybe Max Size (Hard Limit) should be corrected, if 2.5G of memory remains free all the time?

# ./arc_summary.pl
System Memory:
        Physical RAM:  16369 MB
        Free Memory :   2593 MB
        LotsFree:        255 MB

ZFS Tunables (/etc/system):

ARC Size:
        Current Size:             9870 MB (arcsize)
        Target Size (Adaptive):   9870 MB (c)
        Min Size (Hard Limit):    1918 MB (zfs_arc_min)
        Max Size (Hard Limit):   15345 MB (zfs_arc_max)

-- Roman Naumenko
ro...@frontline.ca
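If you do decide to cap the ARC explicitly, the tunable goes into /etc/system and takes effect after a reboot - the value below is just an example (12 GB expressed in bytes):

set zfs:zfs_arc_max = 0x300000000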
Re: [storage-discuss] [Comstar] listens on all interfaces, sends all targets to everybody
Can anybody advise? Binding the target port to a particular interface, restricting which targets appear per initiator - is this something available in Comstar, or am I missing something?

-- Roman
[storage-discuss] [Comstar] listens on all interfaces, sends all targets to everybody
Sorry for the repeated question; I remember somebody asked it already, but I can't find when.

1. Does a configured target - with all the hg, tg, tpg and initiators set up, everything - nonetheless make comstar listen on all interfaces for incoming connections?

# netstat -an | grep 3260
*.3260  *.*  0  0  262300  0  LISTEN

Basically, I would like to restrict connections to a LUN to a particular interface. The same for an initiator - it should not see other targets.

2. A target is configured along with a tpg on interface e1000g0, but I can get the list of targets by adding the e1000g0 ip as a target portal for discovery on the initiator. Although it can't log in, it is still confusing. Again, I'm getting the list of targets since it listens on all interfaces.

Any references to documentation explaining this are appreciated.

-- Roman Naumenko
ro...@bestroman.com
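For reference, the portal binding itself is done with a target portal group - a sketch with a placeholder address, which should restrict logins to that portal (discovery behavior may differ, as observed above):

# itadm create-tpg tpg-storage 10.24.254.101:3260
# itadm modify-target -t tpg-storage iqn.1986-03.com.sun:02:vol1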
Re: [storage-discuss] Comstar: MS can't format iscsi drive.
Excellent analysis. Can you add the tshark commands that you've used? So, Opensolaris sends large packets that confuse Wireshark (and probably Windows)?

It's an intel motherboard.

$ uname -a
SunOS zsan01 5.11 snv_118 i86pc i386 i86pc Solaris

That's what I have on the server; I don't know how to list the NIC's chipset. /etc/drv shows: Intel e1000g Gigabit Ethernet Adapter.

$ dladm show-linkprop | grep mtu
e1000g0  mtu  rw  1500  1500  1500-9216
e1000g1  mtu  rw  9216  1500  1500-9216

I/OAT is probably enabled; I remember there was something in the BIOS. I'll try to disable it, but we've also tested win2008 against Linux - it failed to format a LUN on the old Linux storage server too. I'll try to post that trace as well.

-- Roman
Re: [storage-discuss] Comstar: MS can't format iscsi drive.
On Aug 27, 2009, at 1:25 PM, Roman Naumenko wrote:
> try a scanpci -v, then to match with the driver that's binding, check the vendor and device ids in /etc/driver_aliases

Intel 82546EB Ethernet Device - from smbios.

This is from scanpci -v:

pci bus 0x0005 cardnum 0x00 function 0x01: vendor 0x8086 device 0x1096
 Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)
 CardVendor 0x8086 card 0x3484 (Intel Corporation, Card unknown)
  STATUS   0x0010  COMMAND  0x0047
  CLASS    0x02 0x00 0x00  REVISION 0x01
  BIST     0x00  HEADER 0x80  LATENCY 0x00  CACHE 0x10

-- Roman
[storage-discuss] Comstar: MS can't format iscsi drive.
It's Windows 2008. If I choose quick format, it gives an error almost immediately: "The format did not complete successfully." With the long format it starts formatting, does something for a couple of hours and then fails with the same error.

# stmfadm list-lu -v
LU Name: 600144F0A02504004A9311690002
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/zsan01store/mbx01-node2-test
    View Entry Count  : 1
    Data File         : /dev/zvol/rdsk/zsan01store/mbx01-node2-test
    Meta File         : not set
    Size              : 536870912000
    Block Size        : 512
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled

# itadm list-target -v
TARGET NAME                                                  STATE   SESSIONS
iqn.1986-03.com.sun:02:f6265007-17cd-62aa-c31c-d2676533ae89  online  0
        alias:             -
        auth:              none (defaults)
        targetchapuser:    -
        targetchapsecret:  unset
        tpg-tags:          tpg-0g1 = 2

# zfs list
zsan01store/mbx01-node2-test  500G  7.42T  844K  -

-- Roman
Re: [storage-discuss] Comstar: MS can't format iscsi drive.
Of course over RDP, and we've been doing it that way for a couple of years with the linux iscsi target. There are too many servers to format them from the console.

-- Roman
Re: [storage-discuss] Comstar: MS can't format iscsi drive.
Windows 2008 doesn't like it. XP formats the volume easily, and it can then be mounted on 2008 over iscsi. The only difference is that it's a 2-node cluster on 2008.

-- Roman Naumenko
Re: [storage-discuss] Comstar: MS can't format iscsi drive.
Thanks Nigel, Disabling jumbo frames and tso on the win server allowed the formatting to complete. The SUN box had an unchanged config:

e1000g1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 9000 index 3
        inet 10.10.110.101 netmask ffffff00 broadcast 10.10.110.255
        ether 0:15:17:89:97:1

I'm attaching traces from both sides during the iscsi session: logon to the target, online the volume in Windows, and do a quick format. In jumbo-on mode I saw warnings while tracing on the SUN box:

# snoop -r -d e1000g1 -o jambo_on_tso_off.cap
Using device e1000g1 (promiscuous mode)
42817 (warning) packet length greater than MTU in buffer offset 0: length=26960
42819 (warning) packet length greater than MTU in buffer offset 27048: length=18000
42828 (warning) packet length greater than MTU in buffer offset 16728: length=27680
42830 (warning) packet length greater than MTU in buffer offset 44496: length=13728

File to download: http://www.speedyshare.com/538479835.html Careful, the unpacked traces are big.

-- Roman
Re: [storage-discuss] Restricting initiator resource consumption
Tristan Ball wrote:
> I believe zpool iostat will include cached IOs, and write IOs which will be coalesced into a single physical IO to your disk.
>
> The plain iostat command is a good place to start to see what's actually going to disk. "iostat -dxzcn 1" is what comes out of my fingers automatically, although I vary the interval and often add the C and M options.

> I believe that 'zpool iostat' also shows only what is going to physical disk, just like regular iostat.
>
> If you want to judge the effect of caching, compare the stats from 'fsstat zfs' (or 'fsstat /mountpoint' to isolate a particular dataset) against the zpool iostat numbers. When I watch fsstat during heavy write activity, such as an incoming backup, I see a steady stream of writes with fsstat, but only periodic bursts of writes with zpool iostat or plain iostat.

Thanks, I didn't know how to look into fs access stats.

-- Roman
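A concrete way to run that comparison (dataset and pool names are examples):

# fsstat /tank/data 1      (logical filesystem ops, including cache hits)
# zpool iostat -v tank 1   (ops actually reaching the vdevs)
# iostat -dxzcn 1          (per-device view with service times)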
Re: [storage-discuss] Restricting initiator resource consumption
> If your pool has few vdevs (I believe you had a single raid-z), then this will be slow, as you will probably get only about 100 IOP/s from your pool. My exchange server happily produces 1500 IOP/s in daily use, and would go higher, but our current array won't go any faster for un-cached IO. :-)

How do you check iops? I saw 1000 writes in zpool iostat when I tested the pool with sqlio over iscsi, but that was async, of course. I didn't find a way to generate synchronous synthetic writes on windows, so I'm waiting for exchange to see what kind of writes it makes.

> Even with a dedicated SAN and 15K rpm drives, MS generally recommends Raid-10 configurations for exchange. Raid-5/6 or RaidZ1/2 usually doesn't give the IOP/s rates you need - although many people do it anyway.

That's why I'd like to use an SSD - to improve iops to a desirable level.

> Only if you get a reasonable cache hit rate. I suspect that a combination of exchange and sql server is likely to result in a fairly poor hit rate - certainly not enough to alleviate the performance bottleneck of a single vdev.

I meant a slog - to improve iops for writes. The reading cache is not a big deal right now.

-- Roman Naumenko
ro...@frontline.ca
Re: [storage-discuss] Restricting initiator resource consumption
> ZFS based mirroring is only slightly less reliable than Raid-z2, and it gives much better IOP/s.

How can it give much better iops, taking into account slow, large sata drives?

> Neither exchange nor SQL Server tends to be throughput bound, but both require very high IOP/s rates. I don't think you'll see enough traffic across the controller for it to be the bottleneck!

Correct, there is not much traffic, but we have seen decreased iops for a particular lun affected by activity on another volume (a backup running, defragmentation, on-line maintenance or something else - I'm not a specialist here).

> Even with a dedicated SAN and 15K rpm drives, MS generally recommends Raid-10 configurations for exchange. Raid-5/6 or RaidZ1/2 usually doesn't give the IOP/s rates you need - although many people do it anyway.

That's why I'd like to use an SSD - to improve iops to a desirable level.

> How many users are there on your exchange server? I have a suspicion that even if you move to zfs mirroring, you still might not get enough performance for exchange with that number of drives - especially if you're putting other load on the system.

Yes, there is other load, and this is a problem as well (archiving in our case). That's why they want to separate the exchange dbs from each other.

-- Roman
Re: [storage-discuss] Restricting initiator resource consumption
Using mirrors just makes zfs useless. The whole idea is reliable raid6 storage with snapshot features. The question is how to prevent saturation by one volume. By the way, how is this designed in large disk arrays, where hundreds of luns are accessed simultaneously?

-- Roman
Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely
Seems like the issue was indeed with TSO and a bad windows driver. They updated it and disabled TSO - no complaints any more. Thank you for helping!

-- Roman
Re: [storage-discuss] Slog: how to make it work?
This is a comstar+itadm configuration, so iscsitgt is not used. There is no special driver for the storage; it's a JBOD on an Adaptec raid controller. And the bottom of the problem is: nfs always makes sync writes (copying files, bonnie++, any other writing activity) - I see it right away in iostat for the ssd device. The Windows initiator doesn't produce any such writes - not in the synthetic tests I've tried, nor with your sqlio. We'll put an exchange db on that volume and see the difference, but to me it's quite a strange situation.

-- Roman
Re: [storage-discuss] Restricting initiator resource consumption
>> Using mirrors just makes zfs useless. The whole idea is reliable raid6 storage with snapshot features. The question is how to prevent saturation by one volume.
> What is your current zpool format (raidz, raidz2, etc)? Using a mirror does not make zfs useless - you can still use all of the built-in features of the software. Mirroring your drives just makes it a raid1 instead of raid6.

A raidz2 array of 8 disks, 2x quad core, 16G. Yes, it's possible to configure it as 4x2 mirrors, with capacity almost 2 times less and with reliability also degraded. But since they are on the same controller, the performance of one pool might depend on access to another.

-- Roman
Re: [storage-discuss] Restricting initiator resource consumption
On Aug 18, 2009, at 10:47 AM, Roman Naumenko wrote:
> Roman, That config will not handle an exchange db well, as it will have the max IOPS of a single disk, because raidz/raidz2 has to write the whole stripe width in each write. I would seriously re-think your configuration or go with a hardware RAID solution.

I would like to have a major SAN in place for exchange - just put it there and forget about it, instead of dancing with zfs. Unfortunately, it's not within the current budget. But since zfs has more features and better reliability than a typical vendor's NAS/SAN appliance, and it's free, we are going with it. Well, performance, yes. I'm personally just waiting for the slog bug to be fixed. IBM already did a great job bringing a $250 SSD to the market.

-- Roman
Re: [storage-discuss] Getting to the bottom of poor ZFS read performance
I'd like to confirm that irq overlapping might be the issue; at least that was the case for the FreeBSD kernel. We once had bad performance on a freebsd-based firewall, and it turned out that the embedded broadcom NICs and the NICs on PCI shared the same irq. After reassigning them, processor load decreased almost 2 times, with much better throughput.

-- Roman
[storage-discuss] Slog: how to make it work?
I have a strange feeling that the attached slog device doesn't do anything on the zpool.

zsan0store    203G  10.7T     47     23  5.85M  2.49M
  raidz2      203G  10.7T     47     22  5.85M  2.44M
    c7t0d0       -      -     25      5   830K   418K
    c7t1d0       -      -     25      5   830K   418K
    c7t2d0       -      -     25      5   829K   418K
    c7t3d0       -      -     25      5   829K   418K
    c7t4d0       -      -     25      5   830K   418K
    c7t5d0       -      -     25      5   830K   418K
    c7t6d0       -      -     25      5   830K   418K
    c7t7d0       -      -     25      5   830K   418K
...
  c9d0        128K  29.7G      0      1      2   229K
...
  c9d0        128K  29.7G      0      0      0  79.1K
...
  c9d0        128K  29.7G      0      0      0      0

format -e
. c9d0 FiD 2.5-90429AAB-0001-29.84GB /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

As back-end storage I use a raw volume created on a raidz2 pool, exported with Comstar iscsi. The initiator is the standard windows iscsi one. Speed and latency are tolerable on 118b, especially with compression=on and a 128k block size on zfs. The question is about the zfs slog device and how to check whether it can improve access. Does zfs cache raw volume data on the slog device, or only filesystem data? What kind of tests can I run to see that it works?

-- Roman
Re: [storage-discuss] Slog: how to make it work?
Ross Walker wrote:
> On Aug 17, 2009, at 12:24 PM, Roman Naumenko wrote:
>> The question is about the zfs slog device and how to check whether it can improve access. Does zfs cache raw volume data on the slog device, or only filesystem data? What kind of tests can I run to see that it works?
> The slog will cache all synchronous writes in the zpool, whether they are zfs or zvol writes. While running a synchronous write, look at the output of 'iostat -x 1', locate the 'sd' device representing the slog, and look at its io. -Ross

I believe windows doesn't do sync writes, since zpool iostat shows only 30-sec bursts and iostat for the sd device is zero all the time. Any thoughts on how to make it do sync writes? Will a slog be helpful for a typical windows application such as Exchange?

-- Roman
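One low-tech way to watch the slog while a workload runs (device name from the earlier listing; any genuinely synchronous workload, such as an NFS copy or a database commit stream, should light it up):

# iostat -xn 1 | grep c9d0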
[storage-discuss] Restricting initiator resource consumption
I have complaints from potential zfs storage users who blame SUN for not being capable of managing its resources (LUNs). Let me explain what they mean by citing 3 examples:

"I started formatting the second drive and it killed all performance."
"Exchange is doing {online defragmentation, segmentation, } - other volumes suffer."
"SQL server is doing something on the db volume - other volumes become slow."

What would you advise me? Is there a point in such a requirement? Is Fishworks, for example, capable of providing such management? From my point of view it can be achieved only on the network level, for the iscsi protocol.

-- Roman
Re: [storage-discuss] Slog: how to make it work?
Ok, thanks, I've seen the original before... The problem now is that windows doesn't create any sync activity at all. NFS, on the other hand, creates a huge amount. And NFS performance is just so poor - 15MB/s, and this is with an ssd drive.

-- Roman
Re: [storage-discuss] itadm shows active session, win initiator doesn't find new disk
Resolved, thanks. The view wasn't correctly defined.
Re: [storage-discuss] itadm shows active session, win initiator doesn't find new disk
Hi Jim, Sure, I configured the lu and added a view as described in the link.

# stmfadm list-lu -v
LU Name: 600144F0CC12CC004A8600DA0001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : lu-vol1
    View Entry Count  : 1
    Data File         : /dev/zvol/rdsk/zsan0store/zsan0vol0
    Meta File         : not set
    Size              : 1099511627776
    Block Size        : 512
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Enabled

What is the "Block Size : 512"? It's definitely not the zfs block size, which I set to 128k when I created the volume.

# stmfadm list-target -v
Target: iqn.1986-03.com.sun:02:vol1
    Operational Status: Online
    Provider Name     : iscsit
    Alias             : -
    Sessions          : 1
        Initiator: iqn.1991-05.com.microsoft:bla-bla-bla
            Alias: -
            Logged in since: Fri Aug 14 21:12:46 2009

~# stmfadm list-hg -v
Host Group: zsan0server-hg

# stmfadm list-tg -v
Target Group: win1-tg
    Member: iqn.1991-05.com.microsoft:bla-bla-bla

Probably I should configure the view to allow all and then add restrictions.

-- Roman
Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely
Thanks, guys, for looking into it. You did excellent diagnostics without looking at the actual server (just like Dr. House does with his patients :) It was indeed the tso option causing the 64k packets in the traces; after disabling it, the large packets are no longer there. Funny, they don't reappear if tso is enabled again - probably the win server wants a restart. We are now looking into the win nic driver - whether it's up to date and whether there are any known bugs related to it. I'll keep posting results.

-- Roman
[storage-discuss] itadm shows active session, win initiator doesn't find new disk
# itadm list-target
TARGET NAME                  STATE   SESSIONS
iqn.1986-03.com.sun:02:vol1  online  1

The Windows initiator has this target connected, no issues. But the new disk doesn't appear in the management console. What should I check?

-- Roman
Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely
Thanks for looking into this. You are right to notice the unusual jumbo frames on the win nic. Why they are generated with jumbo frames disabled in the settings - nobody, I believe, will explain. Freaking M$...
Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely
I'll post the capture file from solaris later. There were no jumbo frames there. And I checked another win server - it generates a huge amount of 50-60k packets (the targets are on 111b). Maybe this is a feature? Can somebody take a look at their own win box and check for jumbo frames (while they're not enabled on the nic)?

-- Roman
Re: [storage-discuss] svn118: lib/svc/method/dcoeconfig not found
Thank you, Jonathan.

-- Roman

Jonathan Edwards wrote, On 09-07-23 04:54 PM:
> http://opensolaris.org/jive/thread.jspa?threadID=107881
[storage-discuss] snv118: lib/svc/method/fcoeconfig not found
I upgraded the box to 118 and now it's messed up. The system goes into maintenance because system/devices/fc-fabric:default is in maintenance. In the logs: lib/svc/method/fcoeconfig not found. 46 dependent services are not running. How did this fcoe manage to mess everything up? I didn't even plan to use FC at this point. -- Roman -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
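In case it helps anyone hitting the same thing: if FC/FCoE is genuinely not in use, one stopgap (an untested guess, not a proper fix) is to take the broken service out of the picture so its dependents can come up, using the FMRI from the log above:

# svcadm disable system/devices/fc-fabric:default
# svcs -D system/devices/fc-fabric:default    (lists the services that depend on it)

The real fix is presumably whatever the linked thread describes for the missing fcoeconfig method script.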
Re: [storage-discuss] R: Re: Limiting iSCSI logon for targets
Thanks, but we've already used Openfiler. This time I'd like to build the storage without involving Linux :)

Roman

Anzuoni Guido wrote, On 09-07-08 04:25 AM: AFAIK, the latest release of NexentaStor is built on top of COMSTAR. Guido

From: Roman Naumenko [mailto:ro...@frontline.ca]
Sent: Tuesday, July 7, 2009 19:06
To: Jim Dunham
Cc: Anzuoni Guido; storage-discuss@opensolaris.org
Subject: Re: Re: [storage-discuss] Limiting iSCSI logon for targets

Jim Dunham wrote:

Guido,

I am experimenting with COMSTAR as an iSCSI storage solution and I haven't found a way to limit the number of sessions for a specific iSCSI target. What I am trying to achieve is a configuration where only one initiator at a time can log on to a specific target. While an initiator is connected, other logons should be denied. Is it a missing feature?

iSCSI in COMSTAR is a block-based protocol (not filesystem or database). At this level of LUN access, the means to control single use of a LUN is Persistent Reservation, a pair of SCSI commands:

5E PERSISTENT RESERVE IN
5F PERSISTENT RESERVE OUT

Access to these SCSI commands is typically not given to the end user; they are usually driven by a volume manager or clustering software. There is an open-source utility called sg_persist that provides access to this: http://opensolaris.org/jive/thread.jspa?threadID=88835

Is there a way to do it that I haven't seen?

Specifics regarding your iSCSI initiator, the type of operating system, and the filesystem or database type may allow other options to be considered. For example, if the filesystem type on the iSCSI LUN is ZFS, the command "zpool import " warns that another node "may" be accessing the ZFS filesystem on the iSCSI LUN. Unfortunately, if the other node loses connectivity to the iSCSI LUN, or does not "zpool export ..." the filesystem, the warning will still be present. Also, there is a "force" option to "zpool import ...", an option that does not assure exclusive access. - Jim

By the way, what's the situation with web interfaces for storage appliances? At the top end there is the Fishworks appliance, but it comes only with the hardware, as far as I know. There is a simple interface for zfs - smcwebserver. Is there anything available for COMSTAR? -- Roman Naumenko ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] Sol 10u7 iscsitgt write performance
Ross Walker wrote:

[storage-discuss] Sol 10u7 iscsitgt write performance

Ok, I have just about given up on this one, and I was so hopeful after having figured out the read performance issue (which, to recap, was a mix of ESX guest network latency and a randomizing of the data file from running random write and sequential read tests back to back). Now it seems I am up against the hardware and the ZIL on this one. To make it short: 4k sequential writes, throughput 16MB/s... Local tests show the hardware is capable of at least 132MB/s (3 raidz groups of 4 disks each; those are 15K SAS disks). I believe the network is tuned optimally, Nagle on Windows disabled. The ZIL is going to a 16GB Mtron SSD, 100us sequential access time. Any more suggestions or pointers to improve things are warmly welcomed. -Ross

Hi Ross, Can you give me more details about your SSD? I'm looking for an SSD to test. Is yours expensive? Fast? Did you check it with dd and iostat? Thanks, Roman ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
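On the "dd and iostat" question - a quick sanity test of an SSD intended for the ZIL might look like this (the device name is made up; writing to a raw slice destroys whatever is on it):

# dd if=/dev/zero of=/dev/rdsk/c2t1d0s0 bs=8k count=100000

and in another terminal, to watch per-device throughput and service times:

# iostat -xn 1

Keep in mind a streaming dd only shows sequential bandwidth; the ZIL cares about small synchronous write latency, which a dd to the raw device doesn't really exercise.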
[storage-discuss] JBOD mode on Adaptec controller: disks are not recognized
Hello, It's OpenSolaris 111b on x86. I'm trying to configure an Adaptec 5805 to work in JBOD mode. It has 8 SATA disks connected. If the disks are configured in array mode (each disk as its own single-disk array), Solaris recognizes them on the fly:

r...@zsan0:~# cfgadm -lav
Ap_Id           Receptacle  Occupant    Condition  Information        When  Type      Busy
c7              connected   configured  unknown    unavailable              scsi-bus  n
  /devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi
c7::dsk/c7t0d0  connected   configured  unknown    Adaptec RAID 5805 unavailable  disk  n
  /devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi::dsk/c7t0d0
c7::dsk/c7t1d0  connected   configured  unknown    Adaptec RAID 5805 unavailable  disk  n
  /devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi::dsk/c7t1d0

And so on, 8 disks altogether. After configuring JBOD with the 8 disks, Solaris can see only the controller:

sad...@zsan2:~$ cfgadm -lav
Ap_Id           Receptacle  Occupant      Condition  Information  When  Type      Busy
c7              connected   unconfigured  unknown    unavailable        scsi-bus  n
  /devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi

r...@zsan2:~# cfgadm -c configure c7
cfgadm: Hardware specific failure: failed to get state for SCSI bus: No such device or address

Can it be configured to work as JBOD? Thanks, Roman ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] Limiting iSCSI logon for targets
Jim Dunham wrote:

Re: [storage-discuss] Limiting iSCSI logon for targets

Guido,

I am experimenting with COMSTAR as an iSCSI storage solution and I haven't found a way to limit the number of sessions for a specific iSCSI target. What I am trying to achieve is a configuration where only one initiator at a time can log on to a specific target. While an initiator is connected, other logons should be denied. Is it a missing feature?

iSCSI in COMSTAR is a block-based protocol (not filesystem or database). At this level of LUN access, the means to control single use of a LUN is Persistent Reservation, a pair of SCSI commands:

5E PERSISTENT RESERVE IN
5F PERSISTENT RESERVE OUT

Access to these SCSI commands is typically not given to the end user; they are usually driven by a volume manager or clustering software. There is an open-source utility called sg_persist that provides access to this: http://opensolaris.org/jive/thread.jspa?threadID=88835

Is there a way to do it that I haven't seen?

Specifics regarding your iSCSI initiator, the type of operating system, and the filesystem or database type may allow other options to be considered. For example, if the filesystem type on the iSCSI LUN is ZFS, the command "zpool import " warns that another node "may" be accessing the ZFS filesystem on the iSCSI LUN. Unfortunately, if the other node loses connectivity to the iSCSI LUN, or does not "zpool export ..." the filesystem, the warning will still be present. Also, there is a "force" option to "zpool import ...", an option that does not assure exclusive access. - Jim

By the way, what's the situation with web interfaces for storage appliances? At the top end there is the Fishworks appliance, but it comes only with the hardware, as far as I know. There is a simple interface for zfs - smcwebserver. Is there anything available for COMSTAR? -- Roman Naumenko ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
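For completeness, the sg_persist port mentioned above can exercise those two SCSI commands directly. A rough sketch of taking a write-exclusive reservation (the device path and key are illustrative, and the option spelling is per sg3_utils, so worth double-checking against the Solaris port's man page):

# sg_persist --in --read-keys /dev/rdsk/c3t0d0s2                         (show registered keys)
# sg_persist --out --register --param-sark=0x1 /dev/rdsk/c3t0d0s2        (register our key)
# sg_persist --out --reserve --param-rk=0x1 --prout-type=1 /dev/rdsk/c3t0d0s2

prout-type 1 is Write Exclusive: other initiators can still log in and read, but their writes are rejected until the reservation is released - which is close to, though not identical to, the single-logon behavior asked about.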
Re: [storage-discuss] Building home-made 7210
Chris Du wrote: Here is the filebench I did with and without SSD, compression on and off, again on 2009.06. Hope it helps. BTW, you may want to connect your disks to the onboard SATA ports and see if it helps. My experience shows ZFS doesn't like HW RAID controllers with onboard cache.

Hi Chris, Interesting results. It seems compression can help as much as an SSD, although both are helpless on small random writes. I'm afraid that's what we need most, with our Exchange servers, which are essentially Microsoft databases by design. Do you have the specifications of the hardware you used for the tests? And may I ask what kind of storage you eventually use in production (if anything with zfs)? -- Best regards, Roman Naumenko Network Administrator ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] Building home-made 7210
Chris Du wrote: Compression helps when you don't export volumes through NFS. SSD really helps increase write speed and reduce latency for NFS. This filebench was run on the storage server itself; I haven't had a chance to run it inside a client. Inside a client, I think the result will be more positive for small random IO: if you check the iSCSI LUN properties, it shows writeback enabled by default, so storage server memory is used as a writeback cache. I'm still testing the environment. The server is a dual Opteron 254, 8G memory, dual Broadcom 5704 NICs, QLogic 2460. Disks are 3x Maxtor Atlas 10K5, but later I'll make a 2x147G mirror for the OS and 6x Atlas 10K5 in RAID10, plus a Supermicro storage shelf with 24 SAS disks and 2x Intel X25-E SSDs. This environment is for my VMware ESX test lab. Actually, I see very good performance inside VMware guests for small random I/O.

With or without compression, there is something strange with NFS on 111b; I give up. I will try to test it on 117, but the performance was so bad, with no obvious reason (like James described above), that it's now hard to even consider ZFS+NFS for production. But iSCSI or FC could still work out for me. I'd like to ask you a couple more questions: what chassis model are you going to get? And how will the storage be connected to your VMware server? Do you use any tuning, like jumbo frames or the ACK registry key fix on Windows? I also see you have a QLogic FC card - how does it perform? Did you have a chance to test COMSTAR with it? -- Regards, Roman ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?
Thanks, it's resolved. Clearing the state didn't help, but a reboot did. -- Roman -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] Building home-made 7210
Hi, I just wonder if there is something special that makes the 7210 faster than a server you can assemble yourself from common components. I have the ultimate task of making a fast zfs :) And what I'm thinking is that there is only one thing you can't get from the nearest server hardware supplier: the SSDs. I see that STEC ZeusIOPS SSDs are used in the 7210. The other stuff is pretty common: 7200 RPM SATA disks, and the controller probably can't increase speed much since it's standard SATA at 300MB/s. Processor and memory are the usual; I have, for example, two 2GHz E5405 quad-core processors, 16GB RAM and 8x 1.5T disks on an Adaptec 5805. The version is 111b. Well, the performance of such a system is so-so: 30Mb/s nfs, 50Mb/s iscsi. I'm looking for a way to improve it radically. I have FC, SAS drives and SSDs in mind, and I'm going to try all of them. I'd like to start with an SSD, but the STEC web site doesn't show prices - they are probably too high. Are there any other SSD suppliers to try? IBM has $500 18G drives www-03.ibm.com/systems/storage/disk/ssd/index.html but they are probably not optimized for writing. IBM says they can do a 47MB/s write rate and 5000 IOPS random read (4K blocks). I also see that Sun itself sells a 32 GB SATA SSD for $1200. The catalog gives the following figures: 150Mb/s writing, 5000-7000 random write IOPS and 300 us max command latency. Is this one going to improve zfs performance? Thanks, Roman -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?
Hi Brent, Thanks for this information, I'll use it. Actually, what do the developers think about its documentation? There is basically no documentation for COMSTAR available (I don't count the messy wiki pages; the man pages are helpful, but they are not real documentation either). Are there plans to create a decent one? -- Roman ro...@frontline.ca -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] Building home-made 7210
Great, I'll look into your table. Regarding the RAID controller: I disabled the cache when creating the arrays (for some reason JBOD mode is not available under OpenSolaris on the Adaptec 5805). Actually, it writes very fast, so I can't blame the controller:

r...@zsan0:/# dd if=/dev/zero of=/zsan0store/bonnie/test.txt bs=64k count=30
r...@zsan0:/# zpool iostat 30
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
zsan0store  718G  10.2T     36  1.54K   166K   176M

And iostat shows up to 280MB/s on the controller during the dd:

                     extended device statistics
device    r/s     w/s    Mr/s   Mw/s   wait   actv  svc_t  %w  %b
c7       25.0  2695.4    1.6   278.3   0.0   217.6  80.0    0  766

-- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] 111b: NFS: zfs writes synchronously only
Hello, I'm trying to configure NFS on ZFS. No special tuning, just sharenfs=on, and a simple mount on the client:

172.0.7.100:/zsan3store/mailarch2  /opt/mailarch2  nfs  async,noatime

dd from the local machine gives ~40Mb/s. Not impressive at all. Tweaking the NFS client options (like setting block sizes) only decreases the speed. And writing small files (a mail archive) makes zfs write to the storage constantly, with no bursts at all. Is this how it should be? The NFS client is Ubuntu 8.04. Regards, Roman Naumenko ro...@frontline.ca -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
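One way to tell whether synchronous ZIL writes are what's pacing those constant writes - as a diagnostic only, since this sacrifices data integrity on power loss and should never be left on in production: on builds of this vintage the ZIL can be disabled globally via /etc/system and a reboot:

set zfs:zil_disable = 1

If NFS write throughput jumps with the ZIL off, a dedicated slog device (SSD) is the proper fix; if nothing changes, the bottleneck is elsewhere.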
Re: [storage-discuss] 111b: NFS: zfs writes synchronously only
Thanks, that's a great article. However, I don't have an SSD to check whether that's the case here. What I do see is that there are no synchronous requests:

# dtrace -n 'nfsv3:::op-write-start { @[args[2]->stable] = count(); }'
dtrace: description 'nfsv3:::op-write-start ' matched 1 probe
^C
        0        64

I just wonder why zfs still writes constantly? Regards, Roman Naumenko ro...@frontline.ca -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?
How does the service start? Where is the svc-stmf config that it requests?

r...@zsan3:~# sbdadm
Unable to open device. Is the driver attached?
r...@zsan3:~# svcadm enable stmf
r...@zsan3:~# stmfadm list-state
stmfadm: unknown error
r...@zsan3:~# svcs -xv
svc:/system/stmf:default (STMF)
 State: maintenance since July 1, 2009 1:11:35 PM EDT
Reason: Start method failed repeatedly, last exited with status 1.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 7D stmf
   See: man -M /usr/share/man -s 1M stmfadm
   See: /var/svc/log/system-stmf:default.log
Impact: This service is not running.
r...@zsan3:~# tail -f /var/svc/log/system-stmf:default.log
[ Jul 1 13:11:35 Enabled. ]
[ Jul 1 13:11:35 Executing start method (/lib/svc/method/svc-stmf start). ]
svc-stmf: unable to load config

Thanks, Roman Naumenko ro...@frontline.ca -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
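For the archive: the usual first step for a service stuck in maintenance is to fix the underlying problem and then clear the state:

# svcadm clear system/stmf:default
# svcs -l system/stmf:default     (full service status and dependencies)

Per the "unable to load config" message, the stmf start method is failing to read its persistent configuration, so clearing alone may not be enough on its own.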
Re: [storage-discuss] comstar status
Hello Dan, Can you point me to where I can download the ISO image of the development version? I have b11x.iso on disk but can't find the source it came from. Thanks, Roman Naumenko -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] comstar status
Hi Dan, Thanks for the information. Can you advise which version of OpenSolaris is better for production use? Is it only 2009.06? Are there any features and fixes that are not available in 2009.06 but are present in the newest builds? Especially regarding iscsi. -- Roman Naumenko Network Administrator ro...@frontline.ca

Dan Maslowski wrote:

Re: [storage-discuss] comstar status

Roman, You should use COMSTAR iSCSI. Development has stopped on iscsitgtd; COMSTAR is the future. You can trust my answer, I own both varieties... Regards, Dan

Dan Maslowski Sr. Engineering Manager COMSTAR Storage Software

Roman Naumenko wrote: I have a question about COMSTAR and the iscsi service. I'm not sure what to use for iscsi: COMSTAR or the iscsi package. There have been quite a lot of references to COMSTAR for iscsi functionality. Is iscsi functionality being transferred to COMSTAR, and will the original packages be abandoned? Should I switch to COMSTAR for providing iscsi targets? Thank you, Roman Naumenko Network Administrator ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] comstar status
I have a question about COMSTAR and the iscsi service. I'm not sure what to use for iscsi: COMSTAR or the iscsi package. There have been quite a lot of references to COMSTAR for iscsi functionality. Is iscsi functionality being transferred to COMSTAR, and will the original packages be abandoned? Should I switch to COMSTAR for providing iscsi targets? Thank you, Roman Naumenko Network Administrator ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] async_throttle_delay in progress: what exactly it shows?
Or is this a total delay for the whole period of replication time?

This is the total number of times SNDR had to delay replicating a chunk of data since a given replica was last enabled or resumed by SNDR. An increment occurs during asynchronous replication with either memory or disk queues, whenever the total number of items, or the total size of all items, exceeds what was previously configured as the memory or disk queue size. For memory queues this is:

sndradm [opts] -F [set]    set maximum fbas to queue
sndradm [opts] -W [set]    set maximum writes to queue

For disk queues this is the summation of the number of items and the number of blocks, based on the physical size of the associated disk queue. It is possible to keep this number low by increasing the memory queue, the disk queue, or the number of asynchronous flusher threads (sndradm -A ...), by higher network bandwidth, lower network latency, a faster SNDR secondary node, or some combination of these. - Jim

Thanks for the explanation, Jim. Now I see what it means. -- Roman -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
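So in practice, assuming the option syntax quoted above, raising the memory queue high-water marks for a whole I/O group would look something like this (the group name and values are arbitrary examples, not recommendations):

# sndradm -g zfs-pool -F 32768    (maximum blocks to queue)
# sndradm -g zfs-pool -W 8192     (maximum writes to queue)

Once the queue stops hitting its configured limits, async_throttle_delay should stop growing.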
Re: [storage-discuss] FeedBack on SSD zfs raid perf
Nobody wants to test them since they don't live long :) I've asked around, and people say SSDs are fast but also die fast under load. -- Roman Naumenko -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] async_throttle_delay in progress: what exactly it shows?
Can somebody look into this magic number and explain why async_throttle_delay slowly grows over time, and whether it might be related to the delays?

The number is not all that magic. Have you looked at the following manual, specifically at the keyword async_throttle_delay? http://docs.sun.com/source/819-6148-10/chap5.html, search for async_throttle_delay

I meant, it's interesting why it grows slowly over time. I thought it was an average value for a given period of observation. Or is it a total delay for the whole period of replication? -- Regards, Roman Naumenko -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] async_throttle_delay in progress: what exactly it shows?
Hello, We've set up local replication between two zsan storages over a direct 1 Gb connection:

/dev/rdsk/c3t0d0s0 - zsanback.local:/dev/rdsk/c3t0d0s0
{6 more}
/dev/rdsk/c3t7d0s0 - zsanback.local:/dev/rdsk/c3t7d0s0

There is a zfs filesystem on it, accessed over iscsi. Recently users started to complain about delays when accessing files. Can somebody look into this magic number and explain why async_throttle_delay slowly grows over time, and whether it might be related to the delays?

#kstat sndr:1:setinfo 15 | grep async_throttle_delay
async_throttle_delay    17232313
async_throttle_delay    17235445
async_throttle_delay    17235445
async_throttle_delay    17240204
async_throttle_delay    17240204
async_throttle_delay    17245600
async_throttle_delay    17245600
async_throttle_delay    17251474
async_throttle_delay    17251474
async_throttle_delay    17257441

# kstat sndr:1:setinfo
module: sndr    instance: 1
name: setinfo   class: storedge
    async_block_hwm         21069
    async_item_hwm          2439
    async_queue_blocks      17982
    async_queue_items       215
    async_queue_type        memory
    async_throttle_delay    17271404
    autosync                0
    bitmap                  /dev/md/rdsk/bmp1
    bitsset                 332
    bmpflags                0
    bmp_size                5713920
    crtime                  1301557.77839719
    disk_status             0
    flags                   6150
    if_down                 0
    if_rpc_version          7
    maxqfbas                16384
    maxqitems               4096
    primary_host            mainzsan.local
    primary_vol             /dev/rdsk/c3t1d0s0
    secondary_host          zsanback.local
    secondary_vol           /dev/rdsk/c3t1d0s0
    snaptime                2247328.51220685
    syncflags               0
    syncpos                 2925489887
    type_flag               5
    volsize                 2925489887

About these values:

    maxqfbas                16384
    maxqitems               4096

If I set them to higher values, I see async_block_hwm and async_item_hwm increase respectively. Does it make sense to change them on a 1G local connection? The typical load is about 5Mb/s reading/writing, and sometimes it goes up to 40Mb/s. Right now I can't see a relationship between zpool I/O load spikes and the access delays. -- Best regards, Roman Naumenko Network Administrator ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
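A convenient way to watch the counter's growth rate is kstat's trailing interval argument, e.g. one sample every 10 seconds:

# kstat -p sndr:1:setinfo:async_throttle_delay 10

If the counter only climbs during the 40Mb/s bursts, the queue limits (maxqfbas/maxqitems) are the place to look; if it climbs under the normal 5Mb/s load too, the link or the secondary node is struggling to keep up.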
[storage-discuss] motherboard for storage server
I'm looking for a motherboard. There is a good one, the Intel S5000VSA. We've been using it, and apart from a firmware update needed for quad-core CPUs there have been no issues. It supports 2 CPUs and 16G of memory. Does it make sense to go with motherboards that support 32G of memory? Can it improve zfs performance to a significant degree? -- Best regards, Roman Naumenko ro...@frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
Re: [storage-discuss] motherboard for storage server
Eric D. Mudama wrote, On 06/02/2009 02:22 PM:

On Tue, Jun 2 at 12:46, Bob Friesenhahn wrote: On Tue, 2 Jun 2009, Roman Naumenko wrote: Does it make sense to go with motherboards that support 32G of memory? Can it improve zfs performance to a significant degree?

The improvement depends on how heavily the server is accessed and the total amount of data which is accessed frequently. With sufficient RAM, the disks will only be accessed for writes, and writes will also be faster if the data being updated is cached. If you need performance and can afford it, then go for the 32GB of RAM.

The RAM will be the best performer, but for the cost and power it might be a better choice to stick with 8-16GB RAM and add an SSD or two as a cache device. Each 4GB FB-DIMM for that motherboard will run you about $100, so a 32GB cache SSD has about the same up-front cost as 16GB of RAM. Each FB-DIMM burns about 10W of power, compared to 2.5W for the entire SSD when active, and virtually zero when not being accessed.

Well, SSDs are an interesting thing. Of course, plus or minus 10W, or even hundreds of watts, makes no difference for us; we are not as environmentally friendly here in Canada as they try to persuade everybody :) But that's a different story. Regarding SSDs: you've made an interesting point, particularly since AVS bitmaps are candidates for being placed on an SSD, as Jim suggested recently. Well, if I want to try an SSD, where should I look? Particular manufacturers? Special motherboards? Can you recommend something? Unfortunately I've never had any experience with them. And making storage appliances fast is essential (since reliability is already excellent with zfs).

A single cache SSD should be able to easily saturate gigabit ethernet, even in random workloads, if that is your performance bottleneck.

Well, that's a very good speed, but I'm afraid that with zfs+iscsi it won't be that fast.

-- Best regards, Roman Naumenko Network Administrator ro...@frontline.ca 25 Adelaide Street East | Suite 600 | Toronto, Ontario | M5C 3A1 Helpdesk: (416) 637-3132 www.frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
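To make the SSD suggestion concrete: ZFS takes SSDs in two distinct roles, and the zpool syntax is simply (pool and device names below are placeholders):

# zpool add tank cache c5t0d0    (L2ARC - extends the read cache, helps random reads)
# zpool add tank log c5t1d0      (slog - absorbs synchronous writes, e.g. NFS commits)

For the AVS bitmap idea mentioned above, no zpool is involved: the bitmap volume would simply be a slice on the SSD handed to sndradm.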
Re: [storage-discuss] sndradm - add volume (zfs spare) to existent group
Ok, thanks. That's what I was thinking about. I just wasn't sure whether it's ok, from the AVS point of view, to add volumes to existing groups. Only one question about the commands you suggested. Is this about adding a disk queue?

sndradm -g <group> -q a <volume>

Is it compulsory? I mean, I don't use disk queues in this particular installation. And let me ask you, Jim, about the famous "AVS and ZFS, seamless" post. It was yours, right? Even if not, would you mind commenting on it? The first question is about the issue with RDC timeouts and disk queues that I asked about some time ago: http://www.mail-archive.com/storage-discuss@opensolaris.org/msg05497.html I wonder whether the configuration described in the blog worked out. What did the link look like? Was it something like a dedicated circuit with typical latency? How did it work out eventually? And the second question is about the slices for the bitmap. What was the reason for putting it not on a dedicated pair of disks, as the documentation suggests? (AVS administration guide: raw devices must be stored on a disk separate from the disk that contains the data from the replicated volumes; the bitmap must not be stored on the same disk as replicated volumes.) Roman Naumenko Frontline Technologies -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
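For reference, adding a new set to an existing consistency group uses the same tuple format as a sndradm -i listing, with "g" naming the group. A sketch with made-up device names (host names as in the configs posted elsewhere in this archive), using -E to enable without an initial full sync since the new volume holds no data yet:

# sndradm -nE tor2.flt2 /dev/rdsk/c4t2d0s0 /dev/md/rdsk/bmp9 \
    mtl2.flt2 /dev/rdsk/c4t2d0s0 /dev/md/rdsk/bmp9 ip async g zfs-pool

The disk-queue attach asked about above, sndradm -g <group> -q a <queue-volume>, is optional: it is only needed if a disk queue is wanted, and a memory queue is used otherwise.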
Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated
Thanks for your help! Actually, I have some more questions. I need to make a decision on the replication mode for our storages: zfs send/receive, AVS, or even Microsoft's internal tool on the iscsi volumes, with independent zfs snapshots on both sides. Initially AVS seemed a good option, but I can't make it work over a 100Mb link with 8x1.36Tb volumes.

Roman,

A weird issue: 1. avs works for connections on a local switch via a local freebsd router connected to the switch: host1 - switch - freebsd router - switch - host2. 2. When trying to emulate replication using a far-distance remote connection with the freebsd router on the remote side, AVS fails with the error: sndradm: warning: SNDR: Recovery bitmaps not allocated

First of all, what version of AVS / OpenSolaris are you running? The reason I ask is that this error message being returned from sndradm was a problem partially resolved for AVS on Solaris 10, or AVS bundled with OpenSolaris.

# sndradm -v
Remote Mirror version 11.11
# uname -a
SunOS tor.flt 5.11 snv_101b i86pc i386 i86pc Solaris

The specific issue at hand is that during the first stages of an sndradm -u ... update command, the SNDR secondary node is asked to send its entire bitmap to the SNDR primary node. The operation is done via a Solaris RPC call, an operation which has an associated timeout value. If the amount of time it takes to send this data over the network from the secondary node to the primary node exceeds the RPC timeout value, the operation fails with "Recovery bitmaps not allocated".

It's strange that sndr sends the entire bitmap - what if it is for a big replicated volume, like 1.36Gb? It's more than 10 blocks for async replication. There will be constant timeouts on an average 100M link in this case.

SNDR does not replicate the bitmap volume, just the bitmap itself. There is one bit per 32KB of primary volume size, with 8 bits per byte, and 512 bytes per block. The answer for 1.36GB is just 11.04 blocks, or 5.5KB.

But dsbitmap shows 100441 blocks for async replication; am I missing something?

Required bitmap volume size:
  Sync replication: 11161 blocks
  Async replication with memory queue: 11161 blocks
  Async replication with disk queue: 100441 blocks
  Async replication with disk queue and 32 bit refcount: 368281 blocks

Good. I kind of figured that this was the problem. What is your SNDR primary volume size? After the initial sync started to work (although it's a very slow process and takes 10-15 mins to complete) I have the following situation:
1. Storage (8x1.36Tb in one raidz2 pool):

r...@tor.flt# sndradm -i
tor2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 mtl2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 mtl2.flt2 /dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 mtl2.flt2 /dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 mtl2.flt2 /dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 mtl2.flt2 /dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 mtl2.flt2 /dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 mtl2.flt2 /dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 mtl2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 mtl2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 ip async g zfs-pool

2. Bitmaps are on a mirrored metadevice. They are bigger than you mentioned, but this is what dsbitmap shows for the volumes:

bmp0: Soft Partition
    Device: d100
    State: Okay
    Size: 100441 blocks (49 MB)
    Extent    Start Block    Block count
    0         34             100441

3. Network: tor2.flt2 -- freebsd router --- mtl2.flt2

4. Latency:

r...@tor.flt:# ping -s mtl2.flt2
PING 172.0.5.10: 56 data bytes
64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms

I'm emulating on FreeBSD the actual delay and speed of the real circuit, which is 100Mb and 16ms.

5. The queue during writes at 40Mbit/s on the main host:

r...@tor.flt:/# kstat sndr::setinfo | grep async_block_hwm
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834
async_block_hwm    1402834

The problems: 1. In replication mode, data transmission on freebsd is only 2.5 Mbit/s for RPC traffic, which is quite a bit lower than the numbers netio
Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated
Jim Dunham wrote, On 04/24/2009 06:46 PM:

Roman,

Thanks for your help! Actually, I have some more questions. I need to make a decision on the replication mode for our storages: zfs send/receive, AVS, or even Microsoft's internal tool on the iscsi volumes, with independent zfs snapshots on both sides. Initially AVS seemed a good option, but I can't make it work over a 100Mb link with 8x1.36Tb volumes.

It's strange that sndr sends the entire bitmap - what if it is for a big replicated volume, like 1.36Gb? It's more than 10 blocks for async replication. There will be constant timeouts on an average 100M link in this case.

SNDR does not replicate the bitmap volume, just the bitmap itself. There is one bit per 32KB of primary volume size, with 8 bits per byte, and 512 bytes per block. The answer for 1.36GB is just 11.04 blocks, or 5.5KB. Of course, looking at the example below, the math for replicating TBs versus GBs is 1024 times larger:

1.36 TB * (1 bit / 32KB) * (1 byte / 8 bits) * (1 block / 512 bytes) = 11161 blocks,

which is the value reported below for non-disk-queue replication with SNDR.

But dsbitmap shows 100441 blocks for async replication; am I missing something?

Yes you did - the words "disk queue". When replicating with a disk queue, there is an additional requirement to store a 1-byte or 4-byte reference counter per bit. These reference counters are separate from the actual bitmap and are not exchanged between the SNDR primary and secondary nodes.

Required bitmap volume size:
  Sync replication: 11161 blocks
  Async replication with memory queue: 11161 blocks
  Async replication with disk queue: 100441 blocks
  Async replication with disk queue and 32 bit refcount: 368281 blocks

Good. I kind of figured that this was the problem. What is your SNDR primary volume size? After the initial sync started to work (although it's a very slow process and takes 10-15 mins to complete) I have the following situation:

Below you made reference to using 'ping' with a result of "64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms". It would be interesting to know the results of "ping -s mtl2.flt2 8192", where 8192 is the chunk size for exchanging bitmaps.

The same with this size on both sides.

The reason I mention this is that 64 bytes / 16.822 ms is ~3.7 KB/sec. With 8 * 11161 blocks * 512 bytes, it would take ~25 minutes to exchange bitmaps with the level of link latency and constrained bandwidth you are testing with.

It should be at least the same speed as the replication itself: 2.5-3 Mbit/s. But the latency definitely causes the problem; with low latency it initializes very quickly.

1. Storage (8x1.36Tb in one raidz2 pool):

r...@tor.flt# sndradm -i
tor2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 mtl2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 ip async g zfs-pool
...
tor2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 mtl2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 mtl2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 ip async g zfs-pool

When you state the initial sync takes 10-15 minutes to complete, what did you do to measure these 10-15 minutes? Do you know that when using I/O consistency groups, one can also manage all the replicas in a group with a single -g group command, like: sndradm -g zfs-pool -nu

Yes, I set up replication with I/O groups everywhere.

2. Bitmaps are on a mirrored metadevice. They are bigger than you mentioned, but this is what dsbitmap shows for the volumes:

Not an issue, SNDR ignores what it does not need.
bmp0: Soft Partition
    Device: d100
    State: Okay
    Size: 100441 blocks (49 MB)
    Extent    Start Block    Block count
    0         34             100441

3. Network: tor2.flt2 -- freebsd router --- mtl2.flt2

4. Latency:

r...@tor.flt:# ping -s mtl2.flt2
PING 172.0.5.10: 56 data bytes
64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms

I'm emulating on FreeBSD the actual delay and speed of the real circuit, which is 100Mb and 16ms.

See comments above.

5. The queue during writes at 40Mbit/s on the main host:

r...@tor.flt:/# kstat sndr::setinfo | grep async_block_hwm
async_block_hwm    1402834
...
async_block_hwm    1402834

SNDR's memory and disk queues are adjustable. The two commands are:

sndradm [opts] -F maxqfbas [set]     set maximum fbas (blocks) to queue
sndradm [opts] -W maxwrites [set]    set maximum writes (items) to queue

These commands set the high-water marks for both the number of blocks and the number of items in the memory queue. These are high-water marks, not hard stops, so it is possible for SNDR to exceed these values based on in-progress I/Os.

maxqfbas 2500
maxqitems 16384

I see that
Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated
The main problem with AVS is the lack of logging information. It responds with a short message that something failed, and you have no idea where to look or what it relates to... Strange software, really... -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] AVS - SNDR: Recovery bitmaps not allocated
A weird issue:

1. avs works for connections on a local switch via a local freebsd router connected to the switch: host1 - switch - freebsd router - switch - host2

2. When trying to emulate replication using a far-distance remote connection with the freebsd router on the remote side, AVS fails with the error:

sndradm: warning: SNDR: Recovery bitmaps not allocated

Full replication nevertheless works in this case, so I assume there are absolutely no problems with the network. I tried to trace the start of sndradm with truss, because there is no obvious reason to me why it fails. Network ok, name resolution ok, rpc responds. The only difference I can see in the traces of the sndradm -nE command between replicating locally (left) and remotely (right) is the following:

getpid() = 3245 [3244]                            | getpid() = 3315 [3314]
fcntl(5, F_SETLKW, 0x08046608) = 0                | fcntl(5, F_SETLKW, 0x08046608) (sleeping...)
lseek(5, 0, SEEK_SET) = 0                         | Stopped by signal #24, SIGTSTP, in fcntl()
read(5, " I G A M", 4) = 4                        | Received signal #25, SIGCONT, in fcntl() [default]
lseek(5, 0, SEEK_SET) = 0                         | siginfo: SIGCONT pid=2426 uid=0
read(5, " I G A M\f\0\0\0 f #EF I..", 148) = 148  | fcntl(5, F_SETLKW, 0x08046608) (sleeping...)
read(5, " C : s c m . t h r e a d..", 16384) = 16384
read(5, " 1 2 8 6 4 - -..", 36) = 36
lseek(5, 2097116, SEEK_CUR) = 2113684
read(5, " 1 2 8 6 4 - -..", 36) = 36
lseek(5, 2097116, SEEK_CUR) = 4210836
read(5, "01\0\0\01D\0\0\001\0\0\0..", 524288) = 524288

So is it the fcntl(5, F_SETLKW, 0x08046608) that fails? Or is this something else? Thanks, Roman -- This message posted from opensolaris.org ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] subscribe
-- Best regards, Roman Naumenko Network Administrator ro...@frontline.ca 25 Adelaide Street East | Suite 600 | Toronto, Ontario | M5C 3A1 Helpdesk: (416) 637-3132 www.frontline.ca ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
[storage-discuss] storageTek on opensolaris: err 26 RPC: Couldn't make connection
Hello, I'm trying to use OpenSolaris and zfs to create replicated storage on the x86 platform. Initially I wanted to use the send/recv commands to replicate the data, but it appears there are no decent scripts, and there are no resources to write our own - which would anyway end up as primitive as the others I found. So, the solution is StorageTek. And the problem is StorageTek too, especially on OpenSolaris. I ran into the known issue with installing the AVS packages in a predefined order:

http://defect.opensolaris.org/bz/show_bug.cgi?id=5115
http://www.opensolaris.org/jive/thread.jspa?messageID=307817#307817

In the beginning, the rpc connection couldn't come up:

serv2# rpcinfo -p serv1
rpcinfo: can't contact portmapper: RPC: Authentication error; why = Failed (unspecified error)

After googling a bit, I changed config/local_only to boolean false for network/rpc/bind:default. The hosts can see each other, but the rdc service still doesn't work. The error I'm getting is: NOTICE: SNDR client: err 26 RPC: Couldn't make connection

Apr 3 03:10:15 serv1 pseudo: [ID 129642 kern.info] pseudo-device: rdc0
Apr 3 03:10:15 serv1 genunix: [ID 936769 kern.info] rdc0 is /pseudo/r...@0
Apr 3 04:27:36 serv1 rdc: [ID 517869 kern.info] @(#) rdc: built 20:59:41 Oct 1 2008
Apr 3 04:27:36 serv1 pseudo: [ID 129642 kern.info] pseudo-device: rdc0
Apr 3 04:27:36 serv1 genunix: [ID 936769 kern.info] rdc0 is /pseudo/r...@0
Apr 3 04:27:51 serv1 pseudo: [ID 129642 kern.info] pseudo-device: sdbc0
Apr 3 04:27:51 serv1 genunix: [ID 936769 kern.info] sdbc0 is /pseudo/s...@0
Apr 3 04:27:53 serv1 sv: [ID 173014 kern.info] sv: rdev 0x320440, nblocks 2925489887
Apr 3 04:27:53 serv1 sv: [ID 173014 kern.info] sv: rdev 0x3201c1, nblocks 200882
Apr 3 04:27:53 serv1 pseudo: [ID 129642 kern.info] pseudo-device: ii0
Apr 3 04:27:53 serv1 genunix: [ID 936769 kern.info] ii0 is /pseudo/i...@0
Apr 3 04:27:57 serv1 sv: [ID 173014 kern.info] sv: rdev 0x3201c0, nblocks 0
Apr 3 04:28:33 serv1 rdc: [ID 643393 kern.notice] NOTICE: SNDR: Interface 0::100 == 0: : Up
Apr 3 04:28:42 serv1 rdc: [ID 153032 kern.notice] NOTICE: SNDR client: err 26 RPC: Couldn't make connection
Apr 3 04:28:52 serv1 last message repeated 1 time
Apr 3 04:28:53 serv1 rdc: [ID 643393 kern.notice] NOTICE: SNDR: Interface 0::100 == 0: : Down

These are the messages I got after reinstalling every AVS package. Actually, if I install them as explained in the bug report above, I'm not able to initialize the database with dscfgadm -e; it complains about an unsatisfied dependency for the nws_scm service. After svcadm restart svc:/system/nws_scm:default it starts. But everything still ends with the RPC error. The servers can see each other:

serv2:
r...@serv2:~# rpcinfo -T tcp serv1 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting
r...@serv2:~# rpcinfo -T tcp serv2 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting

serv1:
r...@serv1:~# rpcinfo -T tcp serv2 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting
r...@serv1:~# rpcinfo -T tcp serv1 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting

What else can I check in this case? It seems like running AVS on OpenSolaris is a pretty bad idea, so I started downloading Solaris...
Thanks, Roman ___ storage-discuss mailing list storage-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss
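For anyone who lands on this thread with the same portmapper authentication error: the local_only change mentioned above is done with standard SMF property commands, followed by a refresh:

# svccfg -s svc:/network/rpc/bind setprop config/local_only = false
# svcadm refresh svc:/network/rpc/bind

That opens rpcbind to remote hosts, which is required before the SNDR (rdc) RPC services can be reached from the peer at all.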