Re: [storage-discuss] iSCSI and low performance

2009-12-26 Thread Roman Naumenko
 Hello,
 I'm having some issues with iSCSI target performance.
 Recently I made a 1TB ZVOL, mounted it on windows 7
 Ultimate (NTFS) with Microsoft's iSCSI initiator. But
 the performance, in layman's terms, just sucks.
 
 Version is SunOS solaris 5.11 snv_123 i86pc i386
 i86pc
 
 Athlon64 2800+, 4x500GB SATA2 drives (WD Caviar
 Green) in raidz, 3GB RAM. Onboard nvidia sata and
 gigabit ethernet.

That amount of memory is rather low for ZFS.

 So, two questions:
 1) Why does the shareiscsi=on option create the
 target with rdsk if it is much slower?
 2) Any suggestions for improving performance?
 
 Thanks in advance,
 Hernan

shareiscsi is outdated; you should use the much more advanced and improved COMSTAR framework instead.
Don't forget to check whether writeback caching is enabled on the LUN - writethrough mode is usually too slow.
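
For reference, the writeback setting can be checked and changed per logical unit with stmfadm; a minimal sketch, with a placeholder GUID (substitute the real LU name from stmfadm list-lu):

# Show the current Writeback Cache setting for the logical unit
stmfadm list-lu -v <LU-GUID>

# Enable writeback caching ("wcd" means write-cache disabled, so false enables it)
stmfadm modify-lu -p wcd=false <LU-GUID>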

--
Roman 
ro...@naumenko.ca


Re: [storage-discuss] iSCSI and low performance

2009-12-26 Thread Roman Naumenko
 Thanks. How much memory should I have? This machine
 wont take more than 3GB, best I can get is one that
 takes up to 8GB. Anyway, I'm the only user, is it
 really necessary?

It depends on your needs. If you are OK with the current performance, you can keep going as you are, I guess.
Reads would benefit the most, since more RAM means more room for the ARC cache.

 Tried COMSTAR as you suggested, got the same speed
 and behavior as iscsitadm with dsk instead of rdsk.
 This is what writes look like:
 
 [r...@solaris:~]# zpool iostat 3
                 capacity     operations    bandwidth
 pool          used  avail   read  write   read  write
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T     30    263   238K  2.25M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0  6.42K      0  51.0M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0      0      0      0
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0      0  2.64K      0
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0  8.13K      0  64.6M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0      0  2.64K      0
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0      0  2.65K      0
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0  8.23K      0  65.4M
 -----------  -----  -----  -----  -----  -----  -----
 
 if I disable writeback, it looks like this:
 
 tera          2.13T  1.49T      0     44      0  2.89M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     35  2.66K  2.29M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     46      0  3.01M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     22      0  1.48M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     47      0  3.06M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     22      0  1.45M
 -----------  -----  -----  -----  -----  -----  -----
 tera          2.13T  1.49T      0     40      0  2.60M
 -----------  -----  -----  -----  -----  -----  -----
 
 So why is it so bursty? ZIL maybe? Do I have a
 bottleneck somewhere? 

This is correct behavior.
I think this is about the limit for your current configuration.

There is a thread on the zfs forum about these bursts:
http://www.opensolaris.org/jive/thread.jspa?messageID=446100&tstart=0#446100

You can decrease the timeout for writes, but I doubt it will help. The array is probably not able to do better (check the maximum raw throughput with the dd command).
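
A quick way to get a raw throughput baseline, bypassing iSCSI entirely (a sketch - it assumes the pool's root dataset is mounted at /tera; adjust the path and sizes to taste):

# Sequential write of 4 GB straight into the pool
dd if=/dev/zero of=/tera/ddtest bs=1M count=4096

# Sequential read of the same file, then clean up
dd if=/tera/ddtest of=/dev/null bs=1M
rm /tera/ddtest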

You can also check the iSCSI speed by using the iscsii.d script from
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_iSCSI
Basically, you'll see the same speed as in the Network tab of the Windows Task Manager.

You can definitely increase array performance by reconfiguring the pool as raid10 (striped mirrors).

--
Roman 
ro...@naumenko.ca


Re: [storage-discuss] Even with host groups, other initiators still see the targ

2009-12-13 Thread Roman Naumenko
If I remember correctly, an initiator that is not in a view can still establish a session, but it doesn't get access to any LUNs. In particular, you can log in to a target from Windows and see the session opened on the Solaris side, but the Windows box won't see any iSCSI drive.
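
For the record, the usual way to make sure only the intended initiator sees the LUNs is to scope the view with a host group; a sketch with hypothetical group and initiator names and a placeholder GUID:

# Group the allowed initiator(s)
stmfadm create-hg win-hg
stmfadm add-hg-member -g win-hg iqn.1991-05.com.microsoft:host1

# Expose the LU only to that host group, as LUN 0
stmfadm add-view -h win-hg -n 0 <LU-GUID>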

There is a bug related to your question:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6878539

--
Roman Naumenko
ro...@naumenko.ca



Re: [storage-discuss] failover avs?

2009-12-04 Thread Roman Naumenko
Hi Greg,

Your setup is not very clear from the description... but you should probably try snapshots for data synchronization.
The iSCSI targets have to be recreated on the second server in any case.

--
Roman Naumenko
ro...@naumenko.ca


[storage-discuss] (warning) packet length greater than MTU in buffer offset 8320: length=8320

2009-11-20 Thread Roman Naumenko
I have a network-related problem on a storage server.

This is on OpenSolaris 121b; on the other side is a Windows file server accessing it through iSCSI. When a user hammers the storage server by copying files back and forth, other users connected to the Windows file server experience slowness.

r...@torgenzsan.local:/export/home/roman/zfs/iscsi_watch# dladm show-ether
LINK        PTYPE     STATE    AUTO  SPEED-DUPLEX  PAUSE
e1000g0     current   up       yes   1G-f          bi
e1000g1     current   up       yes   1G-f          bi
e1000g2     current   up       yes   1G-f          bi

dladm show-link
LINK        CLASS    MTU    STATE    OVER
e1000g0     phys     1500   up       --
e1000g1     phys     1500   up       --
e1000g2     phys     1500   up       --

ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff00
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 10.24.1.101 netmask ff00 broadcast 10.24.1.255
        ether 0:15:17:87:c9:bc
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.24.254.101 netmask ff00 broadcast 10.24.254.255
        ether 0:15:17:87:c9:bd
e1000g2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.21.3.101 netmask  broadcast 10.21.255.255
        ether 0:11:a:54:6e:66

snoop -r -o 20112009.cap -d e1000g0

29135 (warning) packet length greater than MTU in buffer offset 0: length=8320
29136 (warning) packet length greater than MTU in buffer offset 8320: 
length=8320
29137 (warning) packet length greater than MTU in buffer offset 16640: 
length=8320
29138 (warning) packet length greater than MTU in buffer offset 24960: 
length=8320
29139 (warning) packet length greater than MTU in buffer offset 33280: 
length=8320
29141 (warning) packet length greater than MTU in buffer offset 41688: 
length=8320
29142 (warning) packet length greater than MTU in buffer offset 50008: 
length=8320
29160 (warning) packet length greater than MTU in buffer offset 0: length=8320
29163 (warning) packet length greater than MTU in buffer offset 8496: 
length=8320
29180 (warning) packet length greater than MTU in buffer offset 18304: 
length=8320
29181 (warning) packet length greater than MTU in buffer offset 26624: 
length=8320
29182 (warning) packet length greater than MTU in buffer offset 34944: 
length=8320
29183 (warning) packet length greater than MTU in buffer offset 43264: 
length=8320
29184 (warning) packet length greater than MTU in buffer offset 51584: 
length=8320

And so on...

In Wireshark, packet sizes go up to 56741 (49927, 41687, 33495, 16697, 9382).
Among the bigger packets, 8294 bytes is the size seen most often.

What I'm thinking is that the Intel I/OAT engine may be helping to screw things up - e1000g0 is the Intel one. (I remember there was a command to check the BIOS setting.)

If anybody is willing to take a look at the trace file, please let me know at ro...@frontline.ca.

--
Roman


[storage-discuss] COMSTAR: lun become unregistered after power loss

2009-11-16 Thread Roman Naumenko
This is OpenSolaris 118b.

We had a power loss on the server. After it booted, one of the LUNs was in an unregistered state. Is this a bug?

stmfadm list-lu -v
LU Name: 600144F0EE6002004AA868040001
Operational Status: unregistered
Provider Name : unregistered
Alias : -
View Entry Count  : 1

The views were also a mess, since the LUN numbering shifted.
When I created the views I passed -n X to the stmfadm add-view command. Could that affect how targets are mapped to LUNs?

zsan.local:~# stmfadm list-view -l 600144f0ee6002004aa86b0a0002
View Entry: 0
Host group   : zsan-hg
Target group : tg-tor-fs
LUN  : 2
zsan.local:~# stmfadm list-view -l 600144f0ee6002004b0027ba0001
View Entry: 0
Host group   : zsan-hg
Target group : tg-mtl-fs
LUN  : 3
zsan.local:~# stmfadm list-view -l 600144f0ee6002004ab7e2810004
View Entry: 0
Host group   : zsan-hg
Target group : tg-mon-exch-01
LUN  : 2

Two views have the same LUN number, 2, although the GUIDs are different. Is this a correct configuration?

Thank you,
Roman Naumenko
ro...@frontline.ca


Re: [storage-discuss] COMSTAR: lun become unregistered after power loss

2009-11-16 Thread Roman Naumenko
OK, thanks - the target behavior is clear now. Nothing unexpected happened in terms of connectivity.

Still, what could make a LUN become unregistered?
Is this because of an error at the ZFS level (no errors on the zpool, though), or did something fail in COMSTAR?
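
One thing that is sometimes worth trying in this situation (a sketch, not a guaranteed fix - the zvol path below is hypothetical) is re-registering the LU from its backing store, since the sbd metadata normally lives in the zvol itself:

# Re-register an sbd logical unit from its backing zvol
stmfadm import-lu /dev/zvol/rdsk/pool/volname

# Verify it is online again and still has its view entries
stmfadm list-lu -v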

--
Roman Naumenko
ro...@frontline.ca


[storage-discuss] [sun.com] broken link

2009-10-20 Thread Roman Naumenko
You guys have a broken link; people might be interested in the X25 :)
The page is http://www.sun.com/software/x25/

The link is:

http://store.sun.com/CMTemplate/CEServlet?process=SunStorecmdViewProduct_CPcatid=95146

--
Roman


Re: [storage-discuss] Comstar - how to migrate to the new server?

2009-10-20 Thread Roman Naumenko

Jim Dunham wrote, On 20.10.2009 09:26:

Do you have a backup policy, other than replicating the data from server1 to 
server2?
  
Of course we have this data in other places. Windows replicates it with its own means, plus there are tape backups.
I wouldn't touch it, though, if not for the performance issues. I haven't tried disabling AVS yet, but raid10 is definitely the best option.

Are server1 and server2 identical systems, such that if you configured 
server2's ZFS storage pool to raid10, then using ZFS (or AVS) replicated the 
data from server1 to server2, could the identity of server1 and server2 be 
swapped?
  
They are identical in terms of the underlying storage (same controller, same drives). The server configurations are slightly different.

My concern is that in any type of data conversion like this, to always have two 
or more locations where valid data exists.
  
When swapping the drives, only one location with the data exists. That's the problem, yes.


--
Roman


[storage-discuss] Comstar - how to migrate to the new server?

2009-10-19 Thread Roman Naumenko
Hello list,

Can somebody advise me on the easiest way to do this:

server1: 8 drives in raid6
server2: 8 drives replicated by AVS from server1.

There are COMSTAR targets on server1. Now I want to have raid10 instead of raid6 on server1.

The way I see it can be done: break AVS, configure raid10 on server2, copy incremental snapshots, and then pause the applications on the initiators. Reboot server1 while swapping the drives between the two servers.

The question now is how to restore (if it is even possible) the COMSTAR configuration: LUNs, views. It would be nice to have server1 come up after the reboot with everything available as before.

Is this possible, or does COMSTAR have to be reconfigured?
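
One approach that might help here, assuming the COMSTAR configuration (LUs, views, host/target groups) is persisted in the SMF repository under svc:/system/stmf, is to archive that service configuration before the swap and re-import it afterwards; a sketch only - verify on a test box first:

# On server1, before swapping drives: archive the STMF configuration
svccfg export -a stmf > /var/tmp/stmf-config.xml

# After the drives (and the pool with its zvols) are back: restore and restart
svccfg import /var/tmp/stmf-config.xml
svcadm restart stmf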

--
Roman


Re: [storage-discuss] Comstar - how to migrate to the new server?

2009-10-19 Thread Roman Naumenko

Hi Jim,

AVS is configured for async replication.

Basically, I would like to replace AVS with snapshot transfers; the reason is performance.
So after I stop replication and configure raid10 on the second server, AVS is going to be disabled.

Both servers are on build 122 (which should be updated to 124 due to the zfs bug).

There is plenty of space left on server1, so going from raid6 to raid10 should be OK:

# zpool list
stor  10.9T  3.66T  7.21T  33%  ONLINE  -

# zpool status
  pool: stor
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        stor        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
        spares
          c4t1d0    AVAIL

errors: No known data errors

# zfs list
stor              6.09T  1.91T  49.4K  /stor
stor/testsnap     46.0K  1.91T  46.0K  /stor/testsnap
stor/tor          6.09T  1.91T  44.9K  /stor/tor
stor/tor/node1     600G  1.91T   600G  -
stor/tor/node2     600G  1.91T   600G  -
stor/tor/fs       4.43T  4.90T  1.35T  -
stor/tor/dc        500G  2.28T   128G  -

--
Roman

Jim Dunham wrote, On 19.10.2009 14:07:

Roman,


Hello list,

Can somebody advise me how this can be done in the easiest way:

server1: 8drives in raid6
server2: 8drivers replicated by AVS from server1.

There are comstar targets on server1. Now I want to have raid10 
insted raid6 on server1.


The way I see it can be done: brake avs, configure raid10 on server2, 
copy incremental  snapshots and then pause applications on 
initiators. Reboot server1 while swapping drives between them.


The question now is how to restore (if even possible) comstar stuff: 
LUNs, views. It would be nice to have server1 that comes up after 
reboot with everything available as before.


If this is possible or comstar should be reconfigured?


As one that has detailed knowledge of both AVS and COMSTAR, you have 
left out too many details as to offer up a possible solution that will 
work. What is needed is Solaris version numbers, volume management 
information, on disk formatting, partitioning and filesystems in use, 
plus AVS and COMSTAR configuration information. 

Also changing the underlying volume format from raid6 to raid10, is 
likely to result in less available disk blocks, so a block-to-block 
copy, replication or snapshot from a raid6 to raid10 volume is likely 
to fail due to insufficient storage.


--
Roman


Jim Dunham
Engineering Manager
Core I/O - COMSTAR
Open Storage Systems Group


Re: [storage-discuss] sata controller support with 1.5TB disks

2009-10-16 Thread Roman Naumenko
 Before I order the controller and disks, I just want
 to make sure that Opensolaris will be able to see and
 use these 1.5TB disks.  I don't want to find out that
 the controller works, but it can't see this large
 sized disk.
 
 Can anyone enlighten me please ?

Cheap models from Adaptec don't support large drives - I ran into this issue myself.
LSI is a good choice; probably everything of theirs on PCI-e supports large drives.

--
Roman


Re: [storage-discuss] sata controller support with 1.5TB disks

2009-10-16 Thread Roman Naumenko
 I am using a Dell SAS 6i card with the Samsung 1.5Tb
 drives, which is based on an LSI design.
 I would recommend the SAS 6i as it is significantly
 cheaper than LSI's normal retail channel cards.

Doesn't Dell screw them up?
I don't mean completely, just a little - a broken driver here, unsupported hardware there?

--
Roman


[storage-discuss] IOPS number - how much your storage delivers?

2009-10-09 Thread Roman Naumenko
I just wonder how many IOPS a typical 2-processor box can deliver with one RAID controller and one LSI HBA with JBODs connected.

Right now dd produces 2000-4000 IOPS on an 8-disk raid10 array connected internally (not JBOD).

How do you translate an IOPS number into initiator usage? I mean, how many servers can be connected over 1G NICs to such storage?
Or how many Exchange servers could utilize such storage, accessing it through a 1G NIC over iSCSI?

I'm just trying to plan storage and performance capacity ahead.

--
Roman


[storage-discuss] Comstar - relationship between LUNs and targets

2009-10-09 Thread Roman Naumenko
Stupid situation...

Usually I create one LUN plus one target at a time, but this time I decided to create a few LUNs first.

Then I typed:
itadm create-target

It created a target, but then I realized I can't find which LUNs it belongs to.
Does anybody know how to find the LUN-to-target relationship in COMSTAR?

--
Roman


Re: [storage-discuss] Comstar - relationship between LUNs and targets

2009-10-09 Thread Roman Naumenko
I see the relationship now: only after a view is created does stmfadm list-lu -v show that the LUN has a view.

Very inconvenient.

COMSTAR developers, could you comment on this? Maybe I'm missing something?

--
Roman


Re: [storage-discuss] Comstar - relationship between LUNs and targets

2009-10-09 Thread Roman Naumenko

tim szeto wrote, On 09.10.2009 13:51:

Roman Naumenko wrote:
  

I see the relation: only after the view is created, you'll find in the stmfadm 
list-lu -v
that the lun has a view.

Very inconvenient.

Comstar developers, could you comment on this? Maybe I'm missing something?
  


Take a look at stmfadm(1M), it shows the usage of 'view'.
  

There is no view yet.
I can configure any view without knowing which target each volume belongs to.
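
For anyone hitting the same question: the LU-to-target mapping only exists indirectly, through the target group named in each view. A sketch of walking it from the LU side (it assumes the default one-line-per-LU output of stmfadm list-lu):

# For every logical unit, show its view entries; the "Target group"
# field in each view is what ties the LU to one or more targets.
for lu in $(stmfadm list-lu | awk '{print $3}'); do
    echo "== $lu =="
    stmfadm list-view -l "$lu"
done

# Then resolve the target-group names to actual target IQNs
stmfadm list-tg -v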

--
Roman


[storage-discuss] Migration from Linux iscsitarget to Comstar, how?

2009-10-01 Thread Roman Naumenko
We're about to start transferring data from Linux storage boxes to OpenSolaris storage. The Linux servers provide targets that are used mainly by Windows servers. The storage appliance is Openfiler (Linux based).

The media storage on Linux is a hardware raid5 of 8 drives. There is one big 5TB GPT partition, divided into volumes by LVM (Openfiler uses LVM to manage sizes and so on).

Is there an easy way to transfer data from the old volumes to COMSTAR targets? Obviously this can be done by mounting both targets and copying the data on the client, which is not efficient.

The volumes can be mounted on OpenSolaris over iSCSI, no problems there, but how can the data be transferred from the old volume to ZFS?
--
Roman


Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?

2009-10-01 Thread Roman Naumenko
  Is there is an easy way to transfer data from old
 volumes to comstar targets? Obviously this is can be
 done by mounting two targets and copying data on
 client, which is not efficient.
 
 I'm not sure you have an easy way to do this. The
 problem as I see it is the LVM'd volumes. You have a
 couple of ways of doing this, but by far the easiest
 way is to mount the new LUN on the server and copy
 the data. It will involve the least downtime and
 ensures that your Comstar targets are already setup
 correctly. 

Yep, it seems like this is the easiest way to do the switchover.

--
Roman


Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?

2009-10-01 Thread Roman Naumenko
 The other way is to mount the old LUN on the
 OpenSolaris server and dd the old data directly to the ZVOL. This will avoid
 the copy-off copy-on the network and could speed things up
 depending on the size of the volume to the amount of data being copied, but
 you can choose your block size which can speed things up.
 
 -Ross

I wonder how mounting a target from a Linux server works on OpenSolaris? Is it reliable? Any issues?
Does it interact in any way with COMSTAR already being enabled?
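
A rough sketch of that dd-based migration from the OpenSolaris side, with a hypothetical Linux target name, IP address and disk device (find the real device with format after logging in), and assuming the destination zvol is at least as large as the source LUN:

# Connect the OpenSolaris initiator to the old Linux target
iscsiadm add static-config iqn.2006-01.com.openfiler:oldvol,10.0.0.5:3260
iscsiadm modify discovery --static enable

# Block-copy the old LUN onto the zvol that backs the new COMSTAR LU
dd if=/dev/rdsk/c2t1d0p0 of=/dev/zvol/rdsk/tank/newvol bs=1M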
 
--
Roman


Re: [storage-discuss] Migration from Linux iscsitarget to Comstar, how?

2009-10-01 Thread Roman Naumenko
 http://blogs.sun.com/eschrock/entry/shadow_migration
 
 Not sure if it's relevant in your specific setup, but
 is still worth a look.
 
 Regards,
 Andrey

Thanks, I checked it quickly.
Unfortunately, this is about migration using NFS - there is no iSCSI option.

--
Roman


[storage-discuss] [snapshots] takes minutes for zfs to list snapshots

2009-09-25 Thread Roman Naumenko
This is on opensolaris 118

time zfs list
real0m0.015s
user0m0.005s
sys 0m0.010s

time zfs list -t snapshot
real0m19.441s
user0m0.020s
sys 0m0.041s

time zfs list -t snapshot  | wc -l
122

real0m0.045s
user0m0.018s
sys 0m0.030s

Hm, then it started listing very quickly again.

What makes the listing take such a long time - 20 seconds for a hundred snapshots?
Could taking a snapshot delay the listing? (I have the auto-snapshot service taking snapshots every 30 minutes.)

--
Roman


Re: [storage-discuss] setting up comstar

2009-09-25 Thread Roman Naumenko
And here goes mine as well:

http://opensolaris.org/jive/thread.jspa?threadID=111540&tstart=0

Enjoy!

--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-23 Thread Roman Naumenko
 Chenbro makes a JBOD kit.  Or at least *made*.  A
 number of sites are showing out of stock or even
 discontinued.
 
 http://usa.chenbro.com/corporatesite/products_detail.php?sku=76
 
 UEK-12803 looks like the part number for you.
 
 Who knows if the mounting options are compatible with
 the Supermicro chassis but it looks like it should at
 least fit in the motherboard I/O window and comes
 with SAS expanders based around the LSI SASX28.

It doesn't make sense to use such external expanders, since chassis with and without an expander differ only in having a small LSI chip on the backplane ($20?).

As far as I can see, Supermicro chassis with a SAS expander on the backplane are only a little more expensive than those without.

--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-22 Thread Roman Naumenko

Chris Du wrote:



JBOD Kit-- Used for cascading purposes
CSE-PTJBOD-CB1 - Power Control Card
CBL-0166L - SAS 836EL2/EL1 BP External Cascading Cable
CBL-0167L - SAS 836EL1 BP 1-Port Internal Cascading Cable

Yours is not E1 model which uses SAS expander chip on the backplane. 
The Power Control Card should be universal.



Without the SAS expander chip this kit is useless, as I understand it.

What are the other options for JBOD expanders, apart from chassis with SAS expander chips on the backplane like the one you mentioned?

I see that Areca sells something (the ARC-8020 expander module).
Well, I suspect the only real solution here is a specialized JBOD chassis, right?


--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-22 Thread Roman Naumenko
 On Tue, Sep 22, 2009 at 15:15, Chris Du
 dilid...@gmail.com wrote:
  I thought QT model has 16 SATA/SAS ports on the
 backplane and only E1/E2 model supports JBOD mode.
 Sorry, my mistake.  I have the E1 model, and thought
 that Roman had mentioned he had that as well.  I see now in the
 first post that he in fact specified the no-expander version of the case.
  I'll try to read ore carefully in the future.

That's OK - I myself thought it had one, too.

 The JBOD power control board will suffice to turn the
 box on, but you will need to do one of three things:
 1) find an external SAS expander card like Chenbro
 makes (but good luck finding them :( )
 2) exchange your case for the SAS expander version
 3) Get lots of SAS controllers and connect disks one
 to a port.
 
 Will

Yes, I've already figured that out; the available options are not very nice.

Shit, this is a conspiracy - why does nobody make those freaking SAS expanders?
I know why - a card would cost $100, while a JBOD chassis costs $1500.

--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-22 Thread Roman Naumenko
 It may be cheaper and easier to just replace the backplane if the case is 
 already bought.

Is it an easy procedure? I doubt it.
--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-21 Thread Roman Naumenko

Chris Du wrote:

You need JBOD kit. It's basically a power card and a SAS cascading cable.
  

Hi Chris,

Do you have a particular one in mind?

--
Roman


Re: [storage-discuss] convert common chassis to JBOD?

2009-09-21 Thread Roman Naumenko

OK, I see. We've ordered those cards already.

But thanks anyway.

--
Roman

Chris Du wrote, On 21.09.2009 17:25:


Yours is different as mine. I have the E1 model  which use SAS
expender chip on backplane. However, the power card should be same.

http://www.supermicro.com/products/chassis/3U/836/SC836E1-R800.cfm

Look at optional parts in the bottom, there is JBOD kit that inclues
power control card.


On Mon, Sep 21, 2009 at 2:13 PM, Roman Naumenko ro...@frontline.ca 
wrote:

 Chris Du wrote, On 23.12.-28158 14:59:

 You need JBOD kit. It's basically a power card and a SAS cascading 
cable.



 Hi Chris,

 Do you have any particular in mind?

 --
 Roman




Re: [storage-discuss] convert common chassis to JBOD?

2009-09-17 Thread Roman Naumenko
Yes, that's what I needed.
Thank you very much!

--
Roman


Re: [storage-discuss] Making zfs storage HA clustered

2009-09-14 Thread Roman Naumenko
 From: Roman Naumenko ro...@frontline.ca
  Is there are any way to build HA storage using
 common components like 
  JBOD enclosures, lsi hba, cheap sata drivers? 
 
 Yes.
 
 http://www.sun.com/storage/disk_systems/unified_storage/index.jsp

Yes, that's the right option when going cheap :)

--
Roman


Re: [storage-discuss] Making zfs storage HA clustered

2009-09-11 Thread Roman Naumenko
 [Added ha-clusters-discuss]
 
 Have you looked at what Open HA Cluster(OHAC)
 provides?
 
 http://opensolaris.org/os/community/ha-clusters/
 
 There is a HA-ZFS agent for OHAC and more recently
 support for shared-nothing storage with COMSTAR.
 
 Augustus.

I can't find this. Is there any documentation, or a description of the implementation process? Or is it just a development version?

--
Roman


Re: [storage-discuss] Making zfs storage HA clustered

2009-09-11 Thread Roman Naumenko
Augustus, thank you for the information.

OK, so the approach for making the storage server HA is called Shared Nothing Storage, and it should be built with MPxIO, as I've learned from your blog.

Is there any way to build HA storage using common components like JBOD enclosures, LSI HBAs and cheap SATA drives?

--
Roman Naumenko
ro...@frontline.ca


Re: [storage-discuss] Making zfs HA

2009-09-10 Thread Roman Naumenko
I've heard about Nexenta, but we've just started to move away from another Linux-based appliance, so I don't feel like starting with another Linux again.

Hopefully, at some point we'll just order proper hardware from Sun to provide clustering or whatever is needed.

Anyway, thanks for the suggestions.
Is it possible to use the code of the plugins you mentioned on OpenSolaris? It's still not a true clustering solution, but it sounds promising.

--
Roman


[storage-discuss] [HBA card] is needed

2009-09-10 Thread Roman Naumenko
Hi guys,

It's kind of an emergency: a customer wants a fast, expandable array, but I don't have any HBA to connect JBODs.

I've decided to use the LSI SAS 3801E card:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3801e/index.html

There is positive feedback on this card when used with JBOD enclosures. However, our sales department can't find it fast enough (it is on back order everywhere).

Can anybody suggest something similar? I know many vendors use the LSI chipset, but Dell cards, for example, have some firmware issues when installed on Solaris (or do they?)

Thanks in advance.
--
Roman


Re: [storage-discuss] [HBA card] is needed

2009-09-10 Thread Roman Naumenko
 I have two of these sat on my desk right now - order
 them on overnight delivery last week no problem.
 
 What country are you in?
 
 Regards,
 
 Tim Creswick

Hi Tim,

I'm in Canada. 

I wonder what the price for this card was?
And it seems we've found something - the guys from pc-pitstop.com are promising to ship what we need promptly.

--
Roman
ro...@frontline.ca


Re: [storage-discuss] Making zfs storage HA clustered

2009-09-10 Thread Roman Naumenko
I wonder what an OpenSolaris guru would advise?

--
Roman


[storage-discuss] Making zfs HA

2009-09-09 Thread Roman Naumenko
Hello list,

What are the options for building clustered storage using OpenSolaris? I'm interested in HA solutions.

I have tried only one option - AVS replication. Unfortunately, AVS configuration is too complicated. Groups, bitmaps, queues, RPC timeouts, slicing - it's just a nightmare to make it work and to support in production when there are more than a couple of pools. And it's probably going to be slow if there are JBODs attached to a storage controller, or it will require half of the Ethernet ports to replicate more or less reliably. Has anybody tried a 10GbE interface for it?

Another option I'm looking into is sending snapshots. But regardless of what we've heard from Sun, it's going to slow down ZFS operations. Creating a snapshot is not quick on a loaded pool, especially if the storage controller manages many pools. So there would probably be delays of dozens of minutes in transferring snapshots over to a standby server.
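
A minimal sketch of the snapshot-shipping option, assuming a hypothetical dataset stor/tor and a standby host reachable over ssh:

# Take a new snapshot and send only the delta since the previous one
zfs snapshot stor/tor@12-00
zfs send -i stor/tor@11-30 stor/tor@12-00 | ssh standby zfs receive -F stor/tor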

--
Roman


Re: [storage-discuss] jbod enclosures: is online expansion possible (without rebooting)?

2009-09-08 Thread Roman Naumenko
 On Tue, Sep 8, 2009 at 10:43 AM, Roman
 Naumenkoro...@frontline.ca wrote:
  Thanks, Ross.
 
  Just to clarify: connecting the second enclosure
 doesn't require the first to be turned off?
  My understanding is that expanding pool can be done
 completely without service interruption?
 
 You can hot-add storage enclosures that support
 hot-adding if the controllers support this as well, the MD1000s from
 Dell support this as do the LSI 1068 and 1078 based controllers.

Excellent!
Just another step towards a 7410 for under 10 grand :)

--
Roman


[storage-discuss] [arc_summary] free memory is 2.6G of total 16G

2009-09-04 Thread Roman Naumenko
I have a question about the stats output from arc_summary. Is "Free Memory" supposed to be at this level? It's almost 16% of all memory.

Maybe "Max Size (Hard Limit)" should be corrected if 2.5 GB of memory remains free all the time?

./arc_summary.pl 
System Memory:
 Physical RAM:  16369 MB
 Free Memory :  2593 MB
 LotsFree:  255 MB

ZFS Tunables (/etc/system):

ARC Size:
 Current Size: 9870 MB (arcsize)
 Target Size (Adaptive):   9870 MB (c)
 Min Size (Hard Limit):1918 MB (zfs_arc_min)
 Max Size (Hard Limit):15345 MB (zfs_arc_max)
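
If it turns out the cap should be changed, the usual knob is the zfs_arc_max tunable in /etc/system, which takes effect after a reboot; a sketch, with the value purely illustrative:

# Cap the ZFS ARC at 14 GB (value in bytes; illustrative only), then reboot
echo "set zfs:zfs_arc_max = 0x380000000" >> /etc/system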

--
Roman Naumenko
ro...@frontline.ca


Re: [storage-discuss] [Comstar] listens on all interfaces, sends all targets to everybody

2009-09-01 Thread Roman Naumenko
Can anybody advise?
Binding the target portal to a particular interface and restricting which targets appear per initiator - is this available in COMSTAR, or am I missing something?

--
Roman


[storage-discuss] [Comstar] listens on all interfaces, sends all targets to everybody

2009-08-31 Thread Roman Naumenko
Sorry for the repeated question - I remember somebody asked this already, but I can't find when.

First question:
1. Does a configured target - with hg, tg, tpg and initiators all set up - nonetheless make COMSTAR listen on all interfaces for incoming connections?

netstat -an | grep 3260
      *.3260               *.*              0      0  262300      0  LISTEN

Basically, I would like to restrict connections to a LUN to a particular one. The same goes for an initiator - it should not see other targets.

2. The target is configured along with a tpg on interface e1000g0, but I can also get the list of targets by adding the e1000g0 IP as a target portal for discovery on the initiator. Although it can't log in, it is still confusing.

Again, I'm getting the list of targets because it listens on all interfaces.

Any references to documentation explaining this are appreciated.
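
For context, the knobs that exist for this are target portal groups (controlling where a target is advertised) and host groups in the view (controlling which initiators see which LUNs); whether the kernel target still binds *.3260 is a separate question. A sketch with hypothetical names, addresses and a placeholder GUID:

# Advertise the target only through one portal
itadm create-tpg tpg-storage 10.24.254.101:3260
itadm modify-target -t tpg-storage iqn.1986-03.com.sun:02:mytarget

# Hide the LUN from everything except one initiator
stmfadm create-hg exch-hg
stmfadm add-hg-member -g exch-hg iqn.1991-05.com.microsoft:exch01
stmfadm add-view -h exch-hg -n 0 <LU-GUID>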

--
Roman Naumenko
ro...@bestroman.com


Re: [storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-27 Thread Roman Naumenko
Excellent analysis. Can you add the tshark commands that you used?
So OpenSolaris sends large packets that confuse Wireshark (and probably Windows)?

It's an Intel motherboard.

$ uname -a
SunOS zsan01 5.11 snv_118 i86pc i386 i86pc Solaris

That's what I have on the server; I don't know how to list the NIC chipset. /etc/drv/ shows:
Intel e1000g Gigabit Ethernet Adapter

$ dladm show-linkprop | grep mtu
e1000g0  mtu  rw  1500  1500  1500-9216
e1000g1  mtu  rw  9216  1500  1500-9216

I/OAT is probably enabled - I remember there was something in the BIOS. I'll try to disable it, but we've also tested Win2008 against Linux: it failed to format a LUN on the old Linux storage server, too. I'll try to post that trace as well.

--
Roman


Re: [storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-27 Thread Roman Naumenko
 
 On Aug 27, 2009, at 1:25 PM, Roman Naumenko wrote:
 
  Excellent analysis. Can you add commands for tshark
 that you've used?
  So, Opensolaris sends large packets that confuses
 WireShark (and  
  probably Windows)?
 
  It's intel motherboard.
 
  $ uname -a
  SunOS zsan01 5.11 snv_118 i86pc i386 i86pc Solaris
 
  That's what I have on the server, don't know how to
 list NICs chipset
  /etc/drv/
  Intel e1000g Gigabit Ethernet Adapter
 
 try a scanpci -v, then to match with the driver
 that's binding check  
 the vendor and device id's in /etc/driver_aliases

Intel 82546EB Ethernet Device - from smbios

This is from scanpci -v:
 pci bus 0x0005 cardnum 0x00 function 0x01: vendor 0x8086 device 0x1096
 Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)
 CardVendor 0x8086 card 0x3484 (Intel Corporation, Card unknown)
  STATUS    0x0010  COMMAND 0x0047
  CLASS 0x02 0x00 0x00  REVISION 0x01
  BIST  0x00  HEADER 0x80  LATENCY 0x00  CACHE 0x10

--
Roman


[storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-25 Thread Roman Naumenko
It's Windows 2008.

If I choose quick format, it gives an error almost immediately: "The format did not complete successfully."

If it's a full format, it starts formatting, does something for a couple of hours, and then fails with the same error.

list-lu:
LU Name: 600144F0A02504004A9311690002
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/zsan01store/mbx01-node2-test
View Entry Count  : 1
Data File : /dev/zvol/rdsk/zsan01store/mbx01-node2-test
Meta File : not set
Size  : 536870912000
Block Size: 512
Vendor ID : SUN 
Product ID: COMSTAR 
Serial Num: not set
Write Protect : Disabled
Writeback Cache   : Disabled

itadm list-target -v
TARGET NAME                                                   STATE    SESSIONS
iqn.1986-03.com.sun:02:f6265007-17cd-62aa-c31c-d2676533ae89  online   0
alias:  -
auth:   none (defaults)
targetchapuser: -
targetchapsecret:   unset
tpg-tags:   tpg-0g1 = 2

zfs list

zsan01store/mbx01-node2-test   500G  7.42T   844K  -

--
Roman


Re: [storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-25 Thread Roman Naumenko
Of course by RDP - we've been doing it that way for a couple of years with the Linux iSCSI target.
There are too many servers to format them from the console.

--
Roman


Re: [storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-25 Thread Roman Naumenko
Windows 2008 doesn't like it.
XP formats the volume easily, and it can then be mounted on 2008 over iSCSI.

The only difference is that it's a 2-node cluster on 2008.

--
Roman Naumenko


Re: [storage-discuss] Comstar: MS can't format iscsi drive.

2009-08-25 Thread Roman Naumenko
Thanks, Nigel.

Disabling jumbo frames and TSO on the Windows server allowed the formatting to complete. The Sun box had an unchanged config:

e1000g1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 9000 index 3
        inet 10.10.110.101 netmask ff00 broadcast 10.10.110.255
        ether 0:15:17:89:97:1

I'm attaching traces from both sides of the iSCSI session: logging on to the target, onlining the volume in Windows, and doing a quick format.

With jumbo frames on, I saw warnings while tracing on the Sun box:

snoop -r -d e1000g1 -o jambo_on_tso_off.cap 

Using device e1000g1 (promiscuous mode)
42817 (warning) packet length greater than MTU in buffer offset 0: length=26960
42819 (warning) packet length greater than MTU in buffer offset 27048: 
length=18000
42828 (warning) packet length greater than MTU in buffer offset 16728: 
length=27680
42830 (warning) packet length greater than MTU in buffer offset 44496: 
length=13728

File to download:
http://www.speedyshare.com/538479835.html

Careful, unpacked traces are big
--
Roman


Re: [storage-discuss] Restricting initiator resource consumption

2009-08-21 Thread Roman Naumenko
 Tristan Ball wrote:
  I believe Zpool iostat will include cached IO's, and write IO's which will be coalesced into a single physical IO to your disk.

  The plain iostat command is a good place to start to see what's actually going to disk. iostat -dxzcn 1 is what comes out my fingers automatically, although I vary the interval, and often add CM the options.

 I believe that 'zpool iostat' also shows only what is going to physical disk, just like regular iostat.

 If you want to judge the effect of caching, compare the stats from 'fsstat zfs' (or 'fsstat /mountpoint' to isolate a particular dataset) against the zpool iostat numbers. When I watch fsstat during heavy write activity, such as an incoming backup, I see a steady stream of writes with fsstat, but only periodic bursts of writes with zpool iostat or plain iostat.

Thanks, I didn't know how to look into fs access stat.
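
A concrete form of that comparison might look like this (a sketch; run the two commands in separate terminals and substitute the real pool name for the placeholder stor):

# Logical, filesystem-level activity for all ZFS datasets, every second
fsstat zfs 1

# Physical activity actually reaching the pool, for comparison
zpool iostat stor 1
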
--
Roman


Re: [storage-discuss] Restricting initiator resource consumption

2009-08-20 Thread Roman Naumenko
 If your pool has few vdevs (I believe you had a single raid-z), then this will be slow, as you will probably get only about 100 IOP/s from your pool. My exchange server happily produces 1500 IOP/s in daily use, and would go higher but our current array won't go any faster for un-cached IO. :-)

How do you check IOPS?
I saw writes > 1000 in zpool iostat when I tested the pool with sqlio over iSCSI.

But it was async, of course.
I didn't find a way to generate synchronous synthetic writes on Windows, so I'm waiting for Exchange to see what kind of writes it makes.

 Even with a dedicated SAN and 15K/rpm drives, MS generally recommend Raid-10 configurations for exchange. Raid-5/6, or RaidZ1/2 usually doesn't give the IOP/s rates you need - although many people do anyway.

 That's why I'd like to use SSD - to improve iops to a desirable level.

 Only if you get a reasonable cache hit rate. I suspect that a combination of exchange and sql server is likely to result in a fairly poor hit rate - certainly not enough to alleviate the performance bottleneck of a single vdev.

I meant a slog - to improve IOPS for writes. The read cache is not a big deal right now.

--
Roman Naumenko
ro...@frontline.ca



Re: [storage-discuss] Restricting initiator resource consumption

2009-08-19 Thread Roman Naumenko
 ZFS based mirroring is only slightly less reliable than Raid-z2, and it gives much better IOP/s.

How can it give much better IOPS, taking into account the slow, large SATA drives?

 Neither exchange or SQL Server tend to be throughput bound, but both require very high IOP/s rates. I don't think you'll see enough traffic across the controller for it to be the bottleneck!

Correct, there is not much traffic, but we do see decreased IOPS on a particular LUN when it is affected by activity on another volume (a backup running, defragmentation, online maintenance or something else - I'm not a specialist here).

 Even with a dedicated SAN and 15K/rpm drives, MS generally recommend Raid-10 configurations for exchange. Raid-5/6, or RaidZ1/2 usually doesn't give the IOP/s rates you need - although many people do anyway.

That's why I'd like to use SSD - to improve IOPS to a desirable level.

 How many users are there on your exchange server? I have a suspicion that even if you move to zfs mirroring, you still might not get enough performance for exchange with that number of drives - especially if you're putting other load on the system.

Yes, there is other load and this is a problem as well (archiving, in our case). That's why they want to separate the Exchange DBs from each other.

--
Roman


Re: [storage-discuss] Restricting initiator resource consumption

2009-08-18 Thread Roman Naumenko
Using mirrors just makes ZFS pointless for us. The whole idea is reliable raid6-style storage with snapshot features.
The question is how to prevent one volume from saturating the others.

By the way, how is this designed in large disk arrays, where hundreds of LUNs are accessed simultaneously?

--
Roman


Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely

2009-08-18 Thread Roman Naumenko
It seems the issue was indeed with TSO and a bad Windows driver.
They updated the driver and disabled TSO - no complaints any more.

Thank you for helping!

--
Roman


Re: [storage-discuss] Slog: how to make it work?

2009-08-18 Thread Roman Naumenko
This is a COMSTAR + itadm configuration, so iscsitgt is not used. There is no special driver for the storage; it's a JBOD on an Adaptec RAID controller.

And the bottom line of the problem is: NFS turns every write into a synchronous one (copying files, bonnie++, any other write activity) - I see it right away in iostat for the SSD device.

The Windows initiator doesn't produce any such writes, neither in the synthetic tests I've tried nor with your sqlio run.

We'll put an Exchange DB on that volume and see the difference, but to me it's quite a strange situation.

--
Roman


Re: [storage-discuss] Restricting initiator resource consumption

2009-08-18 Thread Roman Naumenko
 Using mirrors just makes zfs useless. The whole idea is a reliable raid6 storage with snapshots features.
 The question is how prevent saturation for one volume.

 What is your current zpool format (raidz, raidz2, etc)? Using a mirror does not make zfs useless - you can still use all of the built-in features of the software. Mirroring your drives just makes it a raid1 instead of raid6.

A raidz2 array of 8 disks, 2x quad-core CPUs, 16 GB of RAM.

Yes, it's possible to configure it as 4x2 mirrors, with almost half the capacity and with reliability also degraded. But since the mirrors would be on the same controller, the performance of one pool might still depend on access to another.

--
Roman


Re: [storage-discuss] Restricting initiator resource consumption

2009-08-18 Thread Roman Naumenko
 On Aug 18, 2009, at 10:47 AM, Roman Naumenko no-re...@opensolaris.org wrote:

  Using mirrors just makes zfs useless. The whole idea is a reliable raid6 storage with snapshots features.
  The question is how prevent saturation for one volume.

  What is your current zpool format (raidz, raidz2, etc)? Using a mirror does not make zfs useless - you can still use all of the built-in features of the software. Mirroring your drives just makes it a raid1 instead of raid6.

  raidz2 array of 8 disks, 2Xquade core, 16G

  Yes, it's possible to configure it as 4x2 mirrors with capacity almost 2 time less, with reliability also degraded. But since they on the same controller, performance of one pool might be dependent on access to another.

 Roman,

 That config will not handle exchange db well as it will have the max IOPS of a single disk because raidz/raidz2 has to write the whole stripe width in each write.

 I would seriously re-think your configuration or go with a hardware RAID solution.

I would like to have a major SAN in place for Exchange: just put it there and forget about it, instead of dancing with ZFS. Unfortunately, that's not within the current budget.

But since ZFS has more features and better reliability than a typical vendor's NAS/SAN appliance, and it's free, we are going with it.

Well, performance, yes. Personally I'm just waiting for the slog bug to be fixed. IBM already did a great job bringing a $250 SSD to the market.
 
--
Roman


Re: [storage-discuss] Getting to the bottom of poor ZFS read performance

2009-08-17 Thread Roman Naumenko
I'd like to confirm that IRQ overlap might be the issue; at least that was the case with the FreeBSD kernel. We once had bad performance on a FreeBSD-based firewall, and it turned out that the embedded Broadcom NICs and the NICs on the PCI bus shared the same IRQ. After reassigning them, the processor load dropped almost by half, with much better throughput.

--
Roman


[storage-discuss] Slog: how to make it work?

2009-08-17 Thread Roman Naumenko
I have a strange feeling that the attached slog device isn't doing anything for the zpool:
zsan0store   203G  10.7T 47 23  5.85M  2.49M
  raidz2 203G  10.7T 47 22  5.85M  2.44M
c7t0d0  -  - 25  5   830K   418K
c7t1d0  -  - 25  5   830K   418K
c7t2d0  -  - 25  5   829K   418K
c7t3d0  -  - 25  5   829K   418K
c7t4d0  -  - 25  5   830K   418K
c7t5d0  -  - 25  5   830K   418K
c7t6d0  -  - 25  5   830K   418K
c7t7d0  -  - 25  5   830K   418K

...  c9d0   128K  29.7G  0  1  2   229K
...  c9d0   128K  29.7G  0  0  0  79.1K
...  c9d0   128K  29.7G  0  0  0  0

format -e
. c9d0 FiD 2.5-90429AAB-0001-29.84GB
  /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

As back-end storage I use a raw volume (zvol) created on a raidz2 pool and exported with COMSTAR iSCSI. The initiator is the standard Windows iSCSI initiator. Speed and latency are tolerable on 118b, especially with compression=on and a 128k block size on the zvol.

The question is about the ZFS slog device and how to check whether it can improve access.

Does ZFS log raw volume data on the slog device, or only filesystem data?
What kind of tests can I run to see that it works?

--
Roman


Re: [storage-discuss] Slog: how to make it work?

2009-08-17 Thread Roman Naumenko

Ross Walker wrote:


On Aug 17, 2009, at 12:24 PM, Roman Naumenko no-
re...@opensolaris.org wrote:

 The question is about zfs slog device and how to check if it can 
 improve access.


 Does zfs cache raw volume data on slog device or only on a filesystem?
 What kind of tests I can run to see it works?

The slog will cache all synchronous writes on the zpool whether they 
be zfs or zvol writes.


While running a synchronous write look at the output of 'iostat -x 1' 
locate the 'sd' representing the slog device and look at it's io.


-Ross
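
(For reference, a concrete form of that check, with c9d0 standing in for the slog device from the zpool iostat output earlier - a sketch:)

# Non-zero writes on the slog device mean synchronous writes are hitting the ZIL
iostat -xn 1 | egrep 'device|c9d0'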

I believe Windows doesn't do sync writes, since zpool iostat shows only 30-second bursts and iostat for the slog's sd device is zero all the time.

Any thoughts on how to make it do sync writes?

Will a slog be helpful for a typical Windows application such as Exchange?

--
Roman


[storage-discuss] Restricting initiator resource consumption

2009-08-17 Thread Roman Naumenko
I have complaints from potential ZFS storage users who blame Sun for not being able to manage its resources (LUNs). Let me explain what they mean by citing three examples:

"I started formatting the second drive and it killed all performance."
"Exchange is doing {online defragmentation, segmentation, ...} - the other volumes suffer."
"SQL Server is doing something on the DB volume - the other volumes become slow."

What would you advise me? Is there a point to such a requirement?
Is Fishworks, for example, capable of providing such management?

From my point of view, it can only be achieved at the network level for the iSCSI protocol.

--
Roman


Re: [storage-discuss] Slog: how to make it work?

2009-08-17 Thread Roman Naumenko
OK, thanks, I've seen the original before...

The problem now is that Windows doesn't create any sync activity at all. NFS, on the other hand, creates a huge amount - and NFS performance is just so poor, 15 MB/s, even with the SSD drive.

--
Roman


Re: [storage-discuss] itadm shows active session, win initiator doesn't find new disk

2009-08-16 Thread Roman Naumenko
Resolved, thanks.
The view wasn't correctly defined.


Re: [storage-discuss] itadm shows active session, win initiator doesn't find new disk

2009-08-14 Thread Roman Naumenko
Hi Jim,

Sure, I configured the LU and added a view as described in the link.

stmfadm list-lu -v
LU Name: 600144F0CC12CC004A8600DA0001
Operational Status: Online
Provider Name : sbd
Alias : lu-vol1
View Entry Count  : 1
Data File : /dev/zvol/rdsk/zsan0store/zsan0vol0
Meta File : not set
Size  : 1099511627776
Block Size: 512
Vendor ID : SUN 
Product ID: COMSTAR 
Serial Num: not set
Write Protect : Disabled
Writeback Cache   : Enabled

What is the "Block Size: 512"? 
It is definitely not the ZFS block size, which I set to 128k when I created the volume.
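
(My current understanding, to be confirmed: this 512 is the logical block size
COMSTAR reports to the initiator over SCSI, not the zvol's volblocksize, and it
can only be chosen when the LU is created - something like the following, if this
build's stmfadm supports the blk property:

  stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/zsan0store/zsan0vol0

The 128k I set is the separate ZFS property, i.e. zfs create -V ... -b 128k.)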



#stmfadm list-target -v
Target: iqn.1986-03.com.sun:02:vol1
Operational Status: Online
Provider Name : iscsit
Alias : -
Sessions  : 1
Initiator: iqn.1991-05.com.microsoft:bla-bla-bla
Alias: -
Logged in since: Fri Aug 14 21:12:46 2009



~# stmfadm list-hg -v
Host Group: zsan0server-hg



# stmfadm list-tg -v
Target Group: win1-tg
Member: iqn.1991-05.com.microsoft:bla-bla-bla

Probably I should configure a view that allows all hosts first and then add restrictions.
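
Roughly what I have in mind (a sketch, reusing the LU GUID and groups from the
listings above):

  stmfadm remove-view -l 600144F0CC12CC004A8600DA0001 -a
  stmfadm add-view 600144F0CC12CC004A8600DA0001
      (wide open: all hosts, all target ports - just to confirm the disk shows up)
  stmfadm add-view -h zsan0server-hg -t win1-tg -n 0 600144F0CC12CC004A8600DA0001
      (then back to the restricted view)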

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely

2009-08-13 Thread Roman Naumenko
Thanks, guys, for looking into it. You did excellent diagnostics without looking 
at the actual server (just like Dr. House does with his patients :)

It was indeed the TSO option which caused the 64k packets in the traces; after 
disabling it, the large packets are not there any more. Funny, they don't come 
back if TSO is enabled again - probably the Windows server wants a restart.

We are now looking into the Windows NIC drivers - whether they are up to date and 
whether there are any known bugs related to them.

I'll keep posting results.

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] itadm shows active session, win initiator doesn't find new disk

2009-08-13 Thread Roman Naumenko
# itadm list-target
TARGET NAME  STATESESSIONS 
iqn.1986-03.com.sun:02:vol1  online   1  

The Windows initiator has this target connected, no issues. But the new disk 
doesn't appear in the management console.

What should I check?
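
(For the record - as the follow-up above shows, it was the view that was missing.
The quick checks are roughly:

  stmfadm list-view -l <LU GUID from stmfadm list-lu>
  stmfadm list-target -v

and then a disk rescan in the Windows Disk Management console.)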

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely

2009-08-12 Thread Roman Naumenko
Thanks for looking into this.

You are right to notice the unusual jumbo frames on the Windows NIC. 
Why they are generated with the setting disabling jumbo frames - nobody, I believe, will explain.
Freaking M$...
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Windows over iscsi accesses snv_101b: freezes completely

2009-08-12 Thread Roman Naumenko
I'll post the capture file from Solaris later. There were no jumbo frames in it.

And I checked another Windows server - it generates a huge amount of 50-60k 
packets (the targets are on 111b).

Maybe this is a feature? Can somebody take a look at their own Windows box and 
check for jumbo frames (while they are not enabled on the NIC)?

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] svn118: lib/svc/method/dcoeconfig not found

2009-07-24 Thread Roman Naumenko

Thank you, Jonathan.

--
Roman

Jonathan Edwards wrote, On 09-07-23 04:54 PM:


http://opensolaris.org/jive/thread.jspa?threadID=107881

On Jul 23, 2009, at 4:40 PM, Roman Naumenko wrote:

 I upgraded the box to 118 and now it's messed up.
 The system goes into maintenance due to:
 system/devices/fc-fabric:default is in maintenance

 In logs:
 lib/svc/method/fcoeconfig not found

 46 dependent services are not running.

 How did this fcoe manage to mess everything up?
 I didn't even plan to use FC at this moment.

 --
 Roman
 --
 This message posted from opensolaris.org
 ___
 storage-discuss mailing list
 storage-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss

___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] svn118: lib/svc/method/dcoeconfig not found

2009-07-23 Thread Roman Naumenko
I upgraded the box to 118 and now it's messed up.
The system goes into maintenance due to:
system/devices/fc-fabric:default is in maintenance

In logs:
lib/svc/method/fcoeconfig not found

46 dependent services are not running. 

How did this fcoe manage to mess everything up? 
I didn't even plan to use FC at this moment.
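
To see exactly what the service is trying to run, this is roughly what I'm
checking (the FMRI is the one from the maintenance message above):

  svcprop -p start/exec system/devices/fc-fabric:default
  ls -l /lib/svc/method/fcoeconfig
  svcs -d system/devices/fc-fabric:default     (the services it depends on)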

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] R: Re: Limiting iSCSI logon for targets

2009-07-08 Thread Roman Naumenko




Thanks, but we've already used Openfiler.
This time I'd like to build the storage without involving Linux :)

Roman

Anzuoni Guido wrote, On 09-07-08 04:25 AM:

  AFAIK, the latest release of NexentaStor is built on top of COMSTAR.

  Guido

  From: Roman Naumenko [mailto:ro...@frontline.ca]
  Sent: Tuesday, July 7, 2009, 19:06
  To: Jim Dunham
  Cc: Anzuoni Guido; storage-discuss@opensolaris.org
  Subject: Re: Re: [storage-discuss] Limiting iSCSI logon for targets

  Jim Dunham wrote:
  Guido,
  
 I am experimenting with COMSTAR as an iSCSI storage solution and I haven't
 found a way to limit the number of sessions for a specific iSCSI
 target.
 What I am trying to achieve is a configuration where only one
 initiator at a time can log on to a specific target.
 While an initiator is connected, other logons should be denied.
 Is it a missing feature?
  
iSCSI in COMSTAR is a block based protocol (not filesystem or
database). At this level of LUN access, the means to control single
use of a LUN is through Persistent Reserve, a pair of SCSI commands.
  
 5E  PERSISTENT RESERVE IN
 5F  PERSISTENT RESERVE OUT
  
Access to these SCSI commands is typically not for the end user; it is
usually implemented by a volume manager or clustering software.
There is an opensource utility called sg_persist that provides access
to this: http://opensolaris.org/jive/thread.jspa?threadID=88835
  
 Is there a way to do it that I haven't seen ?
  
Specifics regarding your iSCSI Initiator, the type of operating
system, filesystem or database type, may allow other options to be
considered.
  
For example, if the filesystem type on the iSCSI LUN is ZFS, the
command "zpool import ..." provides a warning that another node "may"
be accessing the ZFS filesystem on the iSCSI LUN. Unfortunately, if the
other node loses connectivity to the iSCSI LUN, or does not "zpool
export ..." the filesystem, the warning will still be present. Also
there is a "force" option to "zpool import ...", an option that does
not assure exclusive access.
  
- Jim
  By the way, what's the situation with web interfaces for storage appliances?
  At the top there is the Fishworks appliance, but it comes only with hardware
  as far as I know.

  There is a simple interface for ZFS - smcwebserver.
  Is there anything for COMSTAR available?

  --
  Roman Naumenko
  ro...@frontline.ca
  



___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Sol 10u7 iscsitgt write performance

2009-07-08 Thread Roman Naumenko




Ross Walker wrote:

  
  
Ok, I have just about given up on this one, and I was so hopeful after
having figured out the read performance issue (which to recap was a
mix of ESX guest network latency and a randomizing of the data file
from running random write and sequential read tests back to back).
  
Now it seems I am up against the hardware and ZIL on this one.
  
To make it short, 4k sequential writes,  throughput 16MB/s...
  
Local tests show hardware is capable of at least 132MB/s (3 raidz 
groups of 4 disks each, those are 15K SAS disks).
  
I believe the network is tuned optimally, Nagle on Windows disabled.
  
The ZIL is going to a 16GB Mtron SSD, 100us sequential access time.
  
Any more suggestions or pointers to improve things are warmly welcomed.
  
-Ross
  

Hi Ross,

Can you give me more details about your SSD? 
I'm looking for an SSD to test. Is yours expensive? Fast?

Did you check it with the dd command and iostat? 
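
The kind of quick check I mean, for the record (file and pool names are just
placeholders):

  dd if=/dev/zero of=/<pool>/ssdtest bs=128k count=8000
  iostat -xn 1      (in another terminal: watch the SSD's w/s, kw/s and %b)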

Thanks,
Roman



___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] JBOD mode on Adaptec controller: disks are not recognized

2009-07-08 Thread Roman Naumenko

Hello,

It's Opensolaris 111b on x86.

I'm trying to configure Adaptec 5085 to work in JBOD mode.
It has 8 SATA disks connected. If the disks are configured in array mode 
(each disk as its own single-disk array), Solaris recognizes them on the fly.


r...@zsan0:~# cfgadm -lav
Ap_Id  Receptacle   Occupant Condition  
Information

When Type Busy Phys_Id
c7 connectedconfigured   unknown
unavailable  scsi-bus n
/devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi
c7::dsk/c7t0d0 connectedconfigured   unknown
Adaptec RAID 5805
unavailable  disk n
/devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi::dsk/c7t0d0
c7::dsk/c7t1d0 connectedconfigured   unknown
Adaptec RAID 5805
unavailable  disk n
/devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi::dsk/c7t1d0


And so on, altogether 8 disks.

After configuring JBOD with 8 disks, Solaris can see only the controller:

sad...@zsan2:~$ cfgadm -lav
Ap_Id  Receptacle   Occupant Condition  
Information

When Type Busy Phys_Id

c7 connectedunconfigured unknown

unavailable  scsi-bus n
/devices/p...@0,0/pci8086,2...@2/pci8086,3...@0/pci8086,3...@0/pci9005,2...@0:scsi


r...@zsan2:~# cfgadm -c configure  c7
cfgadm: Hardware specific failure: failed to get state for SCSI bus: No 
such device or address


Can it be configured to work with JBOD?
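
One low-risk thing I can try first, on the assumption that the device tree is
simply stale after switching the controller mode:

  devfsadm -Cv
  cfgadm -al
  format            (to see whether the disks show up at all)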

Thanks,
Roman
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Limiting iSCSI logon for targets

2009-07-07 Thread Roman Naumenko




Jim Dunham wrote:

  
  

  Guido,
  
 I am experimenting with COMSTAR as an iSCSI storage solution and I haven't 
 found a way to limit the number of sessions for a specific iSCSI 
 target.
 What I am trying to achieve is a configuration where only one 
 initiator at a time can log on to a specific target.
 While an initiator is connected, other logons should be denied.
 Is it a missing feature?
  
iSCSI in COMSTAR is a block based protocol (not filesystem or 
database). At this level of LUN access, the means to control single 
use of a LUN is through Persistent Reserve, a pair of SCSI commands.
  
    5E  PERSISTENT RESERVE IN
    5F  PERSISTENT RESERVE OUT
  
Access to these SCSI commands is typically not for the end user; it is 
usually implemented by a volume manager or clustering software. 
There is an opensource utility called sg_persist that provides access 
to this: http://opensolaris.org/jive/thread.jspa?threadID=88835
  
 Is there a way to do it that I haven't seen ?
  
Specifics regarding your iSCSI Initiator, the type of operating 
system, filesystem or database type, may allow other options to be 
considered.
  
For example, if the filesystem type on the iSCSI LUN is ZFS, the 
command "zpool import ..." provides a warning that another node "may" 
be accessing the ZFS filesystem on the iSCSI LUN. Unfortunately, if the 
other node loses connectivity to the iSCSI LUN, or does not "zpool 
export ..." the filesystem, the warning will still be present. Also 
there is a "force" option to "zpool import ...", an option that does 
not assure exclusive access.
  
- Jim
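
(For reference, reading the current keys/reservation with that utility looks
roughly like this - the syntax is how I read the sg3_utils docs, and the device
path is hypothetical:

  sg_persist --in --read-keys /dev/rdsk/c7t0d0s2
  sg_persist --in --read-reservation /dev/rdsk/c7t0d0s2 )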
  

By the way, what's the situation with web interfaces for storage
appliances?
At the top there is the Fishworks appliance, but it comes only with
hardware as far as I know.

There is a simple interface for ZFS - smcwebserver.
Is there anything for COMSTAR available?

--
Roman Naumenko
ro...@frontline.ca


___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Building home-made 7210

2009-07-03 Thread Roman Naumenko




Chris Du wrote:

  Here is the filebench I did with and without SSD, compression on and off, again on 2009.06.
Hope it helps.

BTW, you may want to connect your disks to the onboard SATA ports and see if it helps. My experience shows ZFS doesn't like HW RAID controllers with onboard cache.
  

Hi Chris,

Interesting results. Seems like compression can help as much as an SSD,
although both are helpless on small random writes.
I'm afraid that's what we need most with our Exchange servers, which are
basically an M$ database design.

Do you have the specification of the hardware you used for the tests?

May I ask what kind of storage you eventually use for your
production (if anything with ZFS)?

-- 
Best regards,
Roman Naumenko
Network Administrator

ro...@frontline.ca




___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Building home-made 7210

2009-07-03 Thread Roman Naumenko

Chris Du wrote:

Compression helps when you don't export volumes through NFS. SSD really helps 
increase write speed and reduce latency for NFS.

This filebench was run on the storage server itself. I haven't had a chance to run 
it inside a client. Inside a client, I think the result will be more positive for 
small random IO; if you check the iSCSI LUN properties, it shows writeback enabled 
by default, so the storage server's memory is used as a writeback cache.

I'm still testing the environment. The server is a dual Opteron 254, 8G memory, 
dual Broadcom 5704 NICs, QLogic 2460. Disks are 3 x Maxtor Atlas 10K5, but later 
I'll make 2 x 147G for an OS mirror, 6 x Atlas 10K5 in RAID10, plus a Supermicro 
storage shelf with 24 SAS disks and 2 x Intel X25-E SSDs. This environment is for my 
VMware ESX test lab.

Actually, I see very good performance inside the VMware guest for small random I/O.
  
With or without compression, there is something strange with NFS on 
111b; I give up.
I will try to test it on 117, but the performance was so bad, with no 
obvious reason (like James described above), that now it's hard to even 
consider ZFS+NFS for production.


But iSCSI or FC can still work out for me.

I'd like to ask you a couple more questions: what is the chassis model you are 
going to get? And how will the storage be connected to your VMware server? 
Do you use any tuning like jumbo frames or the ACK registry key correction on 
Windows?
I also see you have a QLogic FC card - how does it perform? Did you have a 
chance to test COMSTAR with it?


--
Regards,
Roman
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?

2009-07-02 Thread Roman Naumenko
Thanks, it's resolved. 
Clearing the state didn't help, but a reboot did.

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] Building home-made 7210

2009-07-02 Thread Roman Naumenko
Hi,

I just wonder if there is something special that makes the 7210 faster than a 
server you can assemble yourself from common components. I have the ultimate 
task of making a fast ZFS box :)

And what I'm thinking is that there is only one thing you can't get from the 
nearest server hardware supplier: SSDs.

I see that STEC ZeusIOPS SSDs are used in the 7210. The other stuff is pretty 
common: 7200 RPM SATA disks, and the controller probably can't increase speed 
much since it's standard SATA at 300MB/s.

The processor and memory are usual; I have, for example, 2GHz E5405 quad-core 
processors, 16GB RAM and 8 x 1.5TB disks on an Adaptec 5805. The version is 111b.
Well, the performance of such a system is so-so: 30MB/s NFS, 50MB/s iSCSI. I'm 
looking for a way to improve it radically. I have in mind FC, SAS drives and 
SSD, and I'm going to try all of them.

I'd like to start with SSDs, but the STEC web site doesn't show prices. Probably 
they are too high. Are there any other SSD suppliers to try? IBM has $500 18GB 
drives (www-03.ibm.com/systems/storage/disk/ssd/index.html),
but they are probably not optimized for writing.
IBM says they can do a write rate of 47MB/s and random reads (4K blocks) of 
5000 IOPS.

I also see that Sun itself sells a 32 GB SATA SSD for $1200.
I see the following info in the catalog:
150MB/s writes, 5000-7000 random write IOPS and 300 μs max command latency. 
Is this one going to improve ZFS performance? 
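
If I do get one, the plan is just the standard ZFS commands, trying it both ways
(pool and device names are hypothetical):

  zpool add <pool> log c9t0d0       (slog, to absorb synchronous writes)
  zpool add <pool> cache c9t1d0     (L2ARC, to help reads)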

Thanks,
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?

2009-07-02 Thread Roman Naumenko
Hi Brent,

Thanks for this information, I'll use it. 

Actually, what do the developers think about its documentation? There is basically 
no documentation for COMSTAR available (I don't count the messy wiki 
pages; the man pages are helpful, but they are not real documentation either).
Are there plans to create a decent one?

--
Roman

ro...@frontline.ca
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] Building home-made 7210

2009-07-02 Thread Roman Naumenko
Great, I'll look into your table.

Regarding the RAID controller: I disabled the cache when creating the arrays (for 
some reason JBOD mode is not available for OpenSolaris on the Adaptec 5085).

Actually, it writes very fast, so I can't blame the controller:

r...@zsan0:/# dd if=/dev/zero of=/zsan0store/bonnie/test.txt bs=64k 
count=30 

r...@zsan0:/# zpool iostat 30
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
zsan0store  718G  10.2T     36  1.54K    166K   176M

And iostat shows up to 280MB/s on the controller during dd.
                 extended device statistics
device       r/s    w/s   Mr/s   Mw/s  wait  actv  svc_t  %w  %b
c7          25.0 2695.4    1.6  278.3   0.0 217.6   80.0   0 766

-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] 111b: NFS: zfs writes synchronously only

2009-06-30 Thread Roman Naumenko
Hello,

I'm trying to configure NFS on ZFS. No special tuning, just set sharenfs=on and 
a simple mount on the client:
172.0.7.100:/zsan3store/mailarch2  /opt/mailarch2   nfs   async,noatime

dd from the local machine gives ~40MB/s. Not impressive at all. 
Tweaking the NFS client options (like setting block sizes) only decreases 
the speed. 

And writing small files (a mail archive) makes ZFS write to the storage 
constantly, no bursts at all. Is that how it should be?
 
The NFS client is Ubuntu 8.04.

Regards,
Roman Naumenko

ro...@frontline.ca

-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] 111b: NFS: zfs writes synchronously only

2009-06-30 Thread Roman Naumenko
Thanks, that's a great article. However, I don't have any SSD to check if that's 
the case. 

What I see is that there are no synchronous requests: 

dtrace -n 'nfsv3:::op-write-start { @[args[2]->stable] = count(); }'
dtrace: description 'nfsv3:::op-write-start ' matched 1 probe
^C
        0       64

I just wonder why ZFS still writes constantly?

Regards,
Roman Naumenko

ro...@frontline.ca
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] stmf doesn't start: Unable to open device. Is the driver attached?

2009-06-30 Thread Roman Naumenko
How does the service start? Where is the svc-stmf config that is requested?

r...@zsan3:~# sbdadm
Unable to open device. Is the driver attached ?

r...@zsan3:~# svcadm enable stmf

r...@zsan3:~# stmfadm list-state
stmfadm: unknown error

r...@zsan3:~# svc -xv
svcadm   svccfg   svcprop  svcs 

r...@zsan3:~# svcs -xv
svc:/system/stmf:default (STMF)
 State: maintenance since July  1, 2009  1:11:35 PM EDT
Reason: Start method failed repeatedly, last exited with status 1.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 7D stmf
   See: man -M /usr/share /man -s 1M stmfadm
   See: /var/svc/log/system-stmf:default.log
Impact: This service is not running.

r...@zsan3:~# tail -f /var/svc/log/system-stmf:default.log
[ Jul  1 13:11:35 Enabled. ]
[ Jul  1 13:11:35 Executing start method (/lib/svc/method/svc-stmf start). ]
svc-stmf: unable to load config
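
The first thing I tried, for the record (it was not enough here - see the
follow-up above, where only a reboot helped):

  svcadm clear stmf
  svcadm enable stmf
  svcs -l stmf      (check the state and the log file it points to)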

Thanks,
Roman Naumenko

ro...@frontline.ca
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] comstar status

2009-06-24 Thread Roman Naumenko
Hello Dan,

Can you point me to where I can download the ISO image for the development version?
I have b11x.iso on my HDD but can't find the source for it.

Thanks,

Roman Naumenko
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] comstar status

2009-06-22 Thread Roman Naumenko




Hi Dan,

Thanks for the information. 

Can you advise which version of OpenSolaris is better for production
use? Is it only 2009.06?
Are there any features and fixes that are not available in 2009.06 but are
present in the newest builds? Especially regarding iSCSI.

--
Roman Naumenko
Network Administrator

ro...@frontline.ca

Dan Maslowski wrote:

  
  

  Roman,
  
You should use COMSTAR iSCSI. Development has stopped on iscsitgtd;
COMSTAR is the future. You can trust my answer, I own both
varieties...
  
Regards,
Dan
Dan Maslowski
Sr. Engineering Manager
COMSTAR Storage Software
  
Roman Naumenko wrote:
 I have a question about COMSTAR and iscsi service.

 I'm not sure what to use for iSCSI: COMSTAR or the iscsi package. There have
 been quite a lot of COMSTAR references for iSCSI functionality.

 Is iSCSI functionality being transferred to COMSTAR, and will the original
 packages be abandoned?
 Should I switch to COMSTAR for providing iSCSI targets?

 Thank you,

 Roman Naumenko
 Network Administrator

 ro...@frontline.ca

  
  



___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] comstar status

2009-06-18 Thread Roman Naumenko




I have a question about COMSTAR and iscsi service.

I'm not sure what to use for iSCSI: COMSTAR or the iscsi package. There have
been quite a lot of COMSTAR references for iSCSI functionality.

Is iSCSI functionality being transferred to COMSTAR, and will the original
packages be abandoned? 
Should I switch to COMSTAR for providing iSCSI targets?

Thank you,

Roman Naumenko
Network Administrator

ro...@frontline.ca




___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] async_throttle_delay in progress: what exactly it shows?

2009-06-11 Thread Roman Naumenko

 Or is this a total delay for the whole period of replication time?
 This is the total number of times SNDR had to delay replicating a
 chunk of data, since a given replica was last enabled or resumed by
 SNDR. An increment occurs during asynchronous replication with both
 memory or disk queues, at the time when the total number of items, or
 the total size of all items, exceeds what was previously configured as
 the memory or disk queue size.

 For memory queues this is:
 sndradm [opts] -F [set] set maximum fbas to queue
 sndradm [opts] -W [set] set maximum writes to queue

 For disk queues this is:
 The summation of the number of items, and the number of blocks, based
 on the physical size of the associated disk queue.

 It is possible to keep this number low by increasing the memory
 queue, the disk queue, or the number of asynchronous flusher threads
 (sndradm -A ...), with higher network bandwidth, lower network latency, a
 faster SNDR secondary node, or some combination of these.

 - Jim

Thanks for explanation, Jim.
Now I see what it means.
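
For the archive, bumping the queues for a whole I/O group would look roughly like
this (the values are made up; -F/-W/-A are the options Jim describes above):

  sndradm -g <group> -F 32768      (maximum blocks in the memory queue)
  sndradm -g <group> -W 8192       (maximum items in the memory queue)
  sndradm -g <group> -A 4          (more async flusher threads)
  kstat sndr:1:setinfo | egrep 'async_throttle_delay|hwm'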

--
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] FeedBack on SSD zfs raid perf

2009-06-11 Thread Roman Naumenko
Nobody wants to test them since they don't live for a long time :)
I've asked around and people say that SSDs are fast but die fast as well under 
load. 

--
Roman Naumenko
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] async_throttle_delay in progress: what exactly it shows?

2009-06-10 Thread Roman Naumenko
 Can somebody look into this magic number and explain why 
 async_throttle_delay slowly grows over time
 and whether it might be related to the delays?
 The number is not all that magic. Have you looked at the following manual, 
 and specifically at the keyword async_throttle_delay?

 http://docs.sun.com/source/819-6148-10/chap5.html, search for 
 async_throttle_delay

I meant it's interesting why it's growing slowly over time. 
I thought this was an average value for a given period of observation.
 
Or is this the total delay for the whole period of replication?

--
Regards,
Roman Naumenko
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] async_throttle_delay in progress: what exactly it shows?

2009-06-09 Thread Roman Naumenko




Hello,

We've set up local replication between two zsan storages over a direct
1 Gb connection:

/dev/rdsk/c3t0d0s0  -  zsanback.local:/dev/rdsk/c3t0d0s0
{6 more}
/dev/rdsk/c3t7d0s0  -  zsanback.local:/dev/rdsk/c3t7d0s0

There is a ZFS filesystem being accessed over iSCSI. Recently users
started to complain about delays in accessing files.

Can somebody look into this magic number and explain why
async_throttle_delay slowly grows over time, 
and whether it might be related to the delays?

#kstat sndr:1:setinfo 15 | grep async_throttle_delay
    async_throttle_delay    17232313
    async_throttle_delay    17235445
    async_throttle_delay    17235445
    async_throttle_delay    17240204
    async_throttle_delay    17240204
    async_throttle_delay    17245600
    async_throttle_delay    17245600
    async_throttle_delay    17251474
    async_throttle_delay    17251474
    async_throttle_delay    17257441

# kstat sndr:1:setinfo
module: sndr    instance: 1    
name:   setinfo class:    storedge
    async_block_hwm 21069
    async_item_hwm  2439
    async_queue_blocks  17982
    async_queue_items   215
    async_queue_type    memory
    async_throttle_delay    17271404
    autosync            0
    bitmap              /dev/md/rdsk/bmp1
    bitsset                 332
    bmpflags                0
    bmp_size            5713920
    crtime                  1301557.77839719
    disk_status                 0
    flags               6150
    if_down         0
    if_rpc_version          7
    maxqfbas            16384
    maxqitems           4096
    primary_host            mainzsan.local
    primary_vol         /dev/rdsk/c3t1d0s0
    secondary_host   zsanback.local
    secondary_vol    /dev/rdsk/c3t1d0s0
    snaptime             2247328.51220685
    syncflags        0
    syncpos          2925489887
    type_flag        5
    volsize          2925489887

About these values:
maxqfbas  16384
maxqitems    4096

If I set them to higher values, then I will see an increase in
async_block_hwm and async_item_hwm respectively. 
Does it make sense to change them with a 1G local connection?

Typical load is about 5MB/s reading/writing, and sometimes it goes up to
40MB/s. 
Right now I can't see a relationship between zpool I/O load spikes and the
access delays. 

-- 
Best regards,
Roman Naumenko
Network Administrator

ro...@frontline.ca


___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] motherboard for storage server

2009-06-02 Thread Roman Naumenko




I'm looking for a motherboard: there is a good one,
the Intel S5000VSA. We've been using it, and except for updating the firmware
for quad-core CPUs there were no issues. It supports 2 CPUs and 16GB of memory.
Does it make sense to go with motherboards that support 32GB of
memory? Can it improve ZFS performance to a significant degree?

-- 
Best regards,
Roman Naumenko

ro...@frontline.ca




___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] motherboard for storage server

2009-06-02 Thread Roman Naumenko




Eric D. Mudama wrote, On 06/02/2009 02:22 PM:

  
  
  

  On Tue, Jun 2 at 12:46, Bob Friesenhahn wrote:
 On Tue, 2 Jun 2009, Roman Naumenko wrote:

 Does it make sense to go with motherboards that support 32G of memory?
 Can it improve zfs performance to a significant degree?

 The improvement depends on how heavily the server is accessed and the
 total amount of data which is accessed frequently. With sufficient RAM,
 the disks will only be accessed for writes, and writes will also be faster
 if the data being updated is cached. If you need performance and can
 afford it, then go for the 32GB of RAM.
  
The RAM will be the best performer, but for the cost and power, it
might be a better choice to stick with 8-16GB RAM and add an SSD or
two as a cache device.
  
Each 4GB FB-DIMM for that motherboard will run you about $100, so a
32GB cache SSD has about the same cost up front as 16GB of RAM. Each
FB-DIMM burns about 10W of power, compared to 2.5W for the entire SSD
when active, and virtually zero when not being accessed.
  

Well, SSD is an interesting thing. 

Of course plus or minus 10W, or even hundreds of watts, doesn't make any
difference for us; we are not as environmentally friendly here in Canada as
they try to persuade everybody :) But that's a different story.
Regarding SSD: you pointed out something interesting. And it's particularly
interesting since AVS bitmaps are candidates to be placed on SSD, as Jim
suggested recently.

Well, if I want to try an SSD, where should I look? Particular
manufacturers? Special motherboards? Can you recommend something?
Unfortunately I've never had any experience with them. And making fast storage
appliances is essential (since reliability is already excellent on ZFS).

  A single cache SSD should be able to easily saturate gigabit ethernet
  even in random workloads if that is your performance bottleneck.


Well, that's a very good speed, but I'm afraid with ZFS+iSCSI it won't be
so fast.


-- 
Best regards,
Roman Naumenko
Network Administrator

ro...@frontline.ca
25 Adelaide Street East | Suite 600 | Toronto, Ontario
| M5C 3A1
Helpdesk: (416) 637-3132
www.frontline.ca




___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] sndradm - add volume (zfs spare) to existent group

2009-05-31 Thread Roman Naumenko
Ok, thanks. That's what I was thinking about. I just wasn't sure whether it's OK to 
add volumes into existing groups from the AVS point of view. 

Only one question about commands you suggested. Is this about adding a disk 
queue?
sndradm -g -q a :

Is this somehow compulsory? 
I mean I don't use disk queues for this particular installation.

And let me ask you, Jim, about the famous post "AVS and ZFS, seamless". It was 
yours, right? Even if not, would you mind commenting on it?

The first question is about the issue with RDC timeouts and disk queues that I 
asked about some time ago:
http://www.mail-archive.com/storage-discuss@opensolaris.org/msg05497.html
I wonder whether the configuration described in the blog was successful. What did 
the link look like? Was it something like a dedicated circuit with the usual 
latency? How did it work out eventually?

And the second question is about the slices for the bitmap. What was the reason 
for not putting it on a dedicated pair of disks as the documentation suggests? 
(AVS administration guide: "raw devices must be stored on a disk separate from 
the disk that contains the data from the replicated volumes. The bitmap must 
not be stored on the same disk as replicated volumes.")

Roman Naumenko
Frontline Technologies
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated

2009-04-24 Thread Roman Naumenko

Thanks for your help!

Actually I have some more questions. I need to make a decision on the 
replication mode for our storages: zfs send/receive, AVS, or even a 
Microsoft internal tool on the iSCSI volumes with independent ZFS 
snapshots on both sides.
Initially AVS seemed to me a good option, but I can't make it work 
on a 100Mb link with 8x1.36TB volumes.



Roman,


 A weird issue:
 1. avs works for connections on a local switch via local freebsd 
 router connected to the switch

  host1 - switch - router freebsd - switch - host2
 2.  When trying to emulate replication using far distance remote 
 connection with the freebsd router on the remote side then AVS fails 
 with the error:

 [b]sndradm: warning: SNDR: Recovery bitmaps not allocated
 [/b]

First of all, what version of AVS / OpenSolaris are you running? The 
reason I ask, is that this error message being returned from 
sndradm, was a problem partially resolved for AVS on Solaris 10, or 
AVS bundled with OpenSolaris.



# sndradm -v
Remote Mirror version 11.11
# uname -a
SunOS tor.flt 5.11 snv_101b i86pc i386 i86pc Solaris


The specific issue at hand, is that during the first stages of an 
sndradm -u ..., update command, the SNDR secondary node is asked to 
send its entire bitmap to the SNDR primary node. The operation is done 
via a Solaris RPC call, an operation which has an associated timeout 
value. If the amount of time it takes to send this data over the 
network from the secondary node to primary node, exceeds the RPC 
timeout value, the operation fails with a Recovery bitmaps not 
allocated.


It's strange that SNDR sends the entire bitmap - what if it is for a big 
replicated volume like 1.36TB? That's more than 100,000 blocks for 
async replication.

There will be constant timeouts on an average 100M link in this case.


SNDR does not replicate the bitmap volume, just the bitmap itself. 
 There is one bit per 32KB of primary volume size, with 8 bits per 
byte, and 512 bytes per block. The answer for 1.36GB is just 11.04 
blocks, or 5.5KB.


But dsbitmap shows 100441 blocks for async replication - am I missing 
something?


Required bitmap volume size:
 Sync replication: 11161 blocks
 Async replication with memory queue: 11161 blocks
 Async replication with disk queue: 100441 blocks
 Async replication with disk queue and 32 bit refcount: 368281 blocks

Good. I kind of figured that this was the problem. What is your 
SNDR primary volume size?
After the initial sync started to work (although it's a very slow process 
and it takes 10-15 mins to complete) I have the following situation:


1. Storage (8x1.36Tb in one raidz2 pool):
r...@tor.flt# sndradm -i

tor2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 mtl2.flt2 
/dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 mtl2.flt2 
/dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 mtl2.flt2 
/dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 mtl2.flt2 
/dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 mtl2.flt2 
/dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 mtl2.flt2 
/dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 mtl2.flt2 
/dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 mtl2.flt2 
/dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 mtl2.flt2 
/dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 ip async g zfs-pool


2. Bitmaps are on the mirrored metadevice, they are bigger than you 
mentioned but this is what dsbitmap shows for volumes:


bmp0: Soft Partition
   Device: d100
   State: Okay
   Size: 100441 blocks (49 MB)
   Extent  Start Block  Block count
0   34   100441

3. Network:
tor2.flt2 -- freebsd router  ---mtl2.flt2

4. Latency:

r...@tor.flt:# ping -s mtl2.flt2
PING 172.0.5.10: 56 data bytes
64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms

On FreeBSD I'm emulating the actual delay and speed of the real 
circuit, which is 100Mb and 16ms.


5. The queue during writes at a speed of 40Mbit/s on the main host:

r...@tor.flt:/# kstat sndr::setinfo | grep async_block_hwm
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834
   async_block_hwm 1402834

The problems:

1.
In replication mode, data transmission through the FreeBSD router is only 2.5 Mbit/s for 
the rcp traffic, which is quite a bit lower than the numbers netio 

Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated

2009-04-24 Thread Roman Naumenko

Jim Dunham wrote, On 04/24/2009 06:46 PM:

Roman,

Thanks for your help!

Actually I have some more questions. I need to make a decision on 
replication mode for our storages: zfs send-receive, avs or even 
microsoft internal tool on the iscsi volumes with independent zfs 
snashots on both side.
Initially avs seemed to me a good options, but I can't make it  
working on 100Mb link with 8x1.36Tb volumes


It's strange that SNDR sends the entire bitmap - what if it is for a 
big replicated volume like 1.36TB? That's more than 100,000 
blocks for async replication.

There will be constant timeouts on average 100M link in this case.
SNDR does not replicate the bitmap volume, just the bitmap itself. 
 There is one bit per 32KB of primary volume size, with 8 bits per 
byte, and 512 bytes per block. The answer for 1.36GB is just 11.04 
blocks, or 5.5KB.
Of course looking at the example below, the math for replicating TBs 
versus GBs is 1024 times larger. 

1.36 TB * (1 bit / 32KB) * (1 byte / 8 bits) * (1 block / 512 bytes) ≈ 
 11161 blocks, which is the value reported below for non-disk queue 
replication with SNDR. 
 But the dsbitmap shows 100441 blocks for async replication, I'm 
missing something?
Yes you did - the words "disk queue". When replicating with a disk 
queue, there is an additional requirement to store a 1-byte or 4-byte 
reference counter per bit. These reference counters are separate from 
the actual bitmap, and are not exchanged between SNDR primary and 
secondary nodes.

Required bitmap volume size:
  Sync replication: 11161 blocks
  Async replication with memory queue: 11161 blocks
  Async replication with disk queue: 100441 blocks
  Async replication with disk queue and 32 bit refcount: 368281 blocks
 
Good. I kind of figured that this was the problem. What is your 
SNDR primary volume size?
After the initial sync started to work (although it's a very slow process 
and it takes 10-15 mins to complete) I have the following situation:
Below you made reference to using 'ping' with a result of 64 bytes 
from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms. It would be 
interesting to know the results of ping -s  mtl2.flt2 8192, where 
8192 is the chunk size for exchanging bitmaps.

The same with this size on both sides.
The reason I mention this is that 64 bytes / 16.822 ms is ~3.7 
KB/sec. With 8 * 11161 blocks * 512 bytes, it would take ~25 minutes 
to exchange bitmaps with the level of link latency and constrained 
bandwidth you are testing with.
It should be at least the same speed as the replication itself: 2.5-3 
Mbit/s.
But the latency definitely causes the problem; with low latency it 
initializes very quickly.

1. Storage (8x1.36Tb in one raidz2 pool)
r...@tor.flt# sndradm -i

tor2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 mtl2.flt2 
/dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 ip async g zfs-pool

.
tor2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 mtl2.flt2 
/dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 mtl2.flt2 
/dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 ip async g zfs-pool
When you state the  initial sync  takes 10-15 minutes to complete, 
what did you do to measure this 10-15 minutes?


Do you know that when using I/O consistency groups, one can also manage 
all the replicas in a group with a single -g group command, like:


sndradm -g zfs-pool -nu

Yes, I did replication for I/O group everywhere.
2. Bitmaps are on the mirrored metadevice, they are bigger than you 
mentioned but this is what dsbitmap shows for volumes:

Not an issue, SNDR ignores what it does not need.

bmp0: Soft Partition
Device: d100
State: Okay
Size: 100441 blocks (49 MB)
Extent  Start Block  Block count
     0           34       100441

3. Network:
tor2.flt2 -- freebsd router  ---mtl2.flt2

4. Latency:

r...@tor.flt:# ping -s mtl2.flt2
PING 172.0.5.10: 56 data bytes
64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms

I'm emulating on Freebsd the actual delay and the speed for the real 
circuit which is 100Mb and 16ms.

See comments above.

5. The queue during writes with the speed 40Mbite/s on the main host:

r...@tor.flt:/# kstat sndr::setinfo | grep async_block_hwm
async_block_hwm        1402834
.
async_block_hwm        1402834

SNDR's memory and disk queues are adjustable. The two commands are:

sndradm [opts] -F maxqfbas [set]    set maximum fbas (blocks) to queue
sndradm [opts] -W maxwrites [set]   set maximum writes (items) to queue


These commands set the high-water marks for both number of blocks and 
number of items in the memory queue. These are high-water marks, not 
hard stops, so it is possible for SNDR to exceed these values based on 
current in-progress I/Os.


maxqfbas     2500
maxqitems    16384

I see that 

Re: [storage-discuss] AVS - SNDR: Recovery bitmaps not allocated

2009-04-23 Thread Roman Naumenko
The main problem with AVS is the lack of logging information. It responds with 
short messages about something having failed, and you have no idea where to look 
or what it's related to...
Strange software, really...
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] AVS - SNDR: Recovery bitmaps not allocated

2009-04-22 Thread Roman Naumenko
A weird issue:
1. AVS works for connections on a local switch via a local FreeBSD router 
connected to the switch: 
  host1 - switch - router freebsd - switch - host2 
2. When trying to emulate replication using a far-distance remote connection 
with the FreeBSD router on the remote side, AVS fails with the error: 
sndradm: warning: SNDR: Recovery bitmaps not allocated

Full replication nevertheless works in this case, so I assume there are absolutely 
no problems with the network. 

I tried to trace the start of sndradm with truss, because there is no obvious 
reason to me why it fails. Network OK, name resolution is OK, RPC responds.

What I can see is only the following difference in the traces for the sndradm -nE 
command when replicating locally vs. remotely:

getpid()= 3245 [3244]  | getpid()   
= 3315 [3314]
fcntl(5, F_SETLKW, 0x08046608)= 0  | fcntl(5, F_SETLKW, 
0x08046608)  (sleeping...)
lseek(5, 0, SEEK_SET)   = 0   | Stopped by 
signal #24, SIGTSTP, in fcntl()
read(5,  I G A M, 4)= 4   |   Received signal 
#25, SIGCONT, in fcntl() [default]
lseek(5, 0, SEEK_SET)   = 0 |   siginfo: 
SIGCONT pid=2426 uid=0
read(5,  I G A M\f\0\0\0 f #EF I.., 148)  = 148   | fcntl(5, F_SETLKW, 
0x08046608)  (sleeping...)
read(5,  C : s c m . t h r e a d.., 16384)= 16384  
read(5,1 2 8   6 4   - -.., 36)   = 36 
lseek(5, 2097116, SEEK_CUR)= 2113684 
read(5,1 2 8   6 4   - -.., 36)   = 36 
lseek(5, 2097116, SEEK_CUR)= 4210836 
read(5, 01\0\0\01D\0\0\001\0\0\0.., 524288)   = 524288 


So fcntl(5, F_SETLKW, 0x08046608) fails? Or is this something else?

Thanks,
Roman
-- 
This message posted from opensolaris.org
___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] subsribe

2009-04-13 Thread Roman Naumenko


--
Best regards,
Roman Naumenko
Network Administrator

ro...@frontline.ca
25 Adelaide Street East | Suite 600 | Toronto, Ontario | M5C 3A1
Helpdesk: (416) 637-3132
www.frontline.ca

___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss


[storage-discuss] storageTek on opensolaris: err 26 RPC: Couldn't make connection

2009-04-02 Thread Roman Naumenko




Hello,

I'm trying to use OpenSolaris and ZFS to create replicated storage on the
x86 platform. 

Initially I wanted to use the send/recv commands to replicate the data, but it
appears there are no decent scripts.
And there are no resources to write our own; anyway, it would be as
primitive as the others I found. 

So, the solution is StorageTek. And the problem is StorageTek too,
especially on OpenSolaris. I ran into the known issue with installing the AVS
packages in a predefined order: 
http://defect.opensolaris.org/bz/show_bug.cgi?id=5115
http://www.opensolaris.org/jive/thread.jspa?messageID=307817#307817

In the beginning the RPC connection couldn't come up:

serv2# rpcinfo -p serv1
rpcinfo: can't contact portmapper: RPC: Authentication error; why = Failed (unspecified error)

After googling a bit I changed 
config/local_only to boolean false for network/rpc/bind:default.
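
Roughly like this:

  svccfg -s svc:/network/rpc/bind setprop config/local_only = false
  svcadm refresh svc:/network/rpc/bind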

The hosts can see each other but the rdc service still doesn't work.

The error I'm getting is:
NOTICE: SNDR client: err 26 RPC: Couldn't make
connection

Apr  3 03:10:15 serv1 pseudo: [ID 129642 kern.info] pseudo-device: rdc0
Apr  3 03:10:15 serv1 genunix: [ID 936769 kern.info] rdc0 is
/pseudo/r...@0
Apr  3 04:27:36 serv1 rdc: [ID 517869 kern.info] @(#) rdc: built
20:59:41 Oct  1 2008
Apr  3 04:27:36 serv1 pseudo: [ID 129642 kern.info] pseudo-device: rdc0
Apr  3 04:27:36 serv1 genunix: [ID 936769 kern.info] rdc0 is
/pseudo/r...@0
Apr  3 04:27:51 serv1 pseudo: [ID 129642 kern.info] pseudo-device: sdbc0
Apr  3 04:27:51 serv1 genunix: [ID 936769 kern.info] sdbc0 is
/pseudo/s...@0
Apr  3 04:27:53 serv1 sv: [ID 173014 kern.info] sv: rdev 0x320440,
nblocks 2925489887
Apr  3 04:27:53 serv1 sv: [ID 173014 kern.info] sv: rdev 0x3201c1,
nblocks 200882
Apr  3 04:27:53 serv1 pseudo: [ID 129642 kern.info] pseudo-device: ii0
Apr  3 04:27:53 serv1 genunix: [ID 936769 kern.info] ii0 is /pseudo/i...@0
Apr  3 04:27:57 serv1 sv: [ID 173014 kern.info] sv: rdev 0x3201c0,
nblocks 0
Apr  3 04:28:33 serv1 rdc: [ID 643393 kern.notice] NOTICE: SNDR:
Interface 0::100 == 0: : Up
Apr  3 04:28:42 serv1 rdc: [ID 153032 kern.notice] NOTICE: SNDR client:
err 26 RPC: Couldn't make connection
Apr  3 04:28:52 serv1 last message repeated 1 time
Apr  3 04:28:53 serv1 rdc: [ID 643393 kern.notice] NOTICE: SNDR:
Interface 0::100 == 0: : Down

These are the messages I got after reinstalling every AVS package.
Actually, if I install them as explained in the bug report above, then I'm
not able to initialize the database with dscfgadm -e; it complains about
an unsatisfied dependency on the nws_scm service.

After svcadm restart svc:/system/nws_scm:default 
it starts. But everything still ends with the RPC error.

Servers can see each other:
serv2:

r...@serv2:~# rpcinfo -T tcp serv1 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting
r...@serv2:~# rpcinfo -T tcp serv2 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting

serv1:
r...@serv1:~# rpcinfo -T tcp serv2 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting
r...@serv1:~# rpcinfo -T tcp serv1 100143
program 100143 version 5 ready and waiting
program 100143 version 6 ready and waiting
program 100143 version 7 ready and waiting

What else can I check in this case? It seems like running AVS on
OpenSolaris is a pretty bad idea, so I started downloading Solaris...

Thanks, 
Roman



___
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss