[zfs-discuss] add/replace: strange zfs pool behaviour
Hi everybody; I'm experiencing something weird on one of my zpools. One of my hard drives failed (c3t3d0). The hot spare (c4t3d0) did its job, I (physically) replaced the failed drive, and rebooted. I have acknowledged the failure with fmadm too. I now have this zpool config:

$ zpool status storage
  pool: storage
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Feb 13 16:45:30 2008
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0

I see the disk with the format command:

$ sudo format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 DEFAULT cyl 60797 alt 2 hd 255 sec 63
          /[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       1. c2d0 ST350083-9QG2T9Y-0001-465.76GB
          /[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       2. c3t0d0 Seagate-ST3500830AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       3. c3t1d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       4. c3t2d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       5. c3t3d0 Seagate-ST3500630AS-R001 cyl 56524 alt 2 hd 36 sec 480
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       6. c4t2d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       7. c4t3d0 Hitachi-HDS725050KLA360-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       8. c7t0d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       9. c7t1d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
      10. c7t2d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
      11. c7t3d0 Seagate-ST3500630AS-R001-465.76GB
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci17d3,[EMAIL PROTECTED]/[EMAIL PROTECTED],0

But I can't insert it into the pool:

$ sudo zpool replace storage c4t3d0 c3t3d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c7t3d0s0 is part of active ZFS pool storage. Please see zpool(1M).

$ sudo zpool add storage c4t3d0 c3t3d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c7t3d0s0 is part of active ZFS pool storage. Please see zpool(1M).

Does anyone have a clue? Thanks.

--
Nicolas Szalay
Systems & network administrator
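One way to dig into an error like this — a sketch, not a confirmed fix — is to check the new disk for leftover ZFS labels, since zpool refuses devices that look in-use. zdb -l prints whatever labels are on a slice (device name taken from the post above):

    $ sudo zdb -l /dev/dsk/c3t3d0s0

If the labels are stale leftovers from an earlier pool, the '-f' override mentioned in the error message forces the replace; if they name the active pool and the wrong device shows up in the error (c7t3d0s0 here), the devices may be getting renumbered across reboots.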
Re: [zfs-discuss] ZFS Performance Issue
After working with Sanjeev and putting a bunch of timing statements throughout the code, it turns out that file writes are NOT the bottleneck, as had been assumed. It is actually reading the file into a byte buffer that is the culprit — specifically, this Java call:

    byteBuffer = file.getChannel().map(mapMode, 0, length);

I'm going to try to apply some of the same things I tried while troubleshooting the writes to the reads now. If anyone has any different advice, please let me know. Thanks for all the help so far.
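For reference, a minimal self-contained sketch of the mapping pattern in question (hypothetical path, not the poster's actual code). Note that map() itself returns quickly; the read cost is paid lazily when pages are first touched, which is easy to misattribute when timing:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapRead {
        public static void main(String[] args) throws Exception {
            // Hypothetical file; a single map() is limited to 2 GB in Java.
            RandomAccessFile file = new RandomAccessFile("/tank/data.bin", "r");
            int length = (int) Math.min(file.length(), Integer.MAX_VALUE);
            MappedByteBuffer buf =
                file.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, length);
            long sum = 0;
            for (int i = 0; i < length; i++) {
                sum += buf.get(i);  // first touch of each page triggers the real I/O
            }
            System.out.println(sum);
            file.close();
        }
    }

So timing the map() call alone can be misleading: the fault-in cost lands on whatever code touches the buffer first.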
Re: [zfs-discuss] ZFS versus VxFS as file system inside Netbackup 6.0 DSSU
Selim,

Symantec does support ZFS as DSSU targets. I've also seen a Sun white paper outlining the use of Thumper (Sun X4500) as an NB 6.5 media server, where the best practice was to configure multiple NB disk storage units, each on a distinct ZFS file system. In that case, all the ZFS file systems serving as DSSUs shared one zpool.

Hope this helps,
Sri

Selim Daoud wrote:
> Unfortunately, Symantec is not helping anyone in this area; they are even taking their time officially including ZFS in their compatibility lists.
>
> On Jan 16, 2008 1:26 PM, Paul Kraus wrote:
>> Previous posts from various people:
>>
>>> But ... NBU (at least version 6.0) attempts to estimate the size of the backup and make sure there is enough room on the DSSU to handle it. What happens when the free space reported by ZFS isn't really the free space?
>>
>>> Regarding the question asked below, namely "What happens when the free space reported by ZFS isn't really the free space?", is there an open bug for this?
>>
>> As others have said, not a ZFS bug, but a feature :-) Of course, this behavior can be eliminated using ZFS reservations. My comment was regarding the utility of using one ZFS pool to contain MULTIPLE NBU Disk Stage Storage Units ... I just don't see the utility there, and I do see a downside.
>>
>>> The NetBackup server scans a backup client system. It determines it will need 600 GB of disk space on the disk store. It stats the ZFS volume, sees there is 700 GB free (enough for the backup), and starts writing 600 GB over multiple hours. In the meantime, 500 GB is used elsewhere in the pool. Does NetBackup fail differently than on vxfs+vxvm in this case? Isn't it NetBackup's issue to make sure that it has reserved disk space, or at least to check for space _as_ it writes?
>>
>> If a disk stage fills during a backup (and there is nothing to prevent another application from filling it either), it first triggers DSSU garbage collection to remove the oldest backup images that have already been duplicated to other storage. If that does not succeed in freeing up enough space (and I have seen it trigger GC multiple times), then I have observed two different behaviors (probably related to differing patch versions):
>>
>> 1. the backup fails with a 129 error
>> 2. the backup is continued on tape media
>>
>> This latter 'solution' ends up creating an unusable backup image, as NBU doesn't know how to deal with an image that crosses storage units, but that is very off topic for this list :-) My question was more toward how NBU will deal with the apparent SIZE of the DSSU on ZFS changing on a frequent basis.
>>
>> --
>> Paul Kraus
>> - Sound Designer, Noel Coward's "Hay Fever" @ Albany Civic Theatre, Feb./Mar. 2008
>> - Facilities Coordinator, Albacon 2008
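The reservations Paul mentions are a one-liner per DSSU file system — a sketch with made-up dataset names:

    # Guarantee each DSSU its space up front, so other datasets in the
    # shared pool cannot consume it mid-backup (names hypothetical)
    # zfs set reservation=600g storage/dssu1
    # zfs get reservation storage/dssu1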
[zfs-discuss] Is gzip planned to be in S10U5?
Hello,

Is the gzip compression algorithm planned to be in Solaris 10 Update 5?

Thanks in advance,
Brad

--
The Zone Manager
http://TheZoneManager.COM
http://opensolaris.org/os/project/zonemgr
Re: [zfs-discuss] Avoiding performance decrease when pool usage is over 80%
Ralf Ramge wrote:
> Thomas Liesner wrote:
>> Does this mean that if I have a pool of 7 TB with one file system for all users and a quota of 6 TB, I'd be all right?
>
> Yep. Although I *really* recommend creating individual file systems, e.g. if you have 1,000 users on your server, I'd create 1,000 file systems with a quota of 6 GB each. Easier to handle, more flexible to use, easier to back up; it allows better use of snapshots, and it's easier to migrate single users to other servers.

Thanks for your recommendation, but this would not meet our needs. All the data in the production pool must be accessible to all users on this system and will be worked on by all users on this system. Hence, one shared file system for all users is perfectly fine.

Thanks for all your input,
Tom
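For the record, the single shared file system Tom describes is only a couple of commands — dataset name assumed, not from the post:

    # One shared file system for all users, capped below pool capacity
    # zfs create storage/shared
    # zfs set quota=6t storage/shared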
Re: [zfs-discuss] 3ware support
On 2/13/08, Tom Buskey wrote:
> Are you using the Supermicro in Solaris or OpenSolaris? Which version? 64-bit or 32-bit? I'm asking because I recently went through a number of SCSI cards that are in the HCL as supported but do not have 64-bit drivers, so they only work in 32-bit mode.

64-bit (AMD Opteron), both Solaris and OpenSolaris: Solaris 10u4, and OpenSolaris from b68 onwards.
Re: [zfs-discuss] 3ware support
Are you using the Supermicro in Solaris or OpenSolaris? Which version? 64-bit or 32-bit? I'm asking because I recently went through a number of SCSI cards that are in the HCL as supported but do not have 64-bit drivers, so they only work in 32-bit mode.
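Incidentally, a quick way to check which mode a box is actually running — isainfo is a standard Solaris command:

    $ isainfo -kv
    64-bit amd64 kernel modules

(Output shown is for a 64-bit x86 kernel; a 32-bit kernel reports i386 modules instead.)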
Re: [zfs-discuss] unexpected ZFS behavior
On 2/5/2008 2:45 PM, Jeremy Kister wrote:
> 1. What do I have to do (short of replacing the seemingly good disk) to get c3t8d0 back online?

I ended up applying patches 124205-05 and 118855-36. Things are much better now, but there are still [at least] two issues remaining.

With my zpool in good condition, I yanked out c2t2d0 and c3t3d0 (see http://mail.opensolaris.org/pipermail/zfs-discuss/2008-February/045646.html for configuration details). After the two spares came online and the array was resilvered, I did 'cp /dev/zero /dbzpool/bigfile' to simulate I/O load, pushed c2t2d0 back in, and in another terminal typed 'zpool replace dbzpool c2t2d0'.

This caused the whole system to lock up: neither of my terminals was accepting commands, I couldn't establish new ssh sessions (the sockets opened, but I couldn't get a shell after entering my password), and the postgres database running on the machine was not answering queries from remote hosts. After 8 hours of this behavior, I power cycled the v40z. When it came back, zpool status showed:

# zpool status
  pool: dbzpool
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        dbzpool        ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t0d0     ONLINE       0     0     0
            c3t0d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t1d0     ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c2t2d0   ONLINE       0     0     0
              c3t15d0  ONLINE       0     0     0
            c3t2d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t3d0     ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c3t3d0   ONLINE       0     0     0
              c2t15d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t8d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
        spares
          c2t15d0      INUSE     currently in use
          c3t15d0      INUSE     currently in use

errors: No known data errors

Question 1: how do I get the spares to go back into 'spare' mode?
Question 2: why did the system lock up when I issued the zpool replace?

--
Jeremy Kister
http://jeremy.kister.net./
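On question 1, the usual answer on this list is that a spare returns to the AVAIL state once it is detached from the mirror it joined — a sketch using the device names from the status output above, assuming the original disks have finished resilvering and are healthy:

    # zpool detach dbzpool c3t15d0
    # zpool detach dbzpool c2t15d0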
Re: [zfs-discuss] 3ware support
Tom Buskey wrote:
> Are you using the Supermicro in Solaris or OpenSolaris? Which version? 64-bit or 32-bit? I'm asking because I recently went through a number of SCSI cards that are in the HCL as supported but do not have 64-bit drivers, so they only work in 32-bit mode.

Solaris 10 8/07, 32-bit (Pentium 4).

Rob++
[zfs-discuss] RAIDz2 reporting odd (smaller) size
I saw some other people have a similar problem, but reports claimed it was 'fixed in release 42', which is many months old, and I'm running the latest version. I made a RAIDZ2 of 8x500GB disks, which should give me a 3 TB pool:

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pile   269K  2.67T  40.4K  /pile

What happened to my 330 GB?

Sam
Re: [zfs-discuss] Which DTrace provider to use
[EMAIL PROTECTED] said:
> ... difference my tweaks are making. Basically, the problem users experience when the load shoots up is huge latencies. An ls on a non-cached directory, which usually is instantaneous, will take 20, 30, 40 seconds or more. Then when the storage array catches up, things get better. My clients are not happy campers. I know, I know, I should have gone with a JBOD setup, but it's too late for that in this iteration of this server. When we set this up, I had the gear already, and it's not in my budget to get new stuff right now.

What kind of array are you seeing this problem with? It sounds very much like our experience here with a 3-year-old HDS ATA array. When the crunch came here, I didn't know enough dtrace to help, but I threw the following into crontab to run every five minutes (24x7), and it at least collected the info I needed to see which LUN/filesystem was busying things out. Way crude, but effective enough:

/bin/ksh -c '(date; mpstat 2 20; iostat -xn 2 20; \
  fsstat $(zfs list -H -o mountpoint -t filesystem | egrep "^/") 2 20; \
  vmstat 2 20) >> /var/tmp/iostats.log 2>&1' < /dev/null

A quick scan using egrep could pull out trouble spots; e.g. the following would identify iostat lines that showed 90-100% busy:

egrep '^Sun |^Mon |^Tue |^Wed |^Thu |^Fri |^Sat | 1[0-9][0-9] c6| 9[0-9] c6' \
  /var/tmp/iostats.log

I know it's not the DTrace you were asking for, but maybe it'll inspire something more useful than the above shotgun approach.

Regards,
Marion
Re: [zfs-discuss] RAIDz2 reporting odd (smaller) size
On Wed, Feb 13, 2008 at 02:48:25PM -0800, Sam wrote:
> I saw some other people have a similar problem, but reports claimed this was 'fixed in release 42', which is many months old; I'm running the latest version. I made a RAIDZ2 of 8x500GB which should give me a 3TB pool:

How many sectors on these 500GB disks?

> $ zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> pile   269K  2.67T  40.4K  /pile

What do you have for zpool list? 3.00 TB is roughly 2.73 TiB.

--
Darren Dunham
Senior Technical Consultant, TAOS  http://www.taos.com/
Re: [zfs-discuss] RAIDz2 reporting odd (smaller) size
2008/2/13, Sam wrote:
> I made a RAIDz2 of 8x500GB which should give me a 3TB pool:

Disk manufacturers use decimal (SI) units, where 1k is 1000; ZFS uses binary units, where 1k is 1024. So your "500 GB" is really 465 GB as ZFS counts. Check the exact number with format.
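The arithmetic, spelled out (two of the eight raidz2 disks go to parity):

    8 x 500 GB raidz2  ->  6 data disks
    6 x 500,000,000,000 bytes  =  3.0 x 10^12 bytes
    3.0 x 10^12 / 1024^4       =  ~2.73 TiB

so an AVAIL figure of 2.67T is in the right neighborhood once pool overhead is accounted for.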
Re: [zfs-discuss] Which DTrace provider to use
> Way crude, but effective enough:

Kinda cool, but isn't that what

    sar -f /var/adm/sa/sa`date +%d` -A | grep -v ,

is for? (Enable the sys crontab with 'crontab -e sys' to start collecting.) For more fun:

    acctadm -e extended -f /var/adm/exacct/proc process

Rob
Re: [zfs-discuss] Which DTrace provider to use
Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
>> ... I know, I know, I should have gone with a JBOD setup, but it's too late for that in this iteration of this server. When we set this up, I had the gear already, and it's not in my budget to get new stuff right now.
>
> What kind of array are you seeing this problem with? It sounds very much like our experience here with a 3-year-old HDS ATA array.

It's not that old. It's a Supermicro system with a 3ware 9650SE-8LP and an Open-E iSCSI-R3 DOM module. The system is plenty fast: I can pretty handily pull 120 MB/sec from it, and write at over 100 MB/sec. It falls apart more on random I/O. The server/initiator side is a T2000 with Solaris 10u4; it never sees over 25% CPU, ever. Oh yeah, and there are two 1 Gb network links to the SAN.

> When the crunch came here, I didn't know enough dtrace to help, but I threw the following into crontab to run every five minutes (24x7), and it at least collected the info I needed to see which LUN/filesystem was busying things out.
>
> I know it's not the DTrace you were asking for, but maybe it'll inspire something more useful than the above shotgun approach.

Yeah, I have some traditional *stat utilities running. If I capture more than a second at a time, things look good. I was hoping to get a real distribution of service times, to catch the outliers that don't get absorbed into the average — hence why I wanted to use dtrace.

My opinion is, if everything slowed down evenly when the array got really loaded up, users wouldn't mind or notice much. But when every 20th or so read/write gets delayed by tens of seconds, the users start to line up at my door.

Thanks for the tips.
Jon
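For the distribution itself, the io provider is the usual starting point — a sketch of the generic start/done latency pattern with quantize(), not tuned for iSCSI in any way:

    #!/usr/sbin/dtrace -s

    /* Per-device distribution of block I/O service times, in ns. */
    io:::start
    {
            start_time[arg0] = timestamp;
    }

    io:::done
    /start_time[arg0]/
    {
            /* args[1] is the devinfo_t of the device completing the I/O */
            @svc[args[1]->dev_statname] =
                quantize(timestamp - start_time[arg0]);
            start_time[arg0] = 0;
    }

The quantize() buckets make outliers like those multi-second stalls visible even when the averages look fine.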
Re: [zfs-discuss] RAIDz2 reporting odd (smaller) size
zpool list reports 3.67T; df reports 2.71T, which is pretty close to 2.73, so I imagine you guys are right that the difference is 465 GB vs. 500 GB per disk. Guess I'll go pick up another pair :)

Thanks!
Sam
Re: [zfs-discuss] Which DTrace provider to use
[EMAIL PROTECTED] said:
> It's not that old. It's a Supermicro system with a 3ware 9650SE-8LP and an Open-E iSCSI-R3 DOM module. The system is plenty fast: I can pretty handily pull 120 MB/sec from it, and write at over 100 MB/sec. It falls apart more on random I/O. The server/initiator side is a T2000 with Solaris 10u4; it never sees over 25% CPU, ever. Oh yeah, and there are two 1 Gb network links to the SAN.
> ...
> My opinion is, if everything slowed down evenly when the array got really loaded up, users wouldn't mind or notice much. But when every 20th or so read/write gets delayed by tens of seconds, the users start to line up at my door.

Hmm, I have no experience with iSCSI yet, but the behavior of our T2000 file/NFS server connected via a 2 Gbit fibre channel SAN is exactly as you describe when our HDS SATA array gets behind. Access to other ZFS pools remains unaffected, but any access to the busy pool just hangs. Some Oracle apps on NFS clients die due to excessive delays.

In our case, this old HDS array's SATA shelves have a very limited queue depth (four per RAID controller) in the back-end loop, plus every write is hit with the added overhead of an in-array read-back verification. Maybe your iSCSI situation injects enough latency at higher loads to cause something like our FC queue limitations.

Good luck,
Marion
[zfs-discuss] Spare Won't Remove
I have a hot spare that was part of my zpool but is no longer connected to the system. I can run the zpool remove command, and it returns fine but doesn't seem to do anything. I have tried adding and removing spares that ARE connected to the system, and that works properly. Is zpool remove failing because the disk is no longer connected to the system?

# zpool remove tank c1d0s4
# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
        spares
          c1d0s4    UNAVAIL   cannot open

errors: No known data errors

--
Chris
[zfs-discuss] Solaris File Server ZFS and Cifs
Hi;

One of my customers is considering a 10 TB NAS box for some Windows boxes. Reliability and high performance are mandatory, so I plan to use 2x clustered servers plus some storage, with ZFS and Solaris. Here are my questions:

1) Is anybody using clustered Solaris and ZFS for file serving in an active-active configuration? (The granularity required between cluster nodes is at the share level.)
2) Is CIFS support reliable at the moment?
3) Can we implement Ethernet port failover for CIFS services?
4) Can we implement Ethernet failover for iSCSI services?
5) Is it easy and automatic to fail over and fail back CIFS shares?

Very best regards,

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Email [EMAIL PROTECTED]
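On questions 3 and 4, the stock Solaris answer is IPMP, which fails IP traffic (CIFS and iSCSI alike) over between NICs — a minimal link-based sketch for Solaris 10, with placeholder interface names and address:

    # /etc/hostname.e1000g0  (primary interface; address is a placeholder)
    192.168.10.10 netmask + broadcast + group nas0 up

    # /etc/hostname.e1000g1  (second NIC in the same IPMP group)
    group nas0 up

With both interfaces active in the group, in.mpathd moves the address to the surviving NIC on link failure; share-level failover between cluster nodes is a separate (cluster-framework) question.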