Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
There is something more to consider with SSDs used as cache devices: why use SATA as the interface? Perhaps http://www.tgdaily.com/content/view/34065/135/ would be better? (no experience) From the article:

  Cards will start at 80 GB and will scale to 320 and 640 GB next year. By the
  end of 2008, Fusion io also hopes to roll out a 1.2 TB card. ... 160 parallel
  pipelines that can read data at 800 megabytes per second and write at 600
  MB/sec ... 4K blocks and then streaming eight simultaneous 1 GB reads and
  writes. In that test, the ioDrive clocked in at 100,000 operations per
  second...

Hard to beat $30 a GB.
Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
Rob Logan wrote:
  There is something more to consider with SSDs used as cache devices: why use
  SATA as the interface? Perhaps http://www.tgdaily.com/content/view/34065/135/
  would be better? (no experience) Cards will start at 80 GB and will scale to
  320 and 640 GB next year. By the end of 2008, Fusion io also hopes to roll
  out a 1.2 TB card. [...] In that test, the ioDrive clocked in at 100,000
  operations per second... Hard to beat $30 a GB.

The key take-away here is that the SSD guys *could* do all sorts of neat things to optimize for speed, reliability, and cost. They have many more technology options than the spinning-rust guys. My advice: don't bet against Moore's law :-)

 -- richard
[zfs-discuss] slog devices don't resilver correctly
This past weekend, my holiday was ruined by a log device replacement gone awry. I posted all about it here: http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html

In a nutshell, a resilver of a single log device with itself (necessary since one can't remove a log device from a pool once it is defined) caused ZFS to fully resilver, but then attach the log device as a stripe of the volume, no longer as a log device. The subsequent pool failure was exceptionally bad: the volume could no longer be imported, and recovering data required read-only mounting of whatever remaining filesystems I could.

It would appear that log resilvers are broken, at least up to B85. I haven't seen code changes in this space, so I presume this is likely an unaddressed problem.
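For anyone trying to reproduce this, the sequence was roughly the following. This is a sketch of the scenario with hypothetical device and pool names, not a transcript from the affected system:

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0   (create the main pool)
  zpool add tank log c2t0d0                      (attach the slog)
  zpool replace tank c2t0d0 c2t0d0               (replace the slog with itself)
  zpool status tank                              (check where the device landed)

After the resilver completes, c2t0d0 should still be listed under the "logs" heading in zpool status; the failure mode described above is that it instead shows up as a top-level data vdev striped into the pool.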
Re: [zfs-discuss] indiana as nfs server: crash due to zfs
On Mon, May 26, 2008 at 6:10 AM, Gerard Henry [EMAIL PROTECTED] wrote:
  hello all, I have Indiana freshly installed on a Sun Ultra 20 machine. It
  only does NFS serving. During one night, the kernel crashed, and I got these
  messages:

  May 22 02:18:57 ultra20 unix: [ID 836849 kern.notice]
  May 22 02:18:57 ultra20 ^Mpanic[cpu0]/thread=ff0003d06c80:
  May 22 02:18:57 ultra20 genunix: [ID 603766 kern.notice] assertion failed:
    sm->sm_space == 0 (0x4000 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 315
  May 22 02:18:57 ultra20 unix: [ID 10 kern.notice]
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06830 genunix:assfail3+b9 ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d068e0 zfs:space_map_load+2c2 ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06920 zfs:metaslab_activate+66 ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d069e0 zfs:metaslab_group_alloc+24e ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06ab0 zfs:metaslab_alloc_dva+1da ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06b50 zfs:metaslab_alloc+82 ()
  May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06ba0 zfs:zio_dva_allocate+62 ()

  Searching on the net, it seems that this kind of error is common. Does it
  mean that I can't use Indiana as a robust NFS server? What actions can I
  take if I want to investigate?

  thanks in advance, gerard

I've seen many people trying to use (in most cases successfully) Indiana or some OpenSolaris build for quasi-production NFS or similar service. I think if you want robust, go with something that is targeted at robustness for your case, such as NexentaStor (paid or free editions). I may come off as a shill for a solution that I use, but it amazes me that people ask for robust, stable-tracking solutions yet always track the bleeding edge instead. Nothing wrong with that, and I do the same, but I know that's what it's for :)
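To answer the "what actions can I take to investigate" question: if savecore is enabled, a panic like this leaves a crash dump that can be examined with mdb. A minimal sketch, assuming the default dump setup (directory names vary):

  dumpadm                       (confirm dumps are saved, e.g. to /var/crash/ultra20)
  cd /var/crash/ultra20
  mdb unix.0 vmcore.0           (load the most recent dump)
  > ::status                    (prints the panic string)
  > ::stack                     (prints the panic stack, as logged above)
  > $q

The assertion message and stack are usually enough to match the panic against known bugs on bugs.opensolaris.org.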
Re: [zfs-discuss] indiana as nfs server: crash due to zfs
Gerard Henry wrote:
  hello all, I have Indiana freshly installed on a Sun Ultra 20 machine. It
  only does NFS serving. During one night, the kernel crashed, and I got these
  messages:

  May 22 02:18:57 ultra20 genunix: [ID 603766 kern.notice] assertion failed:
    sm->sm_space == 0 (0x4000 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 315
  [remainder of panic and stack trace trimmed; see the original message above]

  Searching on the net, it seems that this kind of error is common. Does it
  mean that I can't use Indiana as a robust NFS server? What actions can I
  take if I want to investigate?

This looks like an instance of bug 6657693 (see http://bugs.opensolaris.org/view_bug.do?bug_id=6657693). It was fixed in build 87; OpenSolaris 2008.05 is based on build 86. Hopefully the OpenSolaris packages will be in sync with SXCE builds pretty soon - see http://mail.opensolaris.org/pipermail/indiana-discuss/2008-May/006406.html - so you'll be able to update your install and get a fix for this bug.

Hth, Victor
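Once the updated packages land in the repository, picking up the fix should be the standard image update. A sketch, assuming the stock OpenSolaris 2008.05 pkg(5) tooling:

  pfexec pkg refresh            (refresh the package catalog)
  pfexec pkg image-update       (build and activate a new boot environment)

followed by a reboot into the new boot environment; the old one remains available as a fallback.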
Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
On Tue, May 27, 2008 at 12:44 PM, Rob Logan [EMAIL PROTECTED] wrote:
  There is something more to consider with SSDs used as cache devices: why use
  SATA as the interface? Perhaps http://www.tgdaily.com/content/view/34065/135/
  would be better? [...] In that test, the ioDrive clocked in at 100,000
  operations per second... Hard to beat $30 a GB.

These could be rather interesting as swap devices. On the face of it, $30/GB is pretty close to the list price of taking a T5240 from 32 GB to 64 GB. However, it is *a lot* less than feeding system-board DIMM slots to workloads that use a lot of RAM but are fairly inactive. As such, a $10k PCIe card (roughly a 320 GB card at $30/GB) may allow a $42k 64 GB T5240 to handle 5+ times the number of not-too-busy J2EE instances. If anyone's done any modelling or testing of such an idea, I'd love to hear about it.

-- Mike Gerdts http://mgerdts.blogspot.com/
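Pressing one of these cards into service as swap would be straightforward on the Solaris side, assuming the card shows up as an ordinary block device. A sketch with a hypothetical device name, not a tested recipe:

  swap -a /dev/dsk/c3t0d0s0     (add the flash card as a swap device)
  swap -l                       (verify the new swap device is listed)

Whether the latency under J2EE-style paging patterns actually holds up is exactly the kind of modelling asked about above.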
Re: [zfs-discuss] slog devices don't resilver correctly
Yeah, I noticed this the other day while I was working on an unrelated problem. The basic problem is that log devices are kept within the normal vdev tree and are only distinguished by a bit indicating that they are log devices (which is the source of a number of other inconsistencies that Pawel has encountered). When doing a replacement, the userland code is responsible for creating the vdev configuration to use for the newly attached vdev. In this case, it doesn't preserve the 'is_log' bit correctly. This should be enforced in the kernel - it doesn't make sense to replace a log device with a non-log device, ever. I have a workspace with some other random ZFS changes, so I'll try to include this as well.

FWIW, removing log devices is significantly easier than removing arbitrary devices, since there is no data to migrate (after the current txg is synced). At one point there were plans to do this as a separate piece of work (since the vdev changes are needed for the general case anyway), but I don't know whether this is still the case.

- Eric

On Tue, May 27, 2008 at 01:13:47PM -0700, Joe Little wrote:
  This past weekend, my holiday was ruined by a log device replacement gone
  awry. I posted all about it here:
  http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html
  [...]

-- Eric Schrock, Fishworks http://blogs.sun.com/eschrock
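Until such a kernel-side check exists, the place to look for a dropped 'is_log' bit is the pool configuration itself. A sketch of how one might inspect it (pool name hypothetical; exact output labels vary by build):

  zdb -C tank                   (dump the cached pool config; log vdevs carry an is_log entry)
  zpool status tank             (log devices should appear under the separate "logs" heading)

If a device that was added as a log shows up as an ordinary top-level vdev in either view after a replacement, the bit has been lost.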
Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
On May 27, 2008, at 1:44 PM, Rob Logan wrote:
  There is something more to consider with SSDs used as cache devices: why use
  SATA as the interface? Perhaps http://www.tgdaily.com/content/view/34065/135/
  would be better? (no experience) [...]

We are pretty happy with RAMSAN SSDs (ours is RAM based, not flash).

-Andy
Re: [zfs-discuss] ZFS in S10U6 vs openSolaris 05/08
On May 23, 2008, at 22:21, Richard Elling wrote:
  Consider a case where you might use large, slow SATA drives (1 TByte,
  7,200 rpm) for the main storage, and a single small, fast (36 GByte,
  15k rpm) drive for the L2ARC. This might provide a reasonable
  cost/performance trade-off.

Ooh, neat; I hadn't considered that. Cool, thanks. :)

-Bill

- Bill McGonigle, Owner, BFC Computing, LLC
  Work: 603.448.4440   Home: 603.448.1668   Cell: 603.252.2606   Page: 603.442.1833
  [EMAIL PROTECTED]
  http://www.bfccomputing.com/
  Blog: http://blog.bfccomputing.com/
  VCard: http://bfccomputing.com/vcard/bill.vcf
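For reference, wiring such a drive up as L2ARC is a one-liner on builds recent enough to support cache devices. A sketch with hypothetical device and pool names:

  zpool add tank cache c4t0d0   (the small, fast 15k rpm drive)
  zpool iostat -v tank 5        (watch reads start landing on the cache device)

Unlike a slog, a cache device holds no irreplaceable state - it only caches copies of pool data - so losing it costs nothing but a warm cache.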
Re: [zfs-discuss] slog devices don't resilver correctly
On Tue, May 27, 2008 at 1:50 PM, Eric Schrock [EMAIL PROTECTED] wrote:
  Yeah, I noticed this the other day while I was working on an unrelated
  problem. The basic problem is that log devices are kept within the normal
  vdev tree and are only distinguished by a bit indicating that they are log
  devices... This should be enforced in the kernel - it doesn't make sense to
  replace a log device with a non-log device, ever. [...] FWIW, removing log
  devices is significantly easier than removing arbitrary devices, since
  there is no data to migrate (after the current txg is synced). At one point
  there were plans to do this as a separate piece of work (since the vdev
  changes are needed for the general case anyway), but I don't know whether
  this is still the case.

Thanks for the reply. As noted, I now recommend against using a log device, since you can't remove it, and replacement, as you can see, is touchy at best. I know the larger, general vdev-evacuation work is ongoing, but if log evacuation is simple, doing it separately would make logs useful now instead of leaving people waiting.
Re: [zfs-discuss] slog devices don't resilver correctly
Joe -

We definitely don't do great accounting of the 'vdev_islog' state here, and it's possible to create a situation where the parent replacing vdev has the state set but the children do not, but I have been unable to reproduce the behavior you saw. I have rebooted the system during resilver, manually detached the replacing vdev, and tried a variety of other things, but I've never seen the behavior you describe. In all cases, the log state is kept with the replacing vdev and restored when the resilver completes. I have also not observed the resilver failing with a bad log device. Can you provide more information about how to reproduce this problem? Perhaps without rebooting into B70 in the middle?

Thanks,

- Eric

On Tue, May 27, 2008 at 01:50:04PM -0700, Eric Schrock wrote:
  Yeah, I noticed this the other day while I was working on an unrelated
  problem. [...]

-- Eric Schrock, Fishworks http://blogs.sun.com/eschrock
Re: [zfs-discuss] slog devices don't resilver correctly
On Tue, May 27, 2008 at 4:50 PM, Eric Schrock [EMAIL PROTECTED] wrote:
  Joe - We definitely don't do great accounting of the 'vdev_islog' state
  here, and it's possible to create a situation where the parent replacing
  vdev has the state set but the children do not, but I have been unable to
  reproduce the behavior you saw. [...] Can you provide more information
  about how to reproduce this problem? Perhaps without rebooting into B70 in
  the middle?

Well, this happened live on a production system, and I'm still in the process of rebuilding said system (trying to save all the snapshots), so I don't know what triggered it. It was trying to resilver in B85; I rebooted into B70, where it did resilver (but it was now using cmdk device naming instead of the full SCSI device names). It was still marked degraded even though resilvering finished. Since the resilver took so long, I suspect the splicing-in of the device took place in B70. Again, it would never work in B85 -- it just kept resetting. I'm wondering if the device path changing from cxtxdx to cxdx could be the trigger point.
Re: [zfs-discuss] slog devices don't resilver correctly
Joe Little wrote:
  Well, this happened live on a production system, and I'm still in the
  process of rebuilding said system (trying to save all the snapshots), so I
  don't know what triggered it. [...] I'm wondering if the device path
  changing from cxtxdx to cxdx could be the trigger point.

Joe,

We're sorry about your problems. My take is that this would be best handled by expediting (raising the priority of) the fix for bug 6574286 "removing a slog doesn't work", rather than by expending too much effort on understanding how it failed on your system. You would not have had this problem if you had been able to remove the log device. Is that reasonable?

Neil.
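For what it's worth, once 6574286 is fixed, the escape hatch would presumably look like the syntax already used for cache devices and hot spares - a sketch of the expected interface, not something that works on current bits:

  zpool remove tank c2t0d0      (detach the slog; blocked today by bug 6574286)

Since the slog holds only the not-yet-synced intent log, the evacuation is cheap once the current txg is on the main pool, as Eric noted earlier in the thread.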
Re: [zfs-discuss] ZFS Project Hardware
I'm using a Gigabyte i-RAM card with cheap memory for my slog device, with great results. Of course, I don't have as much memory as you do in my project box. I also want to use the leftover space on the i-RAM and dual-purpose it as a readzilla cache device as well as the slog (see the sketch below). Picked it up off eBay along with some computer guts and an NSC-314S 3U 14-hot-swap-drive rackmount case.

Like everyone else, I have been spending hours trying to find a supported high-capacity SATA card that supports PCI Express. I wish someone would make a driver for that Adaptec card mentioned in this thread; it is very reasonably priced for a project box. Everything else that fits that category and is supported seems to be $320 or much more.
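Splitting one i-RAM between slog and L2ARC duty would look roughly like this, assuming the card has been partitioned into two slices with format(1M) (device and pool names hypothetical):

  zpool add tank log c3d0s0     (a small slice for the intent log)
  zpool add tank cache c3d0s1   (the leftover space as an L2ARC cache device)

One caveat, given the slog-resilver discussion elsewhere on this list: the log half cannot currently be removed once added, so it is worth sizing the slices deliberately.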