Re: [zfs-discuss] improve meta data performance
Chris Banal cba...@gmail.com writes: We have a SunFire X4500 running Solaris 10U5 which does about 5-8k NFS ops, of which about 90% are metadata. In hindsight it would have been significantly better to use a mirrored configuration, but we opted for 4 x (9+2) raidz2 at the time. We cannot take the downtime necessary to change the zpool configuration. We need to improve the metadata performance with little to no money. Does anyone have any suggestions?

I believe the latest Solaris update will improve metadata caching. Always good to be up-to-date on patches, no?

Is there such a thing as a Sun-supported NVRAM PCI-X card compatible with the X4500 which can be used as an L2ARC?

I think they only have PCIe, and it hardly qualifies as little to no money: http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

I'll second the recommendations for the Intel X25-M for L2ARC if you can spare a SATA slot for it. -- Kjetil T. Homme Redpill Linpro AS - Changing the game ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
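For reference, adding an SSD as an L2ARC cache device is a one-liner and needs no downtime; a minimal sketch, with the pool and device names being hypothetical:

```shell
# Add an SSD as a cache (L2ARC) device to an existing pool.
# The pool stays online while the device is added.
zpool add tank cache c5t0d0

# Verify the cache device shows up, then watch it warm up over time:
zpool status tank
zpool iostat -v tank 5
```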
Re: [zfs-discuss] ZFS mirrored boot disks
Interestingly, with the machine running, I can pull the first drive in the mirror, replace it with an unformatted one, format it, mirror rpool over to it, install the boot loader, and at that point the machine will boot with no problems. It's just when the first disk is missing that I have a problem with it. -- Terry -- This message posted from opensolaris.org
Re: [zfs-discuss] ZFS mirrored boot disks
On Fri, Feb 19, 2010 at 7:42 PM, Terry Hull t...@nrg-inc.com wrote: Interestingly, with the machine running, I can pull the first drive in the mirror, replace it with an unformatted one, format it, mirror rpool over to it, install the boot loader, and at that point the machine will boot with no problems. It's just when the first disk is missing that I have a problem with it.

I had a problem cloning a disk for an xVM domU where it hung just after displaying the hostname, similar to your result. I had to boot with the livecd, force-import and export the pool, and reboot. That worked, so you might want to try it. -- Fajar
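The recovery sequence Fajar describes, sketched as commands from a LiveCD shell (the pool is assumed to be rpool; the alternate root path is illustrative):

```shell
# From the LiveCD, force-import the root pool under an alternate root
# so it doesn't collide with the live environment's own filesystems:
zpool import -f -R /a rpool

# Export it cleanly so the on-disk state is marked consistent:
zpool export rpool

# Then reboot from the repaired disk:
reboot
```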
[zfs-discuss] Growing ZFS Volume with SMI/VTOC label
Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC label without losing data, as the OS is built on this volume? Thanks
Re: [zfs-discuss] Disk controllers changing the names of disks
I am curious how admins are dealing with controllers like the Dell PERC 5 and 6 that can change the device name of a disk if a disk fails and the machine reboots. These controllers are not nicely behaved, in that they happily fill in the device numbers for the physical drive that is missing. In that case, how can you recover the zpool that was on the disk? I understand that if the pool was exported, you can then re-import it. However, what happens if the machine completely dies and you have no chance to export the pool? -- Terry

You can still import it, although you might lose some in-flight data that was being written during the crash, and the import can take a while to finish transactions; otherwise, it will be fine. Yours Markus Kovero
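The recovery Markus describes boils down to a forced import; a sketch, with the pool name hypothetical:

```shell
# Import a pool that was never cleanly exported (e.g. after a crash
# or controller swap). -f overrides the "pool was last in use on
# another system" safety check.
zpool import -f tank

# ZFS finds member disks by their on-disk labels, not by device name,
# so renumbered devices don't matter. You can also point the scan at a
# specific device directory:
zpool import -d /dev/dsk -f tank
```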
Re: [zfs-discuss] Growing ZFS Volume with SMI/VTOC label
So in a ZFS boot disk configuration (rpool) in a running environment, it's not possible?

On Fri, Feb 19, 2010 at 9:25 AM, casper@sun.com wrote: Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC label without losing data, as the OS is built on this volume?

Sure, as long as the new partition starts on the same block and is longer. It was a bit more difficult with UFS, but for zfs it is very simple. I had a few systems with two ufs root slices using live upgrade:

    <slice 1><slice 2><swap>

First I booted from <slice 2>:
    ludelete "slice1"
    zpool create rpool "slice1"
    lucreate -p rpool
    luactivate slice1
    init 6

From the zfs root:
    ludelete slice2
    format: remove slice2; grow slice1 to incorporate slice2; label

At that time I needed to reboot to get the new device size reflected in zpool list; today that is no longer needed. Casper
Re: [zfs-discuss] Growing ZFS Volume with SMI/VTOC label
So in a ZFS boot disk configuration (rpool) in a running environment, it's not possible?

The example I gave does grow the rpool while running from the rpool, but you need a recent version of zfs to grow the pool while it is in use.

On Fri, Feb 19, 2010 at 9:25 AM, casper@sun.com wrote: Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC label without losing data, as the OS is built on this volume?

Sure, as long as the new partition starts on the same block and is longer. It was a bit more difficult with UFS, but for zfs it is very simple. I had a few systems with two ufs root slices using live upgrade:

    <slice 1><slice 2><swap>

First I booted from <slice 2>:
    ludelete "slice1"
    zpool create rpool "slice1"
    lucreate -p rpool
    luactivate slice1
    init 6

From the zfs root:
    ludelete slice2
    format: remove slice2; grow slice1 to incorporate slice2; label

At that time I needed to reboot to get the new device size reflected in zpool list; today that is no longer needed. Casper
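On recent builds, picking up the grown slice online can be done with the pool's expansion support; a sketch, with pool and device names hypothetical and assuming a zpool version that supports online expansion:

```shell
# After growing the underlying slice in format(1M), ask ZFS to use
# the new space. On recent builds either of these works:
zpool set autoexpand=on rpool        # expand automatically from now on
zpool online -e rpool c0t0d0s0       # one-shot expansion of this vdev

# Confirm the new size is visible:
zpool list rpool
```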
Re: [zfs-discuss] ZFS mirrored boot disks
On Fri, February 19, 2010 00:32, Terry Hull wrote: I have a machine with the Supermicro 8-port SATA card installed. I have had no problem creating a mirrored boot disk using the oft-repeated scheme:

    prtvtoc /dev/rdsk/c4t0d0s2 | fmthard -s - /dev/rdsk/c4t1d0s2
    zpool attach rpool c4t0d0s0 c4t1d0s0
    (wait for sync)
    installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t1d0s0

Unfortunately, when I shut the machine down and remove the primary boot disk, it will no longer boot. I get the boot loader, and if I turn off the splash screen I see it get to the point of displaying the host name. At that point, it hangs forever. From the posts I've seen it looks like this is a very standard scheme that just works. What can be missing from my procedure? I am running Build 132, if that matters.

Disk boot order in your BIOS? I know that I succeeded in booting off the third (of four) disks in a mirror group Wednesday evening, but only after altering the disk boot order in the BIOS. Using that exact controller card, come to think of it. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
[zfs-discuss] ZFS unit of compression
Hello. I want to know what the unit of compression in ZFS is. Is it 4 KB or larger? Is it tunable? Thanks. Thanos
Re: [zfs-discuss] ZFS unit of compression
On 19/02/2010 15:43, Thanos Makatos wrote: Hello. I want to know what the unit of compression in ZFS is. Is it 4 KB or larger? Is it tunable?

I don't understand what you mean. For user data, ZFS compresses ZFS blocks; these are 512 bytes minimum up to 128k maximum, and depend on the configuration of the dataset (the recordsize property) and the write pattern of the applications using it. If a block doesn't compress by more than 12.5%, ZFS stores the uncompressed data instead - note this is not tunable and is hardcoded to the same value for all compression methods. The only tunable for compression is selecting a different compression algorithm for the filesystem.

What problem do you think you have, or what are you trying to solve? If you read the source for the lzjb algorithm used in ZFS, the Lempel size is 1k - is that what you mean? -- Darren J Moffat
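The tunables Darren mentions are per-dataset properties; a minimal sketch, with the dataset names hypothetical:

```shell
# Choose a compression algorithm per dataset (lzjb was the default
# algorithm at the time; gzip levels 1-9 were also available):
zfs set compression=lzjb tank/data
zfs set compression=gzip-6 tank/archive

# The block size compression operates on follows the recordsize
# property for filesystems, up to its 128k maximum:
zfs set recordsize=128k tank/data

# Check what a dataset is actually using and achieving:
zfs get compression,recordsize,compressratio tank/data
```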
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
One more thing I'd like to add here: the PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system RAM, both in terms of size and speed. There is no significant performance improvement from enabling adaptive readahead in the PERC. I recommend instead that the PERC be enabled for Write Back and have readahead disabled. Fortunately this is the default configuration on a new PERC volume, so unless you changed it, you should be fine.

It may be smart to double-check and ensure your OS does adaptive readahead. In Linux (RHEL/CentOS) you can check that the "readahead" service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting. I don't know how to check Solaris or OpenSolaris to ensure adaptive readahead is enabled.

On 2/18/10 8:08 AM, Edward Ned Harvey sola...@nedharvey.com wrote: Ok, I've done all the tests I plan to complete. For highest performance, it seems:

- The measure I think is most relevant for typical operation is the fastest random read / write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS.
- The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz.
- The fastest sustained sequential read is striped mirrors via ZFS, or maybe raidz.

Here are the results:

- Results summary of Bob's method: http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
- Raw results of Bob's method: http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
- Results summary of Ned's method: http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
- Raw results of Ned's method: http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip

From: Edward Ned Harvey [mailto:sola...@nedharvey.com] Sent: Saturday, February 13, 2010 9:07 AM To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org Subject: ZFS performance benchmarks in various configurations

I have a new server with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)

My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G of RAM, but I pulled it all out to get it down to 4G. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in the PERC for Adaptive ReadAhead and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. The OS occupies 1 disk, so I have 6 disks to play with.

I am currently running the following tests, which include the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.

    iozone -Reab somefile.wks -g 17G -i 1 -i 0

Configurations being tested:

- Single disk
- 2-way mirror
- 3-way mirror
- 4-way mirror
- 5-way mirror
- 6-way mirror
- Two mirrors striped (or concatenated)
- Three mirrors striped (or concatenated)
- 5-disk raidz
- 6-disk raidz
- 6-disk raidz2

Hypothesized results:

- N-way mirrors write at the same speed as a single disk
- N-way mirrors read n times faster than a single disk
- Two mirrors striped read and write 2x faster than a single mirror
- Three mirrors striped read and write 3x faster than a single mirror
- Raidz and raidz2: no hypothesis. Some people say they perform comparably to many disks working together; some people say it's slower than a single disk. Waiting to see the results.
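For anyone reproducing the winning layout, striped mirrors are simply multiple mirror vdevs in one pool - ZFS stripes across all top-level vdevs automatically. A sketch with hypothetical device names:

```shell
# Three 2-way mirrors striped together (the "three mirrors striped"
# configuration from the test matrix):
zpool create tank \
  mirror c1t1d0 c1t2d0 \
  mirror c1t3d0 c1t4d0 \
  mirror c1t5d0 c1t6d0

# For comparison, the 6-disk raidz2 configuration:
# zpool create tank raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0
```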
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
hello, i have made some benchmarks with my napp-it zfs-server; screenshot: www.napp-it.org/bench.pdf

- 2gb vs 4gb vs 8gb ram
- mirror vs raidz vs raidz2 vs raidz3
- dedup and compress enabled vs disabled

result in short:
- 8gb ram vs 2gb: +10% .. +500% more power (green drives)
- compress and dedup enabled: +50% .. +300%
- mirror vs raidz: fastest is raidz, slowest mirror, raidz levels +/-20%

gea
[zfs-discuss] Poor ZIL SLC SSD performance
Hi, I'm currently testing a Mtron Pro 7500 16GB SLC SSD as a ZIL device and seeing very poor performance for small file writes via NFS. Copying a source code directory with around 4000 small files to the ZFS pool over NFS without the SSD log device yields around 1000 IOPS (pool of 8 shared SATA mirrors). When adding the SSD as ZIL, performance drops to 50 IOPS! I see similarly poor performance when creating a ZFS pool on the SSD itself and sharing it via NFS. However, copying the files locally on the server from the SATA pool to the SSD pool only takes a few seconds.

The SSD's specs reveal:
    sequential r/w 512B: 83,000/51,000
    sequential r/w 4KB: 21,000/13,000
    random r/w 512B: 19,000/130
    random r/w 4KB: 12,000/130

So it is apparent that the SSD has really poor random writes. But I was under the impression that the ZIL is mostly sequential writes - or was I misinformed here? Maybe the cache syncs bring the device to its knees? Best Regards, Felix Buenemann
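The test setup Felix describes corresponds to adding the SSD as a dedicated log (slog) device; a sketch, with pool and device names hypothetical:

```shell
# Add the SSD as a separate intent log device:
zpool add tank log c3t0d0

# Watch per-device activity while the NFS copy runs, to see whether
# the slog is actually absorbing the synchronous writes:
zpool iostat -v tank 1

# Remove it again to compare against the baseline (log device removal
# requires a reasonably recent zpool version):
zpool remove tank c3t0d0
```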
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, 19 Feb 2010, Felix Buenemann wrote: So it is apparent that the SSD has really poor random writes. But I was under the impression that the ZIL is mostly sequential writes - or was I misinformed here? Maybe the cache syncs bring the device to its knees?

That's what it seems like. This particular device must actually be obeying the cache sync request rather than just pretending to, like many SSDs. Most SSDs are very good at seeking and very good at random reads, but most are rather poor at small synchronous writes. The ones which are good at small synchronous writes cost more. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Poor ZIL SLC SSD performance
Am 19.02.10 19:30, schrieb Bob Friesenhahn: On Fri, 19 Feb 2010, Felix Buenemann wrote: So it is apparent that the SSD has really poor random writes. But I was under the impression that the ZIL is mostly sequential writes - or was I misinformed here? Maybe the cache syncs bring the device to its knees?

That's what it seems like. This particular device must actually be obeying the cache sync request rather than just pretending to, like many SSDs. Most SSDs are very good at seeking and very good at random reads, but most are rather poor at small synchronous writes. The ones which are good at small synchronous writes cost more.

Too bad - I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD. - Felix
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, February 19, 2010 12:50, Felix Buenemann wrote: Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD. Well, but the Intel X25-M is the drive that really first cracked the problem (earlier high-performance drives were hideously expensive and rather brute force). Which was relatively recently. The industry is still evolving rapidly. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Re: [zfs-discuss] Idiots Guide to Running a NAS with ZFS/OpenSolaris
I can strongly recommend this series of articles http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ Very good! :o)
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, 19 Feb 2010, David Dyer-Bennet wrote: Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD. Well, but the Intel X25-M is the drive that really first cracked the problem (earlier high-performance drives were hideously expensive and rather brute force). Which was relatively recently. The industry is still evolving rapidly.

What is the problem that the X25-M cracked? The X25-M is demonstrated to ignore cache sync and toss transactions. As such, it is useless for a ZIL. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Poor ZIL SLC SSD performance
Am 19.02.10 20:50, schrieb Bob Friesenhahn: On Fri, 19 Feb 2010, David Dyer-Bennet wrote: Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD. Well, but the Intel X25-M is the drive that really first cracked the problem (earlier high-performance drives were hideously expensive and rather brute force). Which was relatively recently. The industry is still evolving rapidly. What is the problem that the X25-M cracked? The X25-M is demonstrated to ignore cache sync and toss transactions. As such, it is useless for a ZIL.

Yes, I see no difference with the X25-M between zfs_nocacheflush=0 and zfs_nocacheflush=1. After setting zfs_nocacheflush=1, the Mtron SSD also performed at around 1000 IOPS, which is still useless, because the array achieves the same IOPS without a dedicated ZIL. Looking at the X25-E (SLC) benchmarks, it should be able to do about 3000 IOPS, which would improve array performance. I think I'll try one of these inexpensive battery-backed PCI RAM drives from Gigabyte and see how many IOPS they can pull. - Felix
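For reference, the zfs_nocacheflush toggling Felix describes can be done live via mdb or persistently in /etc/system. A sketch - and note this is only safe when every pool and log device has nonvolatile (or battery-backed) write cache, since it disables cache flushes globally:

```shell
# Check the current value, then flip it live (takes effect immediately):
echo "zfs_nocacheflush/D" | mdb -k
echo "zfs_nocacheflush/W0t1" | mdb -kw

# Or make it persistent across reboots by adding this line to /etc/system:
#   set zfs:zfs_nocacheflush = 1
```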
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, February 19, 2010 13:50, Bob Friesenhahn wrote: On Fri, 19 Feb 2010, David Dyer-Bennet wrote: Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD. Well, but the Intel X25-M is the drive that really first cracked the problem (earlier high-performance drives were hideously expensive and rather brute force). Which was relatively recently. The industry is still evolving rapidly. What is the problem that the X25-M cracked? The X25-M is demonstrated to ignore cache sync and toss transactions. As such, it is useless for a ZIL.

But it's finally useful as, for example, a notebook boot drive. No previous vaguely affordable design was. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Re: [zfs-discuss] Poor ZIL SLC SSD performance
felix.buenem...@googlemail.com said: I think I'll try one of these inexpensive battery-backed PCI RAM drives from Gigabyte and see how many IOPS they can pull.

Another poster, Tracy Bernath, got decent ZIL IOPS from an OCZ Vertex unit. Dunno if that's sufficient for your purposes, but it looked pretty good for the money. Marion
Re: [zfs-discuss] SSDs with a SCSI SCA interface?
On 12/ 4/09 02:06 AM, Erik Trimble wrote: Hey folks. I've looked around quite a bit, and I can't find something like this: I have a bunch of older systems which use Ultra320 SCA hot-swap connectors for their internal drives. (e.g. v20z and similar) I'd love to be able to use modern flash SSDs with these systems, but I have yet to find someone who makes anything that would fit the bill. I need either: (a) a SSD with an Ultra160/320 parallel interface (I can always find an interface adapter, so I'm not particular about whether it's a 68-pin or SCA) Bitmicro makes one: http://www.bitmicro.com/products_edisk_altima_35_u320.php They also make a version with a 4Gb FC interface. Haven't tried either one, but found Bitmicro when researching SSD options for a V890. Eric
[zfs-discuss] rule of thumb for scrub
I think I asked this before but apparently have lost track of the answers I got. I'm wanting a general rule of thumb for how often to `scrub'. My setup is a home NAS and general zfs server, so it does not see heavy use. I'm up to build 129 and do update fairly often; just the last few builds were a bit too problematic. My disks are set up in 3 mirrored pairs. They do get regular use when my other machines access the zfs server for backups, and the nfs-served directories are shared all around. But still, only home usage - no business involved, though maybe a bit of a heavy hobbyist user. With that in mind, what would be a good safe plan for `scrubbing'?
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 19, 2010, at 8:35 AM, Edward Ned Harvey wrote: One more thing I’d like to add here: The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system ram, both in terms of size and speed. There is no significant performance improvement by enabling adaptive readahead in the PERC. I will recommend instead, the PERC should be enabled for Write Back, and have the readahead disabled. Fortunately this is the default configuration on a new perc volume, so unless you changed it, you should be fine. It may be smart to double check, and ensure your OS does adaptive readahead. In Linux (rhel/centos) you can check that the “readahead” service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting. I don’t know how to check solaris or opensolaris, to ensure adaptive readahead is enabled.

ZFS has intelligent prefetching. AFAIK, Solaris disk drivers do not prefetch. -- richard
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On 18 feb 2010, at 13.55, Phil Harman wrote: ... Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that).

But are there any clients that assume that an iSCSI volume is synchronous? Isn't an iSCSI target supposed to behave like any other SCSI disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? By that I mean: a disk which understands SCSI commands, with an optional write cache that can be turned off, with a cache sync command, and all those things. Put another way, isn't it the OS/file system's responsibility to use the SCSI disk responsibly, regardless of the underlying protocol? /ragge
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On 19 feb 2010, at 17.35, Edward Ned Harvey wrote: The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system ram, both in terms of size and speed. There is no significant performance improvement by enabling adaptive readahead in the PERC. I will recommend instead, the PERC should be enabled for Write Back, and have the readahead disabled. Fortunately this is the default configuration on a new perc volume, so unless you changed it, you should be fine.

If I understand correctly, ZFS nowadays will only flush data to nonvolatile storage (such as a RAID controller's NVRAM), and not all the way out to the disks. (This was done to solve performance problems with some storage systems, and I believe it is also the right thing to do under normal circumstances.) Doesn't this mean that if you enable write back, and you have a single, non-mirrored RAID controller, and your RAID controller dies on you so that you lose the contents of the NVRAM, you have a potentially corrupt file system? /ragge
Re: [zfs-discuss] Poor ZIL SLC SSD performance
Am 19.02.10 21:29, schrieb Marion Hakanson: felix.buenem...@googlemail.com said: I think I'll try one of these inexpensive battery-backed PCI RAM drives from Gigabyte and see how many IOPS they can pull.

Another poster, Tracy Bernath, got decent ZIL IOPS from an OCZ Vertex unit. Dunno if that's sufficient for your purposes, but it looked pretty good for the money.

I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash on power failure. It promises 65,000 IOPS and thus should be great for the ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. http://www.hyperossystems.co.uk/07042003/hardware.htm - Felix
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad ra...@csc.kth.se wrote: On 18 feb 2010, at 13.55, Phil Harman wrote: ... Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that). But are there any clients that assume that an iSCSI volume is synchronous? Isn't an iSCSI target supposed to behave like any other SCSI disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? By that I mean: a disk which understands SCSI commands, with an optional write cache that can be turned off, with a cache sync command, and all those things. Put another way, isn't it the OS/file system's responsibility to use the SCSI disk responsibly, regardless of the underlying protocol?

That was my argument a while back. If you use /dev/dsk, then all writes should be asynchronous, WCE should be on, and the initiator should issue a 'sync' to make sure data is in NV storage; if you use /dev/rdsk, all writes should be synchronous and WCE should be off. RCD should be off in all cases, and the ARC should cache all it can. Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the initiator flags the write cache is the wrong way to go about it. It's more complicated than it needs to be, and it leaves setting the storage policy up to the system admin rather than the storage admin. It would be better to put effort into supporting the FUA and DPO options in the target than into dynamically changing a volume's cache policy from the initiator side. -Ross
Re: [zfs-discuss] rule of thumb for scrub
Hi Harry, Our current scrubbing guideline is described here: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide Run zpool scrub on a regular basis to identify data integrity problems. If you have consumer-quality drives, consider a weekly scrubbing schedule. If you have datacenter-quality drives, consider a monthly scrubbing schedule. Thanks, Cindy On 02/19/10 14:28, Harry Putnam wrote: I think I asked this before but apparently have lost track of the answers I got. I'm wanting a general rule of thumb for how often to `scrub'. My setup is a home NAS and general zfs server so it does not see heavy use. I'm up to build 129 and do update fairly often, just the last few builds were a bit too problematic. My disks are setup in 3 mirrored pairs. They do get regular use when my other machines access the zfs server for backups, and the nfs served directories shared all around. But still only home usage no business involved but maybe a bit of a heavy hobbist user. With that in mind what would be a good safe plan for `scrubbing'?
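Cindy's weekly/monthly guideline maps naturally onto a root crontab entry; a sketch, with the pool name hypothetical:

```shell
# Weekly scrub (consumer-quality drives): Sundays at 03:00.
# Add via 'crontab -e' as root:
#   0 3 * * 0 /usr/sbin/zpool scrub tank

# Kick one off by hand and check progress/results afterwards:
zpool scrub tank
zpool status -v tank
```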
Re: [zfs-discuss] Lost disk geometry
On Fri, Feb 19, 2010 at 01:15:17PM -0600, David Dyer-Bennet wrote: On Fri, February 19, 2010 13:09, David Dyer-Bennet wrote: Anybody know what the proper geometry is for a WD1600BEKT-6-1A13? It's not even in the data sheets any more!

Any such geometry has been entirely fictitious since ZBR disks emerged in, oh, about 1990.

One further point -- I can't seem to enter the geometry the second disk has manually for the first; when I enter 152615 for the number of sectors, it says this is out of range.

It's probably reading some garbage as a label. dd 0's over the start of it and try again, perhaps with a hotplug or reboot in between if necessary. -- Dan.
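Dan's suggestion to zero the stale label, sketched below - the device name is hypothetical, and this destroys whatever is on that disk, so double-check the target first:

```shell
# Wipe the first 1 MB of the raw disk, where old VTOC/EFI labels live:
dd if=/dev/zero of=/dev/rdsk/c4t1d0s2 bs=1024k count=1

# Note: ZFS also keeps two label copies at the *end* of the device,
# so if a stale zpool label is the problem, the last 1 MB may need
# zeroing as well (the seek offset depends on the disk's size).
```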
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On 19/02/2010 21:57, Ragnar Sundblad wrote: On 18 feb 2010, at 13.55, Phil Harman wrote: Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that). But are there any clients that assume that an iSCSI volume is synchronous? Isn't an iSCSI target supposed to behave like any other SCSI disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? By that I mean: a disk which understands SCSI commands, with an optional write cache that can be turned off, a cache sync command, and all those things. Put another way, isn't it the OS/file system's responsibility to use the SCSI disk responsibly, regardless of the underlying protocol? /ragge Yes, that would be nice, wouldn't it? But the world is seldom that simple, is it? For example, Sun's first implementation of zvol was unsafe by default, with no cache flush option either. A few years back we used to note that one of the reasons Solaris was slower than Linux at filesystem microbenchmarks was because Linux ran with the write caches on (whereas we would never be that foolhardy). And then this seems to claim that NTFS may not be that smart either ... http://blogs.sun.com/roch/entry/iscsi_unleashed (see the WCE Settings paragraph) I'm only going on what I've read. Cheers, Phil
Re: [zfs-discuss] Disk controllers changing the names of disks
On FreeBSD, I avoid this issue completely by labelling either the entire disk (via glabel(8)) or individual slices/partitions (via either glabel(8) or gpt labels). Use the label name to build the vdevs. Then it doesn't matter where the drive is connected, or how the device node is named/numbered, everything Just Works(tm). :) Hopefully, there are similar tools for labelling disks/partitions on Solaris systems.
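The FreeBSD approach can be sketched roughly as follows; the device names (ada0, ada1) and the pool name are placeholders:

```shell
# FreeBSD sketch -- device and pool names are placeholders.
glabel label disk0 /dev/ada0     # give each whole disk a stable name
glabel label disk1 /dev/ada1
# Build the vdev from the label names; the pool then survives any
# renumbering of the underlying device nodes.
zpool create tank mirror label/disk0 label/disk1
```

Note that glabel stores its metadata in the last sector of the provider, so the labeled device is one sector smaller than the raw disk.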
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
If I understand correctly, ZFS nowadays will only flush data to non-volatile storage (such as a RAID controller's NVRAM), and not all the way out to the disks. (To solve performance problems with some storage systems, and I believe that it also is the right thing to do under normal circumstances.) Doesn't this mean that if you enable write-back, and you have a single, non-mirrored RAID controller, and your RAID controller dies on you so that you lose the contents of the NVRAM, you have a potentially corrupt file system? ZFS requires that all writes be flushed to non-volatile storage. This is needed both for transaction group (txg) commits, to ensure pool integrity, and for the ZIL, to satisfy the synchronous requirement of fsync/O_DSYNC etc. If the caches weren't flushed then it would indeed be quicker, but the pool would be susceptible to corruption. Sadly some hardware doesn't honour cache flushes, and this can cause corruption. Neil.
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote: I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash at power failure. Promises 65,000 IOPS and thus should be great for ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of system memory, and a cheap UPS? http://www.hyperossystems.co.uk/07042003/hardware.htm -- Eugen* Leitl http://leitl.org __ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
Re: [zfs-discuss] Lost disk geometry
On Fri, February 19, 2010 16:21, Daniel Carosone wrote: On Fri, Feb 19, 2010 at 01:15:17PM -0600, David Dyer-Bennet wrote: On Fri, February 19, 2010 13:09, David Dyer-Bennet wrote: Anybody know what the proper geometry is for a WD1600BEKT-6-1A13? It's not even in the data sheets any more! any such geometry has been entirely fictitious since ZBR disks emerged in, oh, about 1990. Sure, but there still have to be values put into format to satisfy it! Had to look up ZBR, but indeed I guessed correctly that it was the transition to variable numbers of sectors per track (to give much more uniform linear size to each sector) that you were referring to. Yep, totally and utterly fictitious. One further point -- I can't seem to manually enter, for the first disk, the geometry the second disk has; when I enter 152615 for number of sectors, it says this is out of range. It's probably reading some garbage as a label. dd 0's over the start of it and try again, perhaps with a hotplug or reboot in between if necessary. The details of the interaction between what's already written there, and what can be written there by the tools, are driving me quite insane (as Cindy said the other day!). I found some of my earlier tests weren't valid, since I apparently omitted writing out the labels in a couple of key cases. Now I've got two slightly different geometries going again, but they're working in the mirror (the old disks in the mirror are much smaller, so anything that works and gives access to over 50% of the new disk will attach to the mirror; but I want to get it right before detaching the old disks). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On 19 feb 2010, at 23.40, Eugen Leitl wrote: On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote: I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash at power failure. Promises 65,000 IOPS and thus should be great for ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of system memory, and a cheap UPS? System memory can't replace a slog, since a slog is supposed to be non-volatile. A UPS plus disabling the ZIL, or disabling synchronization, could possibly achieve the same result (or maybe better) IOPS-wise. This would probably work, given that your computer never crashes in an uncontrolled manner. If it does, some data may be lost (and possibly the entire pool lost, if you are unlucky). /ragge
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On Fri, Feb 19, 2010 at 11:51:29PM +0100, Ragnar Sundblad wrote: On 19 feb 2010, at 23.40, Eugen Leitl wrote: On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote: I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash at power failure. Promises 65,000 IOPS and thus should be great for ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. These are the same as the acard devices we've discussed here previously; earlier hyperdrive models were their own design. Very interesting, and my personal favourite, but I don't know of anyone actually reporting results yet with them as ZIL. If you have more memory in them than is needed for ZIL, with some partitioning you could make a small fast pool on them for swap space and other purposes. I was originally looking at these for Postgres WAL logfiles, before there was slog and on a different platform. Also, if you have enough non-ECC memory, there's a mode where the device adds its own redundancy for reduced space, which could allow reusing existing kit -- replace non-ECC system memory with ECC. Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of system memory, and a cheap UPS? System memory can't replace a slog, since a slog is supposed to be non-volatile. System memory might already be maxed out, too. -- Dan.
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On 19-Feb-10, at 5:40 PM, Eugen Leitl wrote: On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote: I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash at power failure. Promises 65,000 IOPS and thus should be great for ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of system memory, and a cheap UPS? That would depend on the read/write mix, I think? --Toby http://www.hyperossystems.co.uk/07042003/hardware.htm
Re: [zfs-discuss] Poor ZIL SLC SSD performance
A UPS plus disabling the ZIL, or disabling synchronization, could possibly achieve the same result (or maybe better) IOPS-wise. Even with the fastest slog, disabling the ZIL will always be faster... (fewer bytes to move) This would probably work, given that your computer never crashes in an uncontrolled manner. If it does, some data may be lost (and possibly the entire pool lost, if you are unlucky). The pool would never be at risk, but when your server reboots, its clients will be confused that things they sent, and the server promised it had saved, are gone. For some clients, this small loss might be the loss of their entire dataset. Rob
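For reference, on builds of that era disabling the ZIL was done system-wide via an /etc/system tunable (shown here as a config fragment; later builds expose a per-dataset sync property instead) -- with exactly the client-visible risk Rob describes:

```shell
* /etc/system fragment (historical tunable; comment lines in /etc/system
* start with "*"). Takes effect at the next reboot and disables the ZIL
* for ALL pools -- synchronous-write guarantees to clients are lost.
set zfs:zil_disable = 1
```

Pool metadata stays consistent via the normal txg mechanism, but NFS or iSCSI clients can silently lose the last few seconds of acknowledged writes after a crash.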
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On 19 feb 2010, at 23.20, Ross Walker wrote: On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad ra...@csc.kth.se wrote: On 18 feb 2010, at 13.55, Phil Harman wrote: ... Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that). But are there any clients that assume that an iSCSI volume is synchronous? Isn't an iSCSI target supposed to behave like any other SCSI disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? By that I mean: a disk which understands SCSI commands, with an optional write cache that can be turned off, a cache sync command, and all those things. Put another way, isn't it the OS/file system's responsibility to use the SCSI disk responsibly, regardless of the underlying protocol? That was my argument a while back. If you use /dev/dsk then all writes should be asynchronous, WCE should be on, and the initiator should issue a 'sync' to make sure data is in NV storage; if you use /dev/rdsk, all writes should be synchronous and WCE should be off. RCD should be off in all cases, and the ARC should cache all it can. Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the initiator flags write cache is the wrong way to go about it. It's more complicated than it needs to be, and it leaves setting the storage policy up to the system admin rather than the storage admin. It would be better to put effort into supporting the FUA and DPO options in the target than into dynamically changing a volume's cache policy from the initiator side.
But wouldn't the most disk-like behavior then be to implement all of FUA, DPO, the cache mode page, flush cache, etc., have COMSTAR implement a cache just like disks do -- maybe with a user knob to set the cache size (typically 32 MB or so on modern disks, which could probably serve as a default here too) -- and still use /dev/rdsk devices? That would seem, in my naive, limited little mind and humble opinion, a pretty good approximation of how real disks work, and no OS should have to be more surprised than usual at how a SCSI disk works. Maybe COMSTAR already does this, or parts of it? Or am I wrong? /ragge
[zfs-discuss] l2arc current usage (population size)
Hello, How do you tell how much of your L2ARC is populated? I've been looking for a while now and can't seem to find it. It must be easy, as this blog entry shows it over time: http://blogs.sun.com/brendan/entry/l2arc_screenshots And a follow-up: can you tell how much of each dataset is in the ARC or L2ARC?
Re: [zfs-discuss] Poor ZIL SLC SSD performance
These are the same as the acard devices we've discussed here previously; earlier hyperdrive models were their own design. Very interesting, and my personal favourite, but I don't know of anyone actually reporting results yet with them as ZIL. Here's one report: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg27739.html
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On 19 feb 2010, at 23.22, Phil Harman wrote: On 19/02/2010 21:57, Ragnar Sundblad wrote: On 18 feb 2010, at 13.55, Phil Harman wrote: Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that). But are there any clients that assume that an iSCSI volume is synchronous? Isn't an iSCSI target supposed to behave like any other SCSI disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)? By that I mean: a disk which understands SCSI commands, with an optional write cache that can be turned off, a cache sync command, and all those things. Put another way, isn't it the OS/file system's responsibility to use the SCSI disk responsibly, regardless of the underlying protocol? /ragge Yes, that would be nice, wouldn't it? But the world is seldom that simple, is it? For example, Sun's first implementation of zvol was unsafe by default, with no cache flush option either. A few years back we used to note that one of the reasons Solaris was slower than Linux at filesystem microbenchmarks was because Linux ran with the write caches on (whereas we would never be that foolhardy). (Exactly, and there is more of that fast-rather-than-safe evilness in that OS too, especially in the file system area. That is why I never use it for anything that should store anything.) And then this seems to claim that NTFS may not be that smart either ... http://blogs.sun.com/roch/entry/iscsi_unleashed (see the WCE Settings paragraph) I'm only going on what I've read. But -- all normal disks come with write caching enabled, so in both the Linux case and the NTFS case this is how they always operate, with all disks; so why should an iSCSI LUN behave any differently?
If they can't handle the write cache (handle syncing, barriers, ordering and all that), they should turn the cache off, just as Solaris does in almost all cases except when you use an entire disk for ZFS (I believe because Solaris UFS was never really adapted to write caches). And they should do that for all SCSI disks. (I seem to recall that in the bad old days you had to disable the write cache yourself if you wanted to use a disk on SunOS, but that was probably because it wasn't standardized, and you did it with a jumper on the controller board.) So I just do not understand why an iSCSI LUN should not try to emulate how all other SCSI disks work as much as possible. This must be the most compatible mode of operation, or am I wrong? /ragge
Re: [zfs-discuss] l2arc current usage (population size)
On 19 February, 2010 - Christo Kutrovsky sent me these 0,5K bytes: Hello, How do you tell how much of your l2arc is populated? I've been looking for a while now, can't seem to find it. Must be easy, as this blog entry shows it over time: http://blogs.sun.com/brendan/entry/l2arc_screenshots And follow up, can you tell how much of each data set is in the arc or l2arc? kstat -m zfs (p, c, l2arc_size) arcstat.pl is good, but doesn't show l2arc. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se
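Tomas's kstat hint, expanded slightly into a sketch for OpenSolaris builds of that era (the arcstats field names l2_size, l2_hits and l2_misses are from memory and may vary by build):

```shell
# Current L2ARC fill level, in bytes:
kstat -p zfs:0:arcstats:l2_size
# Hit/miss counters, useful for judging how effective the cache is:
kstat -p zfs:0:arcstats:l2_hits
kstat -p zfs:0:arcstats:l2_misses
```

Depending on the build, `zpool iostat -v` may also list the cache device with its allocated space, which gives roughly the same population figure per device.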
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On 20 feb 2010, at 02.34, Rob Logan wrote: A UPS plus disabling the ZIL, or disabling synchronization, could possibly achieve the same result (or maybe better) IOPS-wise. Even with the fastest slog, disabling the ZIL will always be faster... (fewer bytes to move) This would probably work, given that your computer never crashes in an uncontrolled manner. If it does, some data may be lost (and possibly the entire pool lost, if you are unlucky). The pool would never be at risk, but when your server reboots, its clients will be confused that things they sent, and the server promised it had saved, are gone. For some clients, this small loss might be the loss of their entire dataset. No, the entire pool shouldn't be at risk, you are right of course; I don't know what I was thinking. Sorry! /ragge
Re: [zfs-discuss] Poor ZIL SLC SSD performance
On 20.02.10 01:33, Toby Thain wrote: On 19-Feb-10, at 5:40 PM, Eugen Leitl wrote: On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote: I found the Hyperdrive 5/5M, which is a half-height drive-bay SATA ramdisk with battery backup and auto-backup to compact flash at power failure. Promises 65,000 IOPS and thus should be great for ZIL. It's pretty reasonably priced (~230 EUR), and stacked with 4GB or 8GB of DDR2-ECC it should be more than sufficient. Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of system memory, and a cheap UPS? That would depend on the read/write mix, I think? Well, the workload will include MaxDB (SAP), Exchange and file services (SMB), with the OpenSolaris box acting as a VMFS iSCSI target for VMware vSphere. Due to the mixed workload it's hard to predict exactly what the I/O distribution will look like, so I'm trying to build a system that can hold up in various usage scenarios. I've been testing with NFS because it loads the ZIL heavily. Btw. in my testing I didn't really see a performance improvement with the ZIL disabled over the on-disk ZIL, but I've only been testing with a single NFS client. Or do I need multiple concurrent clients to benefit from an external ZIL? Also, is there a guideline on sizing the ZIL? I think in most cases even 1GB would be enough, but I haven't done any heavy testing. - Felix