Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Jeroen Roodhart
Hi list, If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running OSOL nv130. Powered off the machine, removed the F20 and powered back on. Machine boots OK and comes up normally with the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 14.28, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 13:58, Ragnar Sundblad wrote: Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. For a file server, mail server, etc etc, where things are stored and supposed to be available later, you

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log device. If you have more than one, just add them
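Whether that advice applies hinges on the pool's on-disk version. A minimal check, assuming a pool named tank (hypothetical):

  zpool get version tank   # log-device removal arrived in pool version 19
  zpool upgrade -v         # lists every pool version and the features it added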

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 15:35, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there are uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The minimal read would be quite small, but it

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 09:19, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there are uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 10:18, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Mark J Musante
On Wed, 7 Apr 2010, Neil Perrin wrote: There have previously been suggestions to read slogs periodically. I don't know if there's a CR raised for this though. Roch wrote up CR 6938883 Need to exercise read from slog dynamically Regards, markm

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Richard Elling
On Apr 7, 2010, at 10:19 AM, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Miles Nordin
jr == Jeroen Roodhart j.r.roodh...@uva.nl writes: jr Running OSOL nv130. Powered off the machine, removed the F20 and jr powered back on. Machine boots OK and comes up normally with jr the following message in 'zpool status': yeah, but try it again and this time put rpool on the F20 as

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 18.13, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Jeroen Roodhart
Hi Roch, Can you try 4 concurrent tar to four different ZFS filesystems (same pool). Hmmm, you're on to something here: http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf In short: when using two exported file systems total time goes down to around

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Edward Ned Harvey
We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Kyle McDonald
On 4/4/2010 11:04 PM, Edward Ned Harvey wrote: Actually, It's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives, I have no experience with SSD's. The Sun disk label shipped on all the drives is setup to make the drive

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Edward Ned Harvey
From: Kyle McDonald [mailto:kmcdon...@egenera.com] So does your HBA have newer firmware now than it did when the first disk was connected? Maybe it's the HBA that is handling the new disks differently now, than it did when the first one was plugged in? Can you down rev the HBA FW? Do you

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Ragnar Sundblad
On 4 apr 2010, at 06.01, Richard Elling wrote: Thank you for your reply! Just wanted to make sure. Do not assume that power outages are the only cause of unclean shutdowns. -- richard Thanks, I have seen that mistake several times with other (file)systems, and hope I'll never ever make it

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
Hmm, when you did the write-back test was the ZIL SSD included in the write-back? What I was proposing was write-back only on the disks, and ZIL SSD with no write-back. The tests I did were: All disks write-through All disks write-back With/without SSD for ZIL All the permutations of the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
Actually, It's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives, I have no experience with SSD's. The Sun disk label shipped on all the drives is setup to make the drive the standard size for that sun part number.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Casper . Dik
The only way to guarantee consistency in the snapshot is to always (regardless of ZIL enabled/disabled) give priority for sync writes to get into the TXG before async writes. If the OS does give priority for sync writes going into TXG's before async writes (even with ZIL disabled), then after

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Neil Perrin
On 04/02/10 08:24, Edward Ned Harvey wrote: The purpose of the ZIL is to act like a fast log for synchronous writes. It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Jeroen Roodhart
Hi Al, Have you tried the DDRdrive from Christopher George cgeo...@ddrdrive.com? Looks to me like a much better fit for your application than the F20? It would not hurt to check it out. Looks to me like you need a product with low *latency* - and a RAM based cache would be a much better

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Christopher George
Well, I did look at it but at that time there was no Solaris support yet. Right now it seems there is only a beta driver? Correct, we just completed functional validation of the OpenSolaris driver. Our focus has now turned to performance tuning and benchmarking. We expect to formally

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 1 apr 2010, at 06.15, Stuart Anderson wrote: Assuming you are also using a PCI LSI HBA from Sun that is managed with a utility called /opt/StorMan/arcconf and reports itself as the amazingly informative model number Sun STK RAID INT what worked for me was to run, arcconf delete (to delete
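The snippet cuts off mid-command; purely as an illustrative sketch (controller and logical-drive numbers are hypothetical, and flags vary across arcconf releases):

  /opt/StorMan/arcconf getconfig 1 LD            # list logical drives on controller 1
  /opt/StorMan/arcconf delete 1 logicaldrive 0   # delete the preconfigured volume before recreating it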

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 2 apr 2010, at 22.47, Neil Perrin wrote: Suppose there is an application which sometimes does sync writes, and sometimes async writes. In fact, to make it easier, suppose two processes open two files, one of which always writes asynchronously, and one of which always writes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Richard Elling
On Apr 3, 2010, at 5:47 PM, Ragnar Sundblad wrote: On 2 apr 2010, at 22.47, Neil Perrin wrote: Suppose there is an application which sometimes does sync writes, and sometimes async writes. In fact, to make it easier, suppose two processes open two files, one of which always writes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris and Linux are chosen such that one

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
Robert Milkowski writes: On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Seriously, all disks configured WriteThrough (spindle and SSD disks alike) using the dedicated ZIL SSD device, very noticeably faster than enabling the WriteBack. What do you get with both SSD ZIL and WriteBack disks enabled? I mean if you have both why not use both? Then both

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit. It seems like it should be unnecessary. It seems like
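For illustration, a rough sketch of that coercion, with hypothetical device names (format is interactive; its partition menu is used to size slice 0 just under the drive's nominal capacity, and the slice rather than the whole disk then joins the mirror):

  format c1t2d0                          # partition -> 0 -> size e.g. 31g -> label
  zpool attach tank c1t1d0s0 c1t2d0s0    # attach the slice as the mirror side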

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
When we use one vmod, both machines are finished in about 6min45, zilstat maxes out at about 4200 IOPS. Using four vmods it takes about 6min55, zilstat maxes out at 2200 IOPS. Can you try 4 concurrent tar to four different ZFS filesystems (same pool). -r

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: We were talking about ZFS, and under what circumstances data is flushed to disk, in what way sync and async writes are handled by the OS, and what happens if you disable ZIL and lose power to your system. We were

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
I am envisioning a database, which issues a small sync write, followed by a larger async write. Since the sync write is small, the OS would prefer to defer the write and aggregate into a larger block. So the possibility of the later async write being committed to disk before the older

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
hello i have had this problem this week. our zil ssd died (apt slc ssd 16gb). because we had no spare drive in stock, we ignored it. then we decided to update our nexenta 3 alpha to beta, exported the pool and made a fresh install to have a clean system and tried to import the pool. we

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
ZFS recovers to a crash-consistent state, even without the slog, meaning it recovers to some state through which the filesystem passed in the seconds leading up to the crash. This isn't what UFS or XFS do. The on-disk log (slog or otherwise), if I understand right, can actually make the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
If you have zpool less than version 19 (when ability to remove log device was introduced) and you have a non-mirrored log device that failed, you had better treat the situation as an emergency. Instead, do man zpool and look for zpool remove. If it says supports removing log devices
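A minimal sketch of that check-and-remove, assuming a pool tank with a slog at c1t2d0 (hypothetical names):

  zpool status tank          # identify the log device
  zpool remove tank c1t2d0   # succeeds on pool version 19 or later; older pools refuse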

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: We were talking about ZFS, and under what circumstances data is flushed to disk, in what way sync and async writes are handled by the OS, and what happens if you disable ZIL and lose power to your system. We were

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
So you're saying that while the OS is building txg's to write to disk, the OS will never reorder the sequence in which individual write operations get ordered into the txg's. That is, an application performing a small sync write, followed by a large async write, will never have the second

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. You may say that, but then you post this: Acknowledged. I read something arrogant, and I replied even more arrogant. That was dumb of me.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Only a broken application uses sync writes sometimes, and async writes at other times. Suppose there is a virtual machine, with virtual processes inside it. Some virtual process issues a sync write to the virtual OS, meanwhile another virtual process issues an async write. Then the virtual OS

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
The purpose of the ZIL is to act like a fast log for synchronous writes. It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm hearing conflicting information, and don't know what

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
Questions to answer would be: Is a ZIL log device used only by sync() and fsync() system calls? Is it ever used to accelerate async writes? There are quite a few sync writes, specifically when you mix in the NFS server. Suppose there is an application which sometimes does sync writes,

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Kyle McDonald
On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit. It

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Mattias Pantzare
On Fri, Apr 2, 2010 at 16:24, Edward Ned Harvey solar...@nedharvey.com wrote: The purpose of the ZIL is to act like a fast log for synchronous writes.  It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: So you're saying that while the OS is building txg's to write to disk, the OS will never reorder the sequence in which individual write operations get ordered into the txg's. That is, an application performing a small sync write, followed by a large

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: were taking place at the same time. That is, if two processes both complete a write operation at the same time, one in sync mode and the other in async mode, then it is guaranteed the data on disk will never have the async data committed before the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Stuart Anderson
On Apr 2, 2010, at 5:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Ross Walker
On Fri, Apr 2, 2010 at 8:03 AM, Edward Ned Harvey solar...@nedharvey.com wrote: Seriously, all disks configured WriteThrough (spindle and SSD disks alike) using the dedicated ZIL SSD device, very noticeably faster than enabling the WriteBack. What do you get with both SSD ZIL and

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Robert Milkowski
On 02/04/2010 16:04, casper@sun.com wrote: sync() is actually *async* and returning from sync() says nothing about ... to clarify - in case of ZFS sync() is actually synchronous. -- Robert Milkowski http://milek.blogspot.com

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tirso Alonso
If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes that many manufacturers use (IDEMA LBA1-02):
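For 512-byte-sector drives, the IDEMA LBA1-02 formula fixes the exact sector count at: LBA count = 97,696,368 + (1,953,504 x (advertised GB - 50)). A quick shell check for a nominal 1 TB drive:

  GB=1000
  echo $(( 97696368 + 1953504 * (GB - 50) ))   # 1953525168 sectors, i.e. 1,000,204,886,016 bytes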

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Miles Nordin
enh == Edward Ned Harvey solar...@nedharvey.com writes: enh If you have zpool less than version 19 (when ability to remove enh log device was introduced) and you have a non-mirrored log enh device that failed, you had better treat the situation as an enh emergency. Ed the log device

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tim Cook
On Fri, Apr 2, 2010 at 10:08 AM, Kyle McDonald kmcdon...@egenera.com wrote: On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary).

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Eric D. Mudama
On Fri, Apr 2 at 11:14, Tirso Alonso wrote: If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Al Hopper
Hi Jeroen, Have you tried the DDRdrive from Christopher George cgeo...@ddrdrive.com? Looks to me like a much better fit for your application than the F20? It would not hurt to check it out. Looks to me like you need a product with low *latency* - and a RAM based cache would be a much better

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
If you disable the ZIL, the filesystem still stays correct in RAM, and the only way you lose any data such as you've described, is to have an ungraceful power down or reboot. The advice I would give is: Do zfs autosnapshots frequently (say ... every 5 minutes, keeping the most recent 2 hours of

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
If you disable the ZIL, the filesystem still stays correct in RAM, and the only way you lose any data such as you've described, is to have an ungraceful power down or reboot. The advice I would give is: Do zfs autosnapshots frequently (say ... every 5 minutes, keeping the most recent 2
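A rough sketch of that rolling-snapshot idea, assuming a dataset tank/data (hypothetical); pruning old snapshots is left out, and the OpenSolaris zfs-auto-snapshot SMF service does the same job more robustly:

  # Solaris crontab entry: snapshot every 5 minutes (no */5 shorthand; % must be escaped)
  0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/sbin/zfs snapshot tank/data@auto-`date +\%H\%M`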

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
Can you elaborate? Just today, we got the replacement drive that has precisely the right version of firmware and everything. Still, when we plugged in that drive, and create simple volume in the storagetek raid utility, the new drive is 0.001 Gb smaller than the old drive. I'm still

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
If you have an ungraceful shutdown in the middle of writing stuff, while the ZIL is disabled, then you have corrupt data. Could be files that are partially written. Could be wrong permissions or attributes on files. Could be missing files or directories. Or some other problem. Some changes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
If you have an ungraceful shutdown in the middle of writing stuff, while the ZIL is disabled, then you have corrupt data. Could be files that are partially written. Could be wrong permissions or attributes on files. Could be missing files or directories. Or some other problem. Some

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
This approach does not solve the problem. When you do a snapshot, the txg is committed. If you wish to reduce the exposure to loss of sync data and run with ZIL disabled, then you can change the txg commit interval -- however changing the txg commit interval will not eliminate the
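The txg commit interval referred to here is the zfs_txg_timeout tunable (in seconds; roughly 30 s by default on builds of that era). A hedged sketch of inspecting and lowering it on a live system:

  echo zfs_txg_timeout/D | mdb -k       # print the current value
  echo zfs_txg_timeout/W0t5 | mdb -kw   # set it to 5 seconds until the next reboot
  # or persistently, in /etc/system:  set zfs:zfs_txg_timeout = 5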

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its own discretion. Meaning the async write function

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. You may say that, but then you post this: Why do you think that a Snapshot has a better quality than the last snapshot available? If you rollback to a

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its own discretion. Meaning the async write function

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
This approach does not solve the problem. When you do a snapshot, the txg is committed. If you wish to reduce the exposure to loss of sync data and run with ZIL disabled, then you can change the txg commit interval -- however changing the txg commit interval will not eliminate the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:51 PM, Edward Ned Harvey solar...@nedharvey.com wrote: A MegaRAID card with write-back cache? It should also be cheaper than the F20. I haven't posted results yet, but I just finished a few weeks of extensive benchmarking various configurations. I can say this:

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:58 PM, Edward Ned Harvey solar...@nedharvey.com wrote: We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Apr 1, 2010, at 8:42 AM, casper@sun.com wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Darren J Moffat
On 01/04/2010 14:49, Ross Walker wrote: We're talking about the sync for NFS exports in Linux; what do they mean with sync NFS exports? See section A1 in the FAQ: http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: BEGIN QUOTE Linux servers (although not
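Section B4 of that FAQ concerns the server-side sync/async export option (sync has been the Linux default since nfs-utils 1.0.1). For illustration, hypothetical /etc/exports lines:

  /export/home     *(rw,sync)    # commit to stable storage before replying to the client
  /export/scratch  *(rw,async)   # reply before data reaches disk; fast but unsafe on a crash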

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Thu, Apr 1, 2010 at 10:03 AM, Darren J Moffat darr...@opensolaris.org wrote: On 01/04/2010 14:49, Ross Walker wrote: We're talking about the sync for NFS exports in Linux; what do they mean with sync NFS exports? See section A1 in the FAQ: http://nfs.sourceforge.net/ I think B4 is

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: If I'm wrong about this, please explain. I am envisioning a database, which issues a small sync write, followed by a larger async write. Since the sync write is small, the OS would prefer to defer the write and aggregate into a larger block. So

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Robert Milkowski
On 01/04/2010 13:01, Edward Ned Harvey wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
On 01/04/2010 13:01, Edward Ned Harvey wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. Geez! Yes, all the transactions in a transaction group are either committed entirely to disk, or not at

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
It does seem like rollback to a snapshot does help here (to assure that sync & async data is consistent), but it certainly does not help any NFS clients. Only a broken application uses sync writes sometimes, and async writes at other times. But doesn't that snapshot possibly have the same

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, casper@sun.com wrote: It does seem like rollback to a snapshot does help here (to assure that sync & async data is consistent), but it certainly does not help any NFS clients. Only a broken application uses sync writes sometimes, and async writes at other times. But

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Günther
hello i have had this problem this week. our zil ssd died (apt slc ssd 16gb). because we had no spare drive in stock, we ignored it. then we decided to update our nexenta 3 alpha to beta, exported the pool and made a fresh install to have a clean system and tried to import the pool. we only

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
Hi Casper, :-) Nice to see your stream still reaches just as far :-) I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris and Linux are

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Carson Gaspar
Jeroen Roodhart wrote: The thread was started to get insight in behaviour of the F20 as ZIL. _My_ particular interest would be to be able to answer why perfomance doesn't seem to scale up when adding vmod-s... My best guess would be latency. If you are latency bound, adding additional

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
It doesn't have to be F20. You could use the Intel X25 for example. The mlc-based disks are bound to be too slow (we tested with an OCZ Vertex Turbo). So you're stuck with the X25-E (which Sun stopped supporting for some reason). I believe most normal SSDs do have some sort of cache and

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Miles Nordin
enh == Edward Ned Harvey solar...@nedharvey.com writes: enh Dude, don't be so arrogant. Acting like you know what I'm enh talking about better than I do. Face it that you have enh something to learn here. funny! AIUI you are wrong and Casper is right. ZFS recovers to a

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Robert Milkowski
On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris and Linux are chosen such that one

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
Oh, one more comment. If you don't mirror your ZIL, and your unmirrored SSD goes bad, you lose your whole pool. Or at least suffer data corruption. Hmmm, I thought that in that case ZFS reverts to the regular on-disk ZIL? With kind regards, Jeroen -- This message posted from opensolaris.org

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
The write cache is _not_ being disabled. The write cache is being marked as non-volatile. Of course you're right :) Please filter my postings with a sed 's/write cache/write cache flush/g' ;) BTW, why is a Sun/Oracle branded product not properly respecting the NV bit in the cache flush command?

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Karsten Weiss
I stand corrected. You don't lose your pool. You don't have corrupted filesystem. But you lose whatever writes were not yet completed, so if those writes happen to be things like database transactions, you could have corrupted databases or files, or missing files if you were creating them

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Karsten Weiss
Hi Adam, Very interesting data. Your test is inherently single-threaded so I'm not surprised that the benefits aren't more impressive -- the flash modules on the F20 card are optimized more for concurrent IOPS than single-threaded latency. Thanks for your reply. I'll probably test the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Brent Jones
On Wed, Mar 31, 2010 at 1:00 AM, Karsten Weiss k.we...@science-computing.de wrote: Hi Adam, Very interesting data. Your test is inherently single-threaded so I'm not surprised that the benefits aren't more impressive -- the flash modules on the F20 card are optimized more for concurrent

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Arne Jansen
Brent Jones wrote: I don't think you'll find the performance you paid for with ZFS and Solaris at this time. I've been trying for more than a year, and watching dozens, if not hundreds of threads. Getting half-ways decent performance from NFS and ZFS is impossible unless you disable the ZIL.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Karsten Weiss
Nobody knows any way for me to remove my unmirrored log device. Nobody knows any way for me to add a mirror to it (until Since snv_125 you can remove log devices. See http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 I've used this all the time during my testing and was able to remove

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Robert Milkowski
On Wed, Mar 31, 2010 at 1:00 AM, Karsten Weiss Use something other than Open/Solaris with ZFS as an NFS server? :) I don't think you'll find the performance you paid for with ZFS and Solaris at this time. I've been trying for more than a year, and watching dozens, if not hundreds of

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Robert Milkowski
Just to make sure you know ... if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool, or at least data corruption. If that is indeed acceptable to

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Robert Milkowski
standard ZIL:        7m40s (ZFS default)
1x SSD ZIL:          4m07s (Flash Accelerator F20)
2x SSD ZIL:          2m42s (Flash Accelerator F20)
2x SSD mirrored ZIL: 3m59s (Flash Accelerator F20)
3x SSD ZIL:          2m47s (Flash Accelerator F20)
4x SSD ZIL:

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Karsten Weiss
Hi Jeroen, Adam! link. Switched write caching off with the following addition to the /kernel/drv/sd.conf file (Karsten: if you didn't do this already, you _really_ want to :) Okay, I bite! :) format->inquiry on the F20 FMods disks returns: # Vendor: ATA # Product: MARVELL SD88SA02 So I
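Given those inquiry strings, the sd.conf addition under discussion would take roughly this shape (a sketch, not the poster's verbatim entry; the vendor field is padded to 8 characters, and cache-nonvolatile:true makes sd treat the write cache as non-volatile, so ZFS's cache-flush requests become no-ops):

  # /kernel/drv/sd.conf
  sd-config-list = "ATA     MARVELL SD88SA02", "cache-nonvolatile:true";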

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Edward Ned Harvey
Use something other than Open/Solaris with ZFS as an NFS server? :) I don't think you'll find the performance you paid for with ZFS and Solaris at this time. I've been trying for more than a year, and watching dozens, if not hundreds of threads. Getting half-ways decent performance from NFS

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
Hi Karsten, But is this mode of operation *really* safe? As far as I can tell it is. - The F20 uses some form of power backup that should provide power to the interface card long enough to get the cache onto solid state in case of power failure. - Recollecting from earlier threads here; in

  1   2   >