Re: [zfs-discuss] Another paper

2007-02-23 Thread Gregory Shaw
On Feb 22, 2007, at 11:55 AM, Eric Schrock wrote: [ ... ] b) It is not uncommon for such successful reads of partially defective media to happen only after several retries. It is somewhat unfortunate that there is no simple way to tell the drive how many times to retry.

Re: [zfs-discuss] Another paper

2007-02-22 Thread Joerg Schilling
Richard Elling wrote: If a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests: - full disk read - 8kb r/w iops - 1mb r/w iops - raw throughput Some problems can be seen by

Re: [zfs-discuss] Another paper

2007-02-22 Thread Nicolas Williams
On Wed, Feb 21, 2007 at 04:20:58PM -0800, Eric Schrock wrote: Seems like there are two pieces you're suggesting here: 1. Some sort of background process to proactively find errors on disks in use by ZFS. This will be accomplished by a background scrubbing option, dependent on the

Re: [zfs-discuss] Another paper

2007-02-22 Thread Olaf Manczak
Eric Schrock wrote: 1. Some sort of background process to proactively find errors on disks in use by ZFS. This will be accomplished by a background scrubbing option, dependent on the block-rewriting work Matt and Mark are working on. This will allow something like zpool set

Re: [zfs-discuss] Another paper

2007-02-22 Thread Eric Schrock
On Thu, Feb 22, 2007 at 10:45:04AM -0800, Olaf Manczak wrote: Obviously, scrubbing and correcting hard errors that result in ZFS checksum errors is very beneficial. However, it won't address the case of soft errors when the disk returns correct data but observes some problems reading it.

[zfs-discuss] Another paper

2007-02-21 Thread Gregory Shaw
Below is another paper on drive failure analysis, this one won best paper at usenix: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations,

Re: [zfs-discuss] Another paper

2007-02-21 Thread Richard Elling
Gregory Shaw wrote: Below is another paper on drive failure analysis, this one won best paper at usenix: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html What I found most interesting was the idea that drives don't fail outright most of the time. They can slow

Re: [zfs-discuss] Another paper

2007-02-21 Thread Gregory Shaw
On Feb 21, 2007, at 4:59 PM, Richard Elling wrote: With this behavior in mind, I had an idea for a new feature in ZFS: If a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests: - full disk read - 8kb r/w iops
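The example tests listed above (8kb and 1mb read IOPS, raw throughput) could be approximated along the following lines. This is only a rough sketch, not anything ZFS provides: the function names `measure_read_iops`, `run_fitness_sample` and `make_scratch_file` are made up for illustration, it reads a scratch file rather than a raw device, and the OS page cache will inflate the numbers unless the target is much larger than memory.

```python
import os
import random
import tempfile
import time


def measure_read_iops(path, block_size, num_reads=200, seed=0):
    """Issue random block-aligned reads; return (reads/sec, bytes/sec)."""
    size = os.path.getsize(path)
    blocks = size // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed so runs hit the same offsets
    bytes_read = 0
    # buffering=0 skips Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(num_reads):
            f.seek(rng.randrange(blocks) * block_size)
            bytes_read += len(f.read(block_size))
        elapsed = max(time.perf_counter() - start, 1e-9)
    return num_reads / elapsed, bytes_read / elapsed


def run_fitness_sample(path):
    """Run the 8 KiB and 1 MiB read tests (hypothetical test set)."""
    results = {}
    for label, block_size in (("8k", 8 * 1024), ("1m", 1024 * 1024)):
        iops, bps = measure_read_iops(path, block_size, num_reads=50)
        results[label] = {"iops": iops, "bytes_per_sec": bps}
    return results


def make_scratch_file(size_bytes):
    """Create a temporary file of random data to read from."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    scratch = make_scratch_file(8 * 1024 * 1024)
    try:
        for label, stats in run_fitness_sample(scratch).items():
            print(label, stats)
    finally:
        os.remove(scratch)
```

Only reads are covered here; a write test would need a disposable target and is left out on purpose.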

Re: [zfs-discuss] Another paper

2007-02-21 Thread Eric Schrock
On Wed, Feb 21, 2007 at 03:35:06PM -0700, Gregory Shaw wrote: Below is another paper on drive failure analysis, this one won best paper at usenix: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html What I found most interesting was the idea that drives don't

Re: [zfs-discuss] Another paper

2007-02-21 Thread Gregory Shaw
On Feb 21, 2007, at 5:20 PM, Eric Schrock wrote: On Wed, Feb 21, 2007 at 03:35:06PM -0700, Gregory Shaw wrote: Below is another paper on drive failure analysis, this one won best paper at usenix: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html What I found most

Re: [zfs-discuss] Another paper

2007-02-21 Thread Nicholas Lee
On 2/22/07, Gregory Shaw wrote: I was thinking of something similar to a scrub. An ongoing process seemed too intrusive. I'd envisioned a cron job similar to a scrub (or defrag) that could be run periodically to show any differences between disk performance over time.
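The "differences between disk performance over time" part of that idea might look something like this. It is a minimal sketch under assumptions: the function name `find_slow_disks`, the 20% tolerance and the device names are all invented for illustration, and the throughput figures are assumed to come from whatever periodic test the cron job runs.

```python
from statistics import median


def find_slow_disks(history, latest, tolerance=0.20):
    """Flag disks whose latest throughput dropped more than `tolerance`
    (as a fraction) below the median of their earlier measurements.

    history: {disk: [past throughput values]}
    latest:  {disk: newest throughput value}
    Returns {disk: fractional drop} for the flagged disks only.
    """
    flagged = {}
    for disk, value in latest.items():
        past = history.get(disk)
        if not past:
            continue  # no baseline yet, nothing to compare against
        baseline = median(past)
        if baseline <= 0:
            continue  # a zero baseline makes the ratio meaningless
        drop = (baseline - value) / baseline
        if drop > tolerance:
            flagged[disk] = round(drop, 3)
    return flagged


if __name__ == "__main__":
    # Hypothetical MB/s figures from earlier runs and from the newest run.
    history = {"c0t0d0": [100, 102, 98], "c0t1d0": [100, 101, 99]}
    latest = {"c0t0d0": 97, "c0t1d0": 70}
    print(find_slow_disks(history, latest))
```

The median is used as the baseline so that a single unusually fast or slow earlier run does not shift the comparison much.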

Re: [zfs-discuss] Another paper

2007-02-21 Thread TJ Easter
All, I think dtrace could be a viable option here. crond to run a dtrace script on a regular basis that times a series of reads and then provides that info to Cacti or rrdtool. It's not quite the one-size-fits-all that the OP was looking for, but if you want trends, this should get 'em.
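To make the "times a series of reads" step concrete, here is a rough stand-in written in Python rather than dtrace. It measures latency from user space, which a real dtrace script would not need to do, so treat it only as an illustration of the data being collected. The names `time_sequential_reads`, `summarize` and `format_sample` are hypothetical, and the colon-separated output is merely modeled on the `timestamp:value` form that `rrdtool update` takes; the data-source layout is an assumption.

```python
import time


def time_sequential_reads(path, chunk_size=128 * 1024, max_chunks=64):
    """Return the latency, in seconds, of each sequential chunk read."""
    latencies = []
    # buffering=0 so each read() maps to one OS-level read call.
    with open(path, "rb", buffering=0) as f:
        for _ in range(max_chunks):
            start = time.perf_counter()
            data = f.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not data:
                break  # reached end of file
            latencies.append(elapsed)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to count, mean and max (seconds)."""
    if not latencies:
        raise ValueError("no reads were timed")
    return {
        "count": len(latencies),
        "mean": sum(latencies) / len(latencies),
        "max": max(latencies),
    }


def format_sample(timestamp, summary):
    """Build a 'timestamp:mean_ms:max_ms' line (assumed two data sources)."""
    return "%d:%.3f:%.3f" % (
        timestamp,
        summary["mean"] * 1000.0,
        summary["max"] * 1000.0,
    )


if __name__ == "__main__":
    import sys

    target = sys.argv[1]  # file or device to read; supplied by the caller
    stats = summarize(time_sequential_reads(target))
    print(format_sample(int(time.time()), stats))
```

Run from cron, each invocation would emit one sample line, which is what gives the trend over time.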

Re: [zfs-discuss] Another paper

2007-02-21 Thread Wee Yeh Tan
Correct me if I'm wrong but fma seems like a more appropriate tool to track disk errors. -- Just me, Wire ... On 2/22/07, TJ Easter wrote: All, I think dtrace could be a viable option here. crond to run a dtrace script on a regular basis that times a series of reads and