on 13/12/2012 19:46 olivier said the following:
> Thanks. I'll be sure to follow your suggestions next time this happens.
> 
> I have a naive question/suggestion though. I see from browsing past
> discussions on ZFS problems that it has been suggested a number of times
> that problems that appear to originate in ZFS in fact come from lower
> layers, in particular from driver bugs or disks in the process of
> failing. It seems that it can take a lot of time to troubleshoot such
> problems. I accept that ZFS correctly leaves dealing with timeouts to the
> lower layers, but it seems to me that the ZFS layer would be a great place
> to warn the user about issues and provide some information to troubleshoot
> them.
> 
> For example, if some I/O requests get lost because of a buggy driver, the
> driver itself might not be the best place to identify those lost requests.
> But perhaps we could have a compile-time option in ZFS code that spits out
> a warning if it gets stuck waiting for a particular request to come back
> for more than, say, 10 seconds, and identifies the problematic disk? I'm
> sure there would be cases where these warnings would be unwarranted, and I
> imagine that changes in the code to provide such warnings would impact
> performance, so one certainly would not want that code active by default.
> But someone in my position could certainly recompile the kernel with a ZFS
> debugging option turned on to figure out the problem.
> 
> I understand that ZFS code comes from upstream, and that you guys probably
> want to keep FreeBSD-specific changes minimal. If that's a big problem,
> even just a patch provided "as is" that does not make it into the FreeBSD
> code base might be extremely useful. I wish I could help write something
> like that, but I know very little about the kernel or ZFS. I would
> certainly be willing to help with testing.

Google for "zfs deadman". This is already committed upstream and I think
that it is imported into FreeBSD, but I am not sure... Maybe it's imported
just into the vendor area and is not merged yet.
So, when enabled, this logic would panic the system as a way of letting you
know that something is wrong. The links explain why a panic was chosen for
this job.

And speaking from a FreeBSD-centric view, I think that our CAM layer would
be a perfect place to detect such issues in a non-ZFS-specific way.

-- 
Andriy Gapon
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
