on 13/12/2012 19:46 olivier said the following:
> Thanks. I'll be sure to follow your suggestions next time this happens.
>
> I have a naive question/suggestion though. I see from browsing past
> discussions on ZFS problems that it has been suggested a number of times
> that problems that appear to originate in ZFS in fact come from lower
> layers; in particular because of driver bugs or disks in the process of
> failing. It seems that it can take a lot of time to troubleshoot such
> problems. I accept that ZFS behavior correctly leaves dealing with
> timeouts to lower layers, but it seems to me that the ZFS layer would be
> a great place to warn the user about issues and provide some information
> to troubleshoot them.
>
> For example, if some I/O requests get lost because of a buggy driver, the
> driver itself might not be the best place to identify those lost
> requests. But perhaps we could have a compile time option in ZFS code
> that spits out a warning if it gets stuck waiting for a particular
> request to come back for more than say 10 seconds, and identifies the
> problematic disk? I'm sure there would be cases where these warnings
> would be unwarranted, and I imagine that changes in the code to provide
> such warnings would impact performance; so one certainly would not want
> that code active by default. But someone in my position could certainly
> recompile the kernel with a ZFS debugging option turned on to figure out
> the problem.
>
> I understand that ZFS code comes from upstream, and that you guys
> probably want to keep FreeBSD-specific changes minimal. If that's a big
> problem, even just a patch provided "as such" that does not make it into
> the FreeBSD code base might be extremely useful. I wish I could help
> write something like that, but I know very little about the kernel or
> ZFS. I would certainly be willing to help with testing.

Google for "zfs deadman". This is already committed upstream, and I think
it has been imported into FreeBSD, but I am not sure. It may have been
imported only into the vendor area and not merged yet.

When enabled, this logic panics the system as a way of signalling that
something is wrong. You can read in the links why a panic was selected for
this job.

Speaking FreeBSD-centrically, I think that our CAM layer would be a
perfect place to detect such issues in a non-ZFS-specific way.

-- 
Andriy Gapon
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
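
For illustration only, here is a minimal user-space sketch of the watchdog
idea discussed in this thread: remember when each I/O request was issued and
report any request that has been outstanding longer than a threshold, along
with the device it was sent to. This is not ZFS, deadman, or CAM code. The
class InflightTracker, its methods, and the device names da0 and da1 are all
hypothetical, and the 10 second threshold is simply the figure suggested in
the quoted message. Unlike the deadman logic, which panics, this sketch only
prints a warning.

```python
import time
from dataclasses import dataclass, field


@dataclass
class InflightTracker:
    """Hypothetical in-flight request tracker (not ZFS or CAM code)."""

    threshold: float = 10.0  # seconds; the figure suggested in the thread
    # request id -> (device name, time the request was issued)
    _pending: dict = field(default_factory=dict)

    def issue(self, req_id, device, now=None):
        """Record that a request was sent to a device."""
        issued = time.monotonic() if now is None else now
        self._pending[req_id] = (device, issued)

    def complete(self, req_id):
        """Forget a request once the lower layer has returned it."""
        self._pending.pop(req_id, None)

    def stalled(self, now=None):
        """Return (device, request id, age) for every overdue request."""
        now = time.monotonic() if now is None else now
        return sorted(
            (device, req_id, now - issued)
            for req_id, (device, issued) in self._pending.items()
            if now - issued > self.threshold
        )


# Usage with explicit timestamps so the behaviour is deterministic.
tracker = InflightTracker()
tracker.issue(1, "da0", now=0.0)
tracker.issue(2, "da1", now=0.0)
tracker.complete(1)  # da0 answered; da1 never did
for device, req_id, age in tracker.stalled(now=12.5):
    print(f"WARNING: request {req_id} on {device} outstanding for {age:.1f}s")
```

A real implementation would have to run the check periodically from a timer
and keep the bookkeeping cheap on the I/O path, which is the performance
concern raised in the quoted message.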
