:Hi,
:
:ever since the last time I had CRC problems on my router box, I've
:developed the habit of doing a daily 'hammer -f /dev/ad4s1d show |& grep
:"^B"' to see if any new errors crept up, and today I found:
:
:yoyodyne# hammer -f /dev/ad4s1d show |& grep "^B"
:B dataoff=a00000714d120000/65536 crc=7e4f7545
:B dataoff=a000007171380000/65536 crc=616b1cc1

    The question is whether it is real or not.  If the filesystem is
    mounted live then the show command could be catching things in odd
    states.

:Console log for the recent days is:
:
:Nov 7 03:15:19 <kern.crit> yoyodyne kernel: HAMMER: Warning: rebalance
:caught race against propagate
:...

    None of those are serious.  Basically just debug messages that will
    be removed soon.  The emergency page allocation for BIO is unrelated
    to the filesystem code.  It's also actually just a warning (telling
    me that something is eating too many free VM pages).

:So my question is: What are my next steps in order to help resolve this
:issue? Is there any way to get e.g. to the names of the files affected
:by this problem from the data which is output by 'hammer show'?
:
:So far the only thing I've done is to disable nightly hammer cleanup
:because DragonFly, upon encountering a CRC error, will unfortunately
:simply drop to the debugger without panicking, so this doesn't get caught
:by DDB_UNATTENDED as far as I can tell (Matt, are there any plans to
:change this unpleasant behavior?). And I won't be near that box until
:next weekend.
:
:Regards,
:Sascha

    I fixed the behavior in current.  There is now a sysctl which
    controls whether it drops into the debugger or not (and it does not
    by default).  Though it doesn't panic... maybe the sysctl should be
    modified to give it the ability to panic instead of propagating an
    error code up the call chain.  The filesystem still drops into
    read-only mode if an error is encountered.

    What you want to do now is run 'hammer -f ... show | less -B' and
    search for B, as in '/^B'.  less -B uses a fixed buffer so if you
    scroll down you basically cannot scroll back up (by much), which
    allows you to pipe gigabytes and gigabytes of text through it
    without it malloc()ing itself into oblivion.

    You want to try to find the problem area and get more context out
    of it, such as the object id.  And also to determine whether the
    problem area is real or not.
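
    A small filter that remembers the last few lines before each
    flagged record is another way to pull out that context without
    paging by hand.  The following is only a minimal Python sketch and
    not part of HAMMER: the function name flagged_with_context, the
    20-line default window, and the guess that the useful context sits
    in the lines just before each 'B' line are all assumptions made
    for illustration.  Like less -B, it keeps a fixed-size buffer, so
    memory use stays flat no matter how much output is piped through.

```python
from collections import deque


def flagged_with_context(lines, before=20):
    """Yield (context, line) for every line that starts with 'B'.

    `context` is a list of up to `before` lines that preceded the
    flagged line. Only a bounded window is kept in memory.
    """
    recent = deque(maxlen=before)  # fixed-size window of earlier lines
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("B"):
            yield list(recent), line
        recent.append(line)


# Demo on stand-in input. Only the two 'B' lines come from the report
# above; the other lines are placeholders, not real 'hammer show' output.
sample = [
    "placeholder line 1",
    "placeholder line 2",
    "B dataoff=a00000714d120000/65536 crc=7e4f7545",
    "placeholder line 3",
    "B dataoff=a000007171380000/65536 crc=616b1cc1",
]

for context, hit in flagged_with_context(sample, before=2):
    for ctx_line in context:
        print("  " + ctx_line)
    print(hit)
    print("-" * 40)
```

    For real use, pass sys.stdin instead of the sample list and pipe
    the output of 'hammer -f ... show' into the script.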

    Again the filesystem has to be idle and it would be even better if
    it were offline entirely.

					-Matt
					Matthew Dillon
					<dil...@backplane.com>