Thomas Roth wrote:
> Thank you Thomas.
> If these messages mean that robinhood just continues after the
> timeout, it would be nothing to worry about, but I will try to adapt
> the timeout anyhow.
> Right now, however, it seems the scan is really stuck: for days,
> rbh-report -i has been telling me about 612 TB in the filesystem,
> but lfs df says we have 787 TB ;-)

A couple of such messages would not be a big deal, but 100s/day over
several days is not normal...
I suspect a problem with timeout handling in robinhood that leads to
such blocking. That's why I suggest you avoid timeouts by increasing
the timeout value.

> Btw, whenever I restart the scan, e.g. after a reconfiguration such as
> for the timeout, I get the logfile full of

Tip: to change such a scalar parameter, you do not need to fully
restart the daemon. "service robinhood reload" or "kill -HUP" on the
process is enough.

> > ListMgr | DB query failed in ListMgr_Insert line 340...
> and assorted messages, which seem to indicate that the new robinhood
> scan tries to put something into the DB that is already there, and
> stumbles on this. Or maybe that happens when several robins are
> running simultaneously.

Are you running several instances scanning the same filesystem?

> I'm not sure if it is a problem for the scan, it is, however, a
> problem for the free space on /var, or wherever I point the log to ;-)
>
> Regards,
> Thomas
>
> On 24.11.2010 13:20, LEIBOVICI Thomas wrote:
>> Hi Thomas,
>>
>> We have already seen this, basically after the filesystem was blocked
>> for a while, or after an OSS had crashed.
>> If it is stuck for too long (default timeout is 1 hour), robinhood tries
>> to cancel its operation on the current directory and continues with the
>> next one.
>> Maybe it didn't recover successfully from this cancellation, and you
>> have been receiving those messages ever since.
>>
>> To avoid this problem, you can increase the timeout to a very high
>> value, to make sure it is never reached (e.g.
>> xxx days).
>> In that case, robinhood will remain stuck as long as its current
>> operation in Lustre is blocked,
>> and it will resume the current operation as soon as Lustre is back.
>>
>> You can change this timeout by setting the "scan_op_timeout" parameter
>> in the "FS_Scan" section of the config file.
>>
>> Alternatively, you can also keep a reasonable timeout and make robinhood
>> exit when the filesystem is not responding
>> by setting "exit_on_timeout = TRUE" in the same section of the config.
>> That way you can respawn the robinhood daemon once everything is fixed.
>>
>> Best regards,
>> Thomas LEIBOVICI
>> CEA/DAM
>>
>> > A support request from lustre-discuss.
>> >
>> > ------------------------------------------------------------------------
>> >
>> > Subject: [Lustre-discuss] robinhood error messages
>> > From: Thomas Roth <[email protected]>
>> > Date: Tue, 23 Nov 2010 20:20:33 +0100
>> > To: [email protected]
>> > To: [email protected]
>> >
>> > Hi all,
>> >
>> > we are running robinhood (v2.2.1) on our 1.8.4 cluster (basically to
>> > find out where and who the big space consumers are - no purging).
>> >
>> > Robinhood sends me lots and lots of messages (~100/day) of the type
>> >
>> > > ===== FS scan is blocked (/lustre) =====
>> > > Date: 2010/11/23 20:05:22
>> > > Program: robinhood (pid 4826)
>> > > Host: lxb310
>> > > Filesystem: /lustre
>> > > A thread has been inactive for 3660 sec
>> > > while scanning directory /lustre/....
>> >
>> > This seems to indicate some trouble accessing certain directories on
>> > the node where robinhood is running. However, this is independent of
>> > the node, and at the same time we neither see any issues / slowness /
>> > connectivity problems nor get any user complaints of the like.
>> >
>> > So I wonder whether anybody else is using robinhood and has seen
>> > similar messages.
>> >
>> > Regards,
>> > Thomas
>> > _______________________________________________
>> > Lustre-discuss mailing list
>> > [email protected]
>> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> >
>> > _______________________________________________
>> > robinhood-support mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/robinhood-support
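
For reference, the two options described in this thread (a very long timeout, or a moderate timeout combined with exit_on_timeout) might look roughly like the fragment below in the robinhood config file. This is only a sketch. The parameter names and the FS_Scan section come from the messages above. The brace and semicolon block syntax, the "#" comments, the duration suffixes and the concrete values are assumptions, so check them against the template config shipped with your robinhood version. Use one variant or the other, not both.

```
# Option 1: a timeout high enough that it is never reached in practice.
# The value is purely illustrative; the thread does not give one.
FS_Scan {
    scan_op_timeout = 30d;
}

# Option 2: keep a moderate timeout and let robinhood exit when the
# filesystem stops responding, so the daemon can be respawned later.
FS_Scan {
    scan_op_timeout = 1h;      # 1 hour is the default mentioned above
    exit_on_timeout = TRUE;
}
```

As noted in the reply above, a change to a scalar parameter such as the timeout does not require a full restart: "service robinhood reload" or "kill -HUP" on the process should pick it up.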
