Hi Alan,

On Wednesday 22 August 2007 06:03:51 am Alan Brown wrote:
> On Tue, 21 Aug 2007, Ivan Adzhubey wrote:
> > Of course it is not running, that's what the whole story is about. I have
> > a bunch of desktop/laptop clients configured that can be shut
> > down/disconnected/not around at random and are well beyond my control.
>
> Add a RunBeforeJob to ping the hosts and only proceed if they're
> available.
>
> Something like "If {ping -i10 -c3}" is usually enough.
>
> Machines which don't respond to ping will have to have some other form of
> "Are you there?" done. The point is that it's an error condition you can
> test for and abort on before the job itself starts.

I am well aware of this solution, but as I wrote before, I consider it a
hack. Let me summarize why.

First of all, I am positive that handling network connectivity problems
has to be a core part of any network-enabled software. I can't imagine why
the Bacula developers deny this obvious fact. After all, Bacula is a
network backup system, so why should we have to hack together an external
script to check the most basic client connectivity?

Next, there are problems with the RunBeforeJob solution too:

1. Ping is useless on modern networks. Windows machines have ICMP echo
   reply disabled, and most other desktop boxes these days come with some
   sort of firewall enabled by default, which is most often configured to
   reject ICMP requests. This is not a big deal, since I use telnet to
   port 9102 instead, but see below.

2. There is no way to pass the host address/FQDN to a RunBeforeJob script
   from the director. Only the client's name is available. I have to parse
   bacula-dir.conf inside my RunBeforeJob script to extract the matching
   address. This is another hack and a total waste of time, even though
   it's just 4 lines in Perl. (Of course, there are other solutions, for
   instance always including the IP address as part of the client's name
   in bacula-dir.conf, but they all have their cons.)

3. Even though a RunBeforeJob script can be used to terminate the job on
   the director, the storage daemon will not be properly notified of this
   failure. Moreover, storage resources for the job are claimed by the
   director *before* the RunBeforeJob script runs, and hence these claimed
   resources (tapes, pools) will keep blocking any further jobs on this SD
   until it times out. They will also receive a status of "Other"
   eventually, not "Error", which is totally misleading. I have the SD
   timeout set to 10 minutes, so it is not a big problem if one client
   fails. But what if a hundred of them fail simultaneously? This has
   already happened once, when a whole wing of our building lost its
   connection due to a failed router.

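For illustration only, the workaround described in points 1 and 2 could
look roughly like the Python sketch below. It is not the Perl script
mentioned above, which is not shown in this message. It assumes the
director passes the client's name as the only argument, that the
configuration lives at /etc/bacula/bacula-dir.conf, and that Client
resources contain no nested braces. The names find_client and
fd_reachable are made up for this example. It does nothing about the
storage daemon problem in point 3.

```python
#!/usr/bin/env python3
"""Hypothetical RunBeforeJob helper: exit 0 only if the client's FD port answers."""
import re
import socket
import sys

# Assumed location of the Director configuration; adjust for your install.
CONF_PATH = "/etc/bacula/bacula-dir.conf"
DEFAULT_FD_PORT = 9102


def _directive(block, key):
    """Return the value of `key = value` inside one resource block, or None."""
    m = re.search(rf'^\s*{key}\s*=\s*"?([^"\n#;]+)', block, re.I | re.M)
    return m.group(1).strip() if m else None


def find_client(conf_text, client_name):
    """Return (address, port) for the named Client resource, or None.

    Simplification: assumes Client blocks contain no nested braces.
    """
    pattern = r'^\s*Client\s*\{(.*?)\}'
    for m in re.finditer(pattern, conf_text, re.I | re.M | re.S):
        block = m.group(1)
        if _directive(block, "Name") != client_name:
            continue
        address = _directive(block, "Address")
        if address is None:
            return None
        port = _directive(block, "FDPort")
        return (address, int(port) if port else DEFAULT_FD_PORT)
    return None


def fd_reachable(host, port=DEFAULT_FD_PORT, timeout=5.0):
    """Try a plain TCP connect, the scripted equivalent of `telnet host 9102`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(client_name, conf_path=CONF_PATH):
    """Return 0 if the client answers, non-zero so the job can be aborted."""
    with open(conf_path, encoding="utf-8") as fh:
        found = find_client(fh.read(), client_name)
    if found is None:
        print(f"no address found for client {client_name}", file=sys.stderr)
        return 2
    host, port = found
    if fd_reachable(host, port):
        return 0
    print(f"{client_name} ({host}:{port}) is not reachable", file=sys.stderr)
    return 1


# Only act when a client name was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

In a Job resource this might be wired up as something like
RunBeforeJob = "/usr/local/bin/check_fd.py %c", where the script path is
hypothetical and the %c client-name substitution should be checked
against the manual for your Bacula version.
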
Oh, I forgot another annoying bug: when the SD is blocked because the DIR
failed to communicate the job termination, the SD will also issue an
"Intervention needed..." message, which is misleading. In fact, after the
eventual timeout, the resources will be unblocked automatically, and the
SD will recover without the operator's assistance and proceed with the
other jobs in the queue.

Bottom line: this is a serious design flaw that leads to a string of
extremely annoying bugs that are difficult to control and work around. It
is a shame that such wonderful software suffers from neglect. This should
take no more than an hour to fix, yet it was first reported back in 2003!

--Ivan