Hi Alan,

On Wednesday 22 August 2007 06:03:51 am Alan Brown wrote:
> On Tue, 21 Aug 2007, Ivan Adzhubey wrote:
> > Of course it is not running, that's what the whole story is about. I have
> > a bunch of desktop/laptop clients configured that can be shut
> > down/disconnected/not around at random and are well beyond my control.
>
> Add a RunBeforeJob to ping the hosts and only proceed if they're
> available.
>
> Something like "If {ping -i10 -c3}" is usually enough.
>
> Machines which don't respond to ping will have to have some other form of
> "Are you there?" done. The point is that it's an error condition you can
> test for and abort on before the job itself starts.
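
Alan's ping check could be sketched as a small RunBeforeJob helper, shown here in Python. This is a minimal sketch under stated assumptions, not a Bacula-provided tool: the helper names are hypothetical, the host to probe is passed as the only argument, and it assumes a Unix `ping` that accepts `-c` (count) and `-i` (interval in seconds) and exits with status 0 when at least one reply comes back. It relies on Bacula cancelling a job whose RunBeforeJob command exits non-zero.

```python
import subprocess

# Hypothetical helper names; not part of Bacula.


def ping_command(host, count=3, interval=10):
    # Equivalent of "ping -c3 -i10 host": `count` echo requests, `interval` s apart.
    return ["ping", "-c", str(count), "-i", str(interval), host]


def host_answers_ping(host, count=3, interval=10, run=subprocess.run):
    """Return True if at least one echo reply came back (ping exit status 0)."""
    try:
        result = run(ping_command(host, count, interval),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # No ping binary available: the host cannot be confirmed, report "absent".
        return False
    return result.returncode == 0


def main(argv):
    """RunBeforeJob-style exit code: 0 lets the job run, non-zero cancels it."""
    if len(argv) != 1:
        return 2  # usage error: expected exactly one host argument
    return 0 if host_answers_ping(argv[0]) else 1
```

A real script would end with `sys.exit(main(sys.argv[1:]))`. The `run` parameter exists only so the `ping` call can be swapped out in tests.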

I am well aware of this solution, but as I wrote before, I consider it a hack.
Let me summarize why. First of all, I am positively certain that handling
network connectivity issues has to be a core part of any network-enabled
software. I can't imagine why the Bacula developers deny this obvious fact.
After all, Bacula is a network backup system, so why should we have to hack
together an external script to check the most basic client connectivity?
Next, the RunBeforeJob solution has problems of its own:

1. Ping is useless on modern networks. Windows machines typically have ICMP
echo replies disabled, and most other desktop boxes come with some sort of
firewall enabled by default these days, which is usually configured to reject
ICMP requests. By itself this is not a big deal, since I use telnet to port
9102 (the File daemon port) instead, but see below.

2. There is no way to pass the host address/FQDN from the director to the
RunBeforeJob script; only the client's name is available. So I have to parse
bacula-dir.conf inside my RunBeforeJob script to extract the matching address.
This is another hack and a total waste of time, even though it's just 4 lines
of Perl. (Of course, there are other solutions, for instance always including
the IP address as part of the client's name in bacula-dir.conf, but they all
have their cons.)

3. Even though a RunBeforeJob script can be used to terminate the job on the
director, the storage daemon (SD) will not be properly notified of this
failure. Moreover, the director claims storage resources for the job *before*
running the RunBeforeJob script, so these claimed resources (tapes, pools) will
keep blocking any further jobs on this SD until it times out. The jobs will
also eventually receive a status of "Other", not "Error", which is totally
misleading. I have the SD timeout set to 10 minutes, so it is not a big
problem if one client fails. But what if a hundred of them fail
simultaneously? This has already happened once, when a whole wing of our
building lost connectivity due to a failed router. Oh, I forgot another
annoying bug: when the SD is blocked because the DIR failed to communicate the
job termination, the SD also issues an "Intervention needed..." message, which
is misleading. In fact, once the timeout expires, the resources are unblocked
automatically, and the SD recovers without operator assistance and proceeds
with the other jobs in the queue.
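
The workaround from points 1 and 2 (look up the client's address in bacula-dir.conf, then probe the File daemon port instead of pinging) might look like this in Python. This is a minimal sketch under stated assumptions, not a Bacula API: the function names are hypothetical, the parser only understands simple `Client { ... }` resources written one directive per line, it does not follow `@include` files, and a successful TCP connect to port 9102 only shows that something is listening there, not that bacula-fd will accept the director.

```python
import re
import socket

# Hypothetical helpers; none of these names come from Bacula.
# A Client resource opens with "Client {" (keyword matched case-insensitively).
# A Job's "Client = name" directive does not match, because of the "{".
_CLIENT_START = re.compile(r"client\s*\{", re.IGNORECASE)
# A simple "Key = value" directive; an optional leading quote is skipped.
_DIRECTIVE = re.compile(r'(\w+)\s*=\s*"?([^";]*)')


def client_addresses(conf_text):
    """Map each Client Name to its Address in bacula-dir.conf-style text.

    Naive by design: assumes "Client {" and the closing "}" sit on their own
    lines, and no '#' inside quoted values (text after '#' is a comment).
    """
    addresses = {}
    in_client = False
    name = address = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not in_client:
            if _CLIENT_START.match(line):
                in_client, name, address = True, None, None
            continue
        if line.startswith("}"):
            if name and address:
                addresses[name] = address
            in_client = False
            continue
        m = _DIRECTIVE.match(line)
        if m:
            key, value = m.group(1).lower(), m.group(2).strip()
            if key == "name":
                name = value
            elif key == "address":
                address = value
    return addresses


def fd_port_open(host, port=9102, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


def check_client(conf_text, client_name, port=9102, timeout=5.0):
    """RunBeforeJob-style exit code: 0 if the client's FD port answers, else 1."""
    address = client_addresses(conf_text).get(client_name)
    if address is None:
        return 1  # unknown client name: fail the job rather than guess
    return 0 if fd_port_open(address, port, timeout) else 1
```

As a RunBeforeJob it would receive the client name (for example through the `%c` substitution), read the director's configuration file (its path varies by installation) and pass the result to `sys.exit()`. As point 3 explains, the storage reservation happens before such a script runs, so it cannot prevent that problem.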

Bottom line: this is a serious design flaw that leads to a series of extremely
annoying bugs that are difficult to control and work around. It is just a
shame that such wonderful software suffers from this negligence. It should
take no more than an hour to fix, yet it was first reported back in 2003!

--Ivan


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
