As near as I can tell it has always worked this way.

-----Original Message-----
From: Marc Brueckner [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 27, 2006 12:22 AM
To: Robert Nelson
Cc: bacula-users@lists.sourceforge.net; 'Knischka'; 'Holger Luedecke'
Subject: Re: [Bacula-users] Different timeouts in different subnets

Robert Nelson wrote:
> This is due to the algorithm Bacula uses for connect timeouts.  It isn't
> really a timeout; it is a retry count.  If you take the connect timeout
> in seconds and divide it by 10, you get the number of retries.  It
> doesn't account for the time spent in the connect call.  If the connect
> call took no time at all to fail, the two would be the same thing.  To
> make matters worse, the connect call takes a different amount of time to
> fail depending on whether or not a switch is involved.
>   
Hi Robert,
thank you very much for your help. Your explanation made things
clearer for me.
Do you know whether the algorithm you described was changed in 1.38?
I think I first observed this effect on 1.38, but I don't know for sure.
It doesn't matter now.

I will shorten the "timeout" value to 30 seconds. That should decrease
my timeout to approx. 10 minutes,
and I can live with that.
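
As a rough check of that estimate, here is a minimal sketch of the retry
arithmetic described in this thread. It is not Bacula's actual code. The
function name is hypothetical, and the 10-second pause per retry is an
assumption inferred from the "divide by 10" rule and from the way the
configured 300 seconds are subtracted from the elapsed times.

```python
def estimated_wait_seconds(timeout_s, connect_fail_s, retry_sleep_s=10):
    """Estimate how long the director keeps trying before giving up.

    timeout_s      -- configured connect timeout in seconds
    connect_fail_s -- time one failed connect call takes (measured, not configured)
    retry_sleep_s  -- assumed pause per retry (assumption: 10 seconds)
    """
    retries = timeout_s // retry_sleep_s  # the "timeout" is really a retry count
    return retries * (retry_sleep_s + connect_fail_s)


# Different subnet, about 189 seconds per failed connect (figure from this thread):
print(estimated_wait_seconds(300, 189) / 60)  # current setting of 5 minutes
print(estimated_wait_seconds(30, 189) / 60)   # planned setting of 30 seconds
```

With a 30-second setting this model gives 3 retries of about 199 seconds
each, which is in line with the estimate of roughly 10 minutes.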

Thank you once again and greetings
Marc



> So in your case, 5 minutes is equal to 300 seconds, divided by 10 equals
> 30.  So you will get 30 retries.
>
> Now, on the same subnet, it takes 6 minutes and 36 seconds to do 30
> retries.  So it takes 1 minute and 36 seconds for 30 calls to connect to
> fail or roughly 3 seconds per try.
>
> On different subnets, it takes 1 hour 39 minutes and 31 seconds, so the
> 30 calls to connect take 1 hour 34 minutes and 31 seconds to fail, or
> 189 seconds (roughly 3 minutes) per try.
>
> The reason for the differences is probably caching on the switch.  I
> suspect that in the same subnet case the ARP is failing (so the IP
> address can't be converted to an Ethernet address), in the other case
> the switch is responding to the ARP and a higher level (and longer
> timeout) is coming into play, probably the TCP connect timer.
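
The per-try figures above can be reproduced from the elapsed times in the
job reports. This is only a sketch of the arithmetic: the helper name is
hypothetical, and it assumes that the configured timeout is spent entirely
in 10-second pauses, which is implied by the calculation above but is not
confirmed here against the Bacula source.

```python
def per_try_seconds(elapsed_s, timeout_s, retry_sleep_s=10):
    """Average time one failed connect call took, derived from a job report.

    elapsed_s     -- "Elapsed time" from the job report, in seconds
    timeout_s     -- configured connect timeout in seconds
    retry_sleep_s -- assumed pause per retry (assumption: 10 seconds)
    """
    retries = timeout_s // retry_sleep_s
    time_in_connect = elapsed_s - retries * retry_sleep_s
    return time_in_connect / retries


same_subnet = 6 * 60 + 36              # 6 mins 36 secs
other_subnet = 1 * 3600 + 39 * 60 + 31  # 1 hour 39 mins 31 secs

print(per_try_seconds(same_subnet, 300))   # a few seconds per try
print(per_try_seconds(other_subnet, 300))  # about three minutes per try
```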
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Marc
> Brückner
> Sent: Thursday, October 26, 2006 5:45 AM
> To: bacula-users@lists.sourceforge.net
> Cc: Knischka; Holger Luedecke
> Subject: [Bacula-users] Different timeouts in different subnets
>
> Hi @ all Bacula users,
>
> I have been using Bacula for several years now and I am really
> satisfied with it.
> But now I have a strange problem. I am not sure, but I think it first
> occurred after I updated from
> version 1.36 to 1.38. I am now running 1.38.11.
> My Bacula has to back up several WinXP clients overnight.
> When the client is running, there is no problem and the backup is done
> properly.
> But if the users switch off their clients (which unfortunately happens
> often), the duration of the timeout depends on the IP subnet the
> client is in.
> I have the following timeout settings in bacula-dir.conf:
>
>  FD Connect Timeout = 5 minutes
>  SD Connect Timeout = 5 minutes
>
> If the client is in the same IP subnet as the Bacula director, the
> director reports:
>
> 24-Oct 08:43 Bacula-dir: Start Backup JobId 3040,
> Job=StudentA2190_A.2006-10-23_19.40.53
> 24-Oct 08:44 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Warning:
> bnet.c:853 Could not connect to File daemon on 192.168.10.67:9102. ERR=No
> route to host
> Retrying ...
> 24-Oct 08:50 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Fatal error:
> bnet.c:859 Unable to connect to File daemon on 192.168.10.67:9102. ERR=No
> route to host
> 24-Oct 08:50 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Error: Bacula
> 1.38.11 (28Jun06): 24-Oct-2006 08:50:29
>  
>  ...
>
>   Scheduled time:         23-Oct-2006 19:40:52
>   Start time:             24-Oct-2006 08:43:53
>   End time:               24-Oct-2006 08:50:29
>   Elapsed time:           6 mins 36 secs
>   
>
> Timeout after six and a half minutes, ERR=No route to host; that's OK.
> But if the client resides in a different IP subnet, it says:
>
> 25-Oct 02:39 Bacula-dir: Start Backup JobId 3070,
> Job=StudentA1080_A.2006-10-24_18.00.11
> 25-Oct 02:45 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Warning:
> bnet.c:853 Could not connect to File daemon on 192.168.30.33:9102.
> ERR=Connection timed out
> Retrying ...
> 25-Oct 04:18 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Fatal error:
> bnet.c:859 Unable to connect to File daemon on 192.168.30.33:9102.
> ERR=Connection timed out
> 25-Oct 04:18 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Error: Bacula
> 1.38.11 (28Jun06): 25-Oct-2006 04:18:47
>
> ...
>
>   Scheduled time:         24-Oct-2006 18:00:10
>   Start time:             25-Oct-2006 02:39:16
>   End time:               25-Oct-2006 04:18:47
>   Elapsed time:           1 hour 39 mins 31 secs
>
> Timeout after 1 hour and 40 minutes, ERR=Connection timed out; that's a
> little bit long.
>
> I have looked at many log entries and it's always the same: same subnet
> => 6 min, different subnet => 1:40 h.
> There is no packet filtering between the subnets.
>
> Has anyone experienced behavior like that? Does anyone have a hint for
> me on how to shorten this 1:40 h timeout?
>
> Thank you for your help.
>
> Marc
>
>   



