On 28-nov-2006, at 10:05, Dahlgren Mattias wrote:

> Hello everyone.
>
> I'm trying to set up Bacula to back up the roughly 12 FreeBSD
> web servers we have.
>
> I got it working on all but 2 servers. On these servers I keep
> getting errors that the operation timed out. The strange thing is
> that it seems to ALWAYS occur after almost exactly the same elapsed
> time on both servers: 2 hours 10 mins 10 secs. The seconds can vary
> between 10 and 14, but it's definitely the same time.
>
> I've read some other posts here about similar problems, but nothing
> that exactly matches our issue.
>
> I have tried setting the Heartbeat Interval in the SD resource to 15
> seconds, as I saw mentioned in another post, which didn't help. I also
> tried setting it in the Client resource, as suggested in the Bacula
> manual. However, this causes bacula-dir to refuse to start, reporting
> a syntax error in the config file and pointing to this exact line
> in the Client resource.
>
> Basically I'm lost and I really need to get this operational. Does
> anyone have any ideas? I imagine it could be the network somehow
> timing out, since it's happening after exactly the same elapsed time
> on both servers, but I can't think of where to change this timeout.
>
> Here is a cut from my log file with regards to this issue:
>
> 23-Nov 01:47 xxxx-dir: No prior Full backup Job record found.
> 23-Nov 01:47 xxxx-dir: No prior or suitable Full backup found. Doing FULL backup.
> 23-Nov 01:47 xxxx-dir: Start Backup JobId 1046, Job=xxxx.2006-11-23_00.30.01
> 23-Nov 01:47 xxx-sd: Volume "xxxxFull-0002" previously written, moving to end of data.
> 23-Nov 03:57 xxxx-dir: xxxx.2006-11-23_00.30.01 Fatal error: Network error with FD during Backup: ERR=Operation timed out
> 23-Nov 03:57 xxxx-dir: obelix.2006-11-23_00.30.01 Fatal error: No Job status returned from FD.
> 23-Nov 03:57 xxxx-dir: obelix.2006-11-23_00.30.01 Error: Bacula 1.38.11 (28Jun06): 23-Nov-2006 03:57:43
>   JobId:                  1046
>   Job:                    xxxx.2006-11-23_00.30.01
>   Backup Level:           Full (upgraded from Incremental)
>   Client:                 "xxxx-fd" i386-portbld-freebsd6.1,freebsd,6.1-STABLE
>   FileSet:                "xxxx Full FileSet" 2006-11-21 17:28:05
>   Pool:                   "xxxx-Full-Pool"
>   Storage:                "File3"
>   Scheduled time:         23-Nov-2006 00:30:00
>   Start time:             23-Nov-2006 01:47:33
>   End time:               23-Nov-2006 03:57:43
>   Elapsed time:           2 hours 10 mins 10 secs
>   Priority:               10
>   FD Files Written:       0
>   SD Files Written:       0
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   Volume name(s):         xxxxFull-0002
>   Volume Session Id:      6
>   Volume Session Time:    1164209750
>   Last Volume Bytes:      31,997,951,399 (31.99 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
>
> Any help would be appreciated.
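
As a quick sanity check on the job report above, the "Elapsed time" field follows directly from its "Start time" and "End time" fields. This short Python sketch (Python is used purely for illustration; nothing here is part of Bacula) recomputes it and shows how far past the two-hour mark the job failed:

```python
from datetime import datetime

# Timestamps copied verbatim from the "Start time" / "End time" fields above.
FMT = "%d-%b-%Y %H:%M:%S"
start = datetime.strptime("23-Nov-2006 01:47:33", FMT)
end = datetime.strptime("23-Nov-2006 03:57:43", FMT)

elapsed = end - start
seconds = int(elapsed.total_seconds())

print(elapsed)          # 2:10:10
print(seconds)          # 7810
print(seconds - 7200)   # 610 seconds past the 2-hour mark
```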


We've been having the exact same issue, with backups timing out after
_exactly_ 2 hours, 11 minutes and 15 seconds. This weekend I finally
found the most likely culprit: a (Checkpoint) firewall between the
director and the client. It started dropping ACK packets after exactly
2 hours because the TCP session had timed out, so it would only accept
SYN packets. The director would then start retransmitting the packets
at a 75-second interval for 11 minutes and 25 seconds, after which the
timeout occurred. The strange thing is that TCP_TIMEOUT was set to 60
minutes on the firewall and the Heartbeat Interval on the client was
set to 5 minutes, so something's weird about this...
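
One possible explanation for both durations, offered as an assumption and not confirmed anywhere in this thread, is ordinary TCP keepalive giving up on a connection that has been idle for two hours after the firewall already forgot its state. Commonly cited stock defaults are a 7200-second idle time and a 75-second probe interval, with 8 unanswered probes before the connection is dropped on traditional BSD stacks and 9 on Linux. The 75-second retransmission interval described above matches that probe interval. The sketch below computes when the kernel would give up under those assumed defaults; the real settings, and which operating system each director runs on, are not stated here.

```python
from datetime import timedelta

# ASSUMED stock TCP keepalive defaults; not values read from the hosts above.
KEEPALIVE_DEFAULTS = {
    "FreeBSD": {"idle": 7200, "interval": 75, "probes": 8},
    "Linux": {"idle": 7200, "interval": 75, "probes": 9},
}


def keepalive_give_up_time(idle, interval, probes):
    """Time from the last packet seen until the kernel declares the peer dead,
    assuming every keepalive probe is silently dropped (for example by a
    firewall that has already expired the session)."""
    return timedelta(seconds=idle + interval * probes)


for os_name, params in KEEPALIVE_DEFAULTS.items():
    # Prints "FreeBSD 2:10:00" and then "Linux 2:11:15".
    print(os_name, keepalive_give_up_time(**params))
```

Under these assumptions the results are 2:10:00 and 2:11:15: within seconds of the 2:10:10 to 2:10:14 reported in the original message, and identical to the 2:11:15 seen here. If that reading is right, the failing connection is one that sat idle for the whole two hours, which would fit a firewall whose 60-minute TCP timeout had already removed the session.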

I've changed the TCP_TIMEOUT to 3 hours on the firewall and decreased
the Heartbeat Interval to 2 minutes on the clients. It's too soon to
tell whether this has actually resolved the issue, because it manifested
itself intermittently and unpredictably. However, I haven't seen a
timeout in three days.
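
The remedy above relies on one condition: some packet has to cross the firewall on the tracked connection more often than the firewall's idle timeout. The following Python sketch is a deliberately simplified, hypothetical model of a stateful firewall's idle timer; the function `firewall_drop_time` and its parameters are illustrative names, not part of Bacula or any firewall product, and the model ignores retransmissions and any vendor-specific behaviour.

```python
def firewall_drop_time(packet_times, idle_timeout, job_end):
    """Hypothetical model of a stateful firewall idle timer.

    packet_times: seconds after job start at which any packet crosses the
    firewall on the Director-FD connection. Returns the time at which the
    firewall would forget the session, or None if it survives to job_end.
    """
    last_seen = 0.0
    for t in sorted(packet_times) + [job_end]:
        if t - last_seen > idle_timeout:
            return last_seen + idle_timeout
        last_seen = t
    return None


HOUR = 3600
job_end = 3 * HOUR  # a hypothetical three-hour backup

# Idle control connection, no heartbeat, behind a 60-minute timeout.
print(firewall_drop_time([], idle_timeout=HOUR, job_end=job_end))  # 3600.0

# A heartbeat every 2 minutes keeps the session alive.
heartbeats = range(120, job_end, 120)
print(firewall_drop_time(heartbeats, idle_timeout=HOUR, job_end=job_end))  # None
```

In this model a 5-minute heartbeat behind a 60-minute timeout would also survive, which is exactly the discrepancy noted above; the model cannot tell whether the heartbeats actually travelled over the connection the firewall was tracking.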

So you should check the network components (routers, firewalls) between
your director and clients.

Leander

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users