Hi guys,

For my company I've been trying to get bacula up and running properly.
My current situation:

Host 'leiden' :

Located at my home, multiple large (8TB) raid arrays attached.
Therefore running bacula-sd and bacula-dir.
More than 100 Mbit/s download bandwidth.
Running debian testing, bacula version 5.0.3.


Multiple hosts to be backed up, on a 100/100 connection.
debian stable, bacula 5.0.3
running bacula-fd, default config.


The complete bacula-dir.conf is located at: http://pastebin.com/8JvCdmL9
Please note that I have substituted all passwords by an X.

Relevant parts are:


Director {                            # define myself
   Name = leiden-dir
   QueryFile = "/etc/bacula/scripts/query.sql"
   WorkingDirectory = "/var/lib/bacula"
   PidDirectory = "/var/run/bacula"
   Maximum Concurrent Jobs = 10
   Password = "X"         # Console password
   Messages = Daemon
   DirAddresses = {
     ip = { addr = 192.168.1.44; port = 9101 }
     ip = { addr = 127.0.0.1; port = 9101 }
   }
}

JobDefs {
   Name = "sql-weekly"
   Type = Backup
   Level = Incremental
   Client = sql
   FileSet = "Full Set"
   Schedule = "WeeklyCycle"
   Storage = leiden-filestorage
   Messages = Standard
   Pool = LeidenPool
   Priority = 10
}


JobDefs {
   Name = "mail-weekly"
   Type = Backup
   Level = Incremental
   Client = mail
   FileSet = "Full Set"
   Schedule = "WeeklyCycle"
   Storage = leiden-filestorage
   Messages = Standard
   Pool = LeidenPool
   Priority = 10
}


Job {
   Name = "sqljob"
   JobDefs = "sql-weekly"
   Write Bootstrap = "/var/lib/bacula/sql.bsr"
}
Job {
   Name = "mailjob"
   JobDefs = "mail-weekly"
   Write Bootstrap = "/var/lib/bacula/mail.bsr"
}
# Client (File Services) to backup
Client {
   Name = sql
   Address = sql.boudewijnector.nl
   FDPort = 9102
   Catalog = MyCatalog
   Password = "X"          # password for FileDaemon
   File Retention = 30 days            # 30 days
   Job Retention = 6 months            # six months
   AutoPrune = yes                     # Prune expired Jobs/Files
}

Client {
   Name = mail
   Address = mail.boudewijnector.nl
   FDPort = 9102
   Catalog = MyCatalog
   Password = "X"          # password for FileDaemon
   File Retention = 30 days            # 30 days
   Job Retention = 6 months            # six months
   AutoPrune = yes                     # Prune expired Jobs/Files
}



The current problem is that I get errors on some hosts, such as:


17-Jul 02:52 leiden-dir JobId 94: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
17-Jul 02:52 leiden-dir JobId 94: Fatal error: No Job status returned from FD.
17-Jul 02:52 leiden-dir JobId 94: Error: Bacula leiden-dir 5.0.3 (04Aug10): 17-Jul-2011 02:52:30
   Build OS:               i486-pc-linux-gnu debian wheezy/sid
   JobId:                  94
   Job:                    BLAjob.2011-07-17_00.52.14_10
   Backup Level:           Full (upgraded from Incremental)
   Client:                 "client4" 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,squeeze/sid
   FileSet:                "Home Set" 2011-07-16 23:49:43
   Pool:                   "LeidenPool" (From Job resource)
   Catalog:                "MyCatalog" (From Client resource)
   Storage:                "leiden-filestorage" (From Job resource)
   Scheduled time:         17-Jul-2011 00:52:13
   Start time:             17-Jul-2011 00:52:16
   End time:               17-Jul-2011 02:52:30
   Elapsed time:           2 hours 14 secs
   Priority:               10
   FD Files Written:       0
   SD Files Written:       137,033
   FD Bytes Written:       0 (0 B)
   SD Bytes Written:       3,586,674,915 (3.586 GB)
   Rate:                   0.0 KB/s
   Software Compression:   None
   VSS:                    no
   Encryption:             no
   Accurate:               no
   Volume name(s):         LeidenVol0005
   Volume Session Id:      20
   Volume Session Time:    1310599400
   Last Volume Bytes:      12,025,925,394 (12.02 GB)
   Non-fatal FD errors:    0
   SD Errors:              0
   FD termination status:  Error
   SD termination status:  OK
   Termination:            *** Backup Error ***
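
A minimal sketch, assuming Python 3 is available, of how the elapsed time in a report like the one above can be recomputed from its "Start time" and "End time" fields. The helper names report_field and elapsed_seconds are made up for this illustration and are not part of bacula.

```python
from datetime import datetime

# Timestamp layout used in the job report, e.g. "17-Jul-2011 00:52:16".
# %b expects English month abbreviations (Python's default C locale).
REPORT_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def report_field(report: str, name: str) -> str:
    """Return the value of a 'Name:   value' line from a job report."""
    for line in report.splitlines():
        # Split on the first colon only, so the time value keeps its colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    raise KeyError(name)


def elapsed_seconds(report: str) -> int:
    """Seconds between the 'Start time' and 'End time' fields."""
    start = datetime.strptime(report_field(report, "Start time"), REPORT_TIME_FORMAT)
    end = datetime.strptime(report_field(report, "End time"), REPORT_TIME_FORMAT)
    return int((end - start).total_seconds())


if __name__ == "__main__":
    sample = (
        "   Start time:             17-Jul-2011 00:52:16\n"
        "   End time:               17-Jul-2011 02:52:30\n"
    )
    print(elapsed_seconds(sample))  # 7214, i.e. 2 hours 14 secs
```

For the report above this gives 7214 seconds, so the job died 14 seconds past the two-hour mark.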


Rerunning the job fails after 2 hours as well. I tried to fix it as
follows:


In the Job resource in bacula-dir.conf I added "Max Run Time = 144000",
because it seemed like bacula shut down the connection after 2 hours.
I also changed the keep-alive time on the machine running bacula-dir:

    sysctl -w net.ipv4.tcp_keepalive_time=60
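
For context, tcp_keepalive_time is one of three related Linux settings; the stock defaults are 7200 seconds (2 hours), 75 seconds and 9 probes, and they only apply to sockets that have SO_KEEPALIVE enabled. A rough Python 3 sketch of how the values could be read back and combined follows. The function names are hypothetical, and whether these settings are related to the failure is an open question.

```python
from pathlib import Path

# Linux exposes the three keepalive tunables as files in this directory.
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive(proc_dir: Path = PROC_DIR) -> dict:
    """Read tcp_keepalive_time, _intvl and _probes as integers."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int((proc_dir / name).read_text().strip()) for name in names}


def seconds_until_drop(settings: dict) -> int:
    """Idle seconds before the kernel gives up on an unresponsive peer.

    The first probe is sent after tcp_keepalive_time seconds of idleness;
    the connection is dropped after tcp_keepalive_probes unanswered probes
    sent tcp_keepalive_intvl seconds apart.
    """
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly where /proc is not available.
    if (PROC_DIR / "tcp_keepalive_time").exists():
        current = read_keepalive()
        print(current, seconds_until_drop(current))
```

With the defaults this works out to 7200 + 75 * 9 = 7875 seconds; with tcp_keepalive_time=60 and the other two unchanged it is 735 seconds.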

When I did so, it failed completely:

   Elapsed time:           15 hours 22 mins 58 secs
   Priority:               10
   FD Files Written:       0
   SD Files Written:       0
   FD Bytes Written:       0 (0 B)
   SD Bytes Written:       0 (0 B)
   Rate:                   0.0 KB/s
   Software Compression:   None
   VSS:                    no
   Encryption:             no
   Accurate:               no
   Volume name(s):
   Volume Session Id:      33
   Volume Session Time:    1310599400

That's really bad: my router did not detect any traffic at all, except
for a few bytes when the connection was set up.



Can someone please point out where I should start investigating this
problem?

From the internet, I can reach the director and the SD on the 'leiden'
system.
I can reach the FDs on all servers which are to be backed up.
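
A reachability check of this kind can be sketched in Python 3 as below. The function name can_connect and the target list are placeholders for illustration. Ports 9101 and 9102 are the ones from the config above. A successful connect only shows that the port accepts TCP connections at that moment; it says nothing about whether a long-running or idle connection survives.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder targets: replace with the real director, SD and FD hosts.
    targets = [("127.0.0.1", 9101), ("127.0.0.1", 9102)]
    for host, port in targets:
        state = "open" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```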

Cheers,

Boudewijn Ector


