For anyone reading this with the same issue: we added Maximum Concurrent Copies = 1000 to the definition of the copy job, and this fixed the issue for us. The directive defaults to 100, which matches the batches of exactly 100 copied jobs described below.
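For reference, the change goes in the copy job resource quoted further down in this thread. A minimal sketch (only the last directive is new, the rest is the job as posted below):

```conf
Job {
  Name = "CopyLocalToRemote"
  Type = Copy
  Selection Type = PoolUncopiedJobs
  # Defaults to 100, which appears to be what capped each run
  # at 100 copied jobs (200 queued jobs including control jobs).
  Maximum Concurrent Copies = 1000
}
```

Note that this directive appears to control only how many copy jobs one control job may dispatch per run; how many actually run in parallel is still governed by Maximum Concurrent Jobs in the Director, Storage, and Device resources (set to 1 on the devices in the configs below).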
Met vriendelijke groet / With kind regards,

Rick Tuk
Senior DevOps Engineer

On Feb 25, 2021, at 9:06 AM, 'Rick Tuk' via bareos-users <[email protected]> wrote:

I have run the copy job manually a fair few times now, and it gave me a clearer view of what exactly happens:

1. CopyLocalToRemote is scheduled (either manually or automatically).
2. The job runs for a few seconds, selecting the JobIds of all jobs not previously copied.
3. It then queues two jobs per selected JobId: one "control job" named CopyLocalToRemote with a new JobId, and one job named after the original job being copied.
4. It does this for exactly 100 original jobs, resulting in 200 jobs to be run.
5. It then starts running jobs: the "CopyLocalToRemote" control job and the job with the original job name are started together, so two jobs always run at the same time. All other jobs are waiting.

Here is an overview of what "status director" says under running jobs:

Running Jobs:
Console connected at 25-Feb-21 08:25
 JobId Level   Name                                     Status
======================================================================
 41809 Increme CopyLocalToRemote.2021-02-25_08.26.49_13 is running
 41810 Full    typhon-default.2021-02-25_08.26.49_14    is running
 41811 Increme CopyLocalToRemote.2021-02-25_08.26.49_15 is waiting on max Storage jobs
 41812 Full    thoth-stacks.2021-02-25_08.26.49_16      is waiting execution
 41813 Increme CopyLocalToRemote.2021-02-25_08.26.49_17 is waiting on max Storage jobs
 41814 Full    worker005-default.2021-02-25_08.26.49_18 is waiting execution
 41815 Increme CopyLocalToRemote.2021-02-25_08.26.49_19 is waiting on max Storage jobs
 41816 Full    metis-default.2021-02-25_08.26.49_20     is waiting execution
 41817 Increme CopyLocalToRemote.2021-02-25_08.26.49_21 is waiting on max Storage jobs
 41818 Full    soter-default.2021-02-25_08.26.49_22     is waiting execution

The log for the initial CopyLocalToRemote job shows how many and which JobIds were selected; this list does contain all
uncopied jobs:

soteria-dir JobId 41866: The following 2253 JobIds were chosen to be copied: 27655,27656,27657,27659,27660,27661,27662,27672,27677,27679,27681,27682,27683,27686,27688,27691,27692,27693,27694,27695, etc.

It then has 3 log entries for every job that is started:

soteria-dir JobId 41866: Using Catalog "Catalog"
soteria-dir JobId 41866: Automatically selected Catalog: Catalog
soteria-dir JobId 41866: Copying JobId 42067 started.

This run started at JobId 42067 and ended with 42265; each job started from the three log lines above increments the JobId by 2, so that is 100 copied jobs (200 new jobs in total). And again, the list of JobIds that "were chosen to be copied" is a lot longer than the list of jobs that are actually started.

With over 120 clients to be backed up every day (daily incremental, weekly differential and monthly full runs), the copy job will never catch up unless I keep running copy jobs manually.

Met vriendelijke groet / With kind regards,

Rick Tuk
Senior DevOps Engineer

On Feb 24, 2021, at 2:58 PM, Brock Palen <[email protected]> wrote:

Others can confirm, but I have never had Bareos keep more than 200 jobs in the queue at a time. Let the jobs finish, but manually run the copy job again; it should grab more and more until it's caught up, and then you can run it nightly/weekly etc.

Brock Palen
[email protected]
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting

On Feb 24, 2021, at 3:56 AM, 'Rick Tuk' via bareos-users <[email protected]> wrote:

LS,

I recently added a second storage daemon and a copy job to my configuration. I use Selection Type = PoolUncopiedJobs to select the jobs to copy.
I assumed that on the first run it would select all jobs in the read pool and copy those to the next pool. The copy job does seem to select all jobs that still need to be copied:

JobId 32583: The following 2235 JobIds were chosen to be copied

However, only about 200 jobs are actually added to the queue and completed. Either there is a limit of 200 that I cannot find in the documentation or in my config files, or only a single job per client is actually copied in each run. Any help would be very much appreciated.

Director config:

Director {
  Name = soteria
  Dir Address = soteria
  Dir Port = 9101
  Password = "<password>"
  Query File = "/usr/lib/bareos/scripts/query.sql"
  Maximum Concurrent Jobs = 1
  Messages = Daemon
  Auditing = yes

  # Enable the Heartbeat if you experience connection losses
  # (eg. because of your router or firewall configuration).
  # Additionally the Heartbeat can be enabled in bareos-sd and bareos-fd.
  #
  # Heartbeat Interval = 1 min

  Backend Directory = /usr/lib/bareos/backends

  # remove comment from "Plugin Directory" to load plugins from specified directory.
  # if "Plugin Names" is defined, only the specified plugins will be loaded,
  # otherwise all director plugins (*-dir.so) from the "Plugin Directory".
  #
  # Plugin Directory = "/usr/lib/bareos/plugins"
  # Plugin Names = ""
}

Pool config:

Pool {
  Name = Local-Full
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Storage = Local-Full
  Next Pool = Remote-Full
  File Retention = 12 months
  Job Retention = 12 months
  Volume Retention = 12 months
  Maximum Volume Bytes = 25G
  Label Format = full-
}

Pool {
  Name = Remote-Full
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Storage = Remote-Full
  File Retention = 12 months
  Job Retention = 12 months
  Volume Retention = 12 months
  Maximum Volume Bytes = 25G
  Label Format = full-remote-
}

Storage config:

Storage {
  Name = Local-Full
  Address = salus
  SD Port = 9103
  Password = "<password>"
  Device = Local-Full
  Media Type = File
}

Storage {
  Name = Remote-Full
  Address = sancus
  SD Port = 9103
  Password = "<password>"
  Device = Remote-Full
  Media Type = File
}

Schedule config:

Schedule {
  Name = "Default"
  Run = Level=Full Pool=Local-Full 1st sat at 23:00
  Run = Level=Differential Pool=Local-Diff FullPool=Local-Full 2nd-5th sat at 23:00
  Run = Level=Incremental Pool=Local-Inc FullPool=Local-Full DifferentialPool=Local-Diff sun-fri at 23:00
}

Copy job:

Job {
  Name = "CopyLocalToRemote"
  Type = Copy
  Level = Incremental
  Storage = Local-Inc
  Pool = Local-Inc
  Full Backup Pool = Local-Full
  Differential Backup Pool = Local-Diff
  Incremental Backup Pool = Local-Inc
  Selection Type = PoolUncopiedJobs
  Schedule = "Default"
  Messages = "Standard"
  Priority = 14
}

SD config on salus:

Storage {
  Name = salus
  SD Address = salus
  SD Port = 9103
  Maximum Concurrent Jobs = 20
  Backend Directory = /usr/lib/bareos/backends

  # remove comment from "Plugin Directory" to load plugins from specified directory.
  # if "Plugin Names" is defined, only the specified plugins will be loaded,
  # otherwise all storage plugins (*-sd.so) from the "Plugin Directory".
  # Plugin Directory = "/usr/lib/bareos/plugins"
  # Plugin Names = ""
}

SD config on sancus:

Storage {
  Name = sancus
  SD Address = sancus
  SD Port = 9103
  Maximum Concurrent Jobs = 20
  Backend Directory = /usr/lib/bareos/backends
  # remove comment from "Plugin Directory" to load plugins from specified directory.
  # if "Plugin Names" is defined, only the specified plugins will be loaded,
  # otherwise all storage plugins (*-sd.so) from the "Plugin Directory".
  # Plugin Directory = "/usr/lib/bareos/plugins"
  # Plugin Names = ""
}

Device config on salus:

Device {
  Name = Local-Full
  Archive Device = /bareos/backup/full
  Device Type = File
  Media Type = File
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
  Removable Media = no
  Always Open = no
  Maximum Concurrent Jobs = 1
}

Device config on sancus:

Device {
  Name = Remote-Full
  Archive Device = /bareos/backup/full
  Device Type = File
  Media Type = File
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
  Removable Media = no
  Always Open = no
  Maximum Concurrent Jobs = 1
}

Met vriendelijke groet / With kind regards,

Rick Tuk
Senior DevOps Engineer

--
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/bareos-users/BFD997D1-46E3-42FD-BD73-894350EEC9AE%40mostwanted.io.
