Hi all,

I'm using the bareos-fd-postgresql plugin to back up the director's database.

The config is:

--%snip%--
Job {
  Name = backup-mydirector-postgres
  Client = mydirector
  JobDefs = postgres
  Storage = File-mystorage
  Maximum Concurrent Jobs = 1
}

JobDefs {
  Name = postgres
  JobDefs = DefaultJob
  FileSet = postgres
}

FileSet {
  Name = postgres
  Description = "Fileset for postgres"
  Include {
    Options {
      Signature = XXH128
      Compression = LZ4HC
    }
    Plugin = "python3"
      ":module_name=bareos-fd-postgresql"
      ":db_host=/run/postgresql"
      ":wal_archive_dir=/var/lib/pgsql/wal_archive"
      ":switch_wal_timeout=180"
  }
}
--%snip%--

PostgreSQL is configured as follows:

--%snip%--
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'install -D %p /var/lib/pgsql/wal_archive/%f'
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_archive %r'
--%snip%--

There is no replication slave.


From time to time I get the following error:

18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_backup_stop_time 1721215228 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_lsn 17/85000000 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got pg major version 13 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: Using Device "File-mystorage" to write.
18-Jul 11:20 mydirector JobId 44591: Extended attribute support is enabled
18-Jul 11:20 mydirector JobId 44591: ACL support is enabled
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: python: 3.9.18 (main, May 16 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3.0.1)] | pg8000: 1.31.2
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Connected to PostgreSQL version 130014
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/87538B18, last LSN: 17/85000000
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: A difference was found, between current_lsn 17/87538B18 and last LSN: 17/85000000
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/880001A8, last LSN: 17/85000000
18-Jul 11:23 mydirector JobId 44591: Fatal error: python3-fd-mod: Timeout waiting 180 sec. for wal file 000000010000001700000088 to be archived
18-Jul 11:23 mydirector JobId 44591: Fatal error: filed/fd_plugins.cc:673 PluginSave: Command plugin "python3:module_name=bareos-fd-postgresql:db_host=/run/postgresql:wal_archive_dir=/var/lib/pgsql/wal_archive:switch_wal_timeout=180" requested, but job is already cancelled.
18-Jul 11:23 mydirector JobId 44591: python3-fd-mod: Database connection closed.
18-Jul 11:20 mystorage JobId 44591: Connected File Daemon at 192.168.1.5:9102, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
18-Jul 11:23 mydirector JobId 44591: Fatal error: Director's comm line to SD dropped

As you can see, I already increased the default value of 60s for switch_wal_timeout to 180s, but this error still shows up.
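
For reference, the segment named in the timeout message corresponds to the "Current LSN 17/880001A8" reported just before it. The following is a minimal sketch of that mapping, assuming timeline 1 and the PostgreSQL default WAL segment size of 16 MB (neither value is stated in my config above). The function name wal_file_name is only illustrative and is not taken from the plugin.

```python
# Assumption: default 16 MB WAL segments (not confirmed for this server).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(lsn: str, timeline: int = 1,
                  segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    high_hex, low_hex = lsn.split("/")
    # An LSN is a 64-bit byte position written as two 32-bit hex halves.
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segment_number = position // segment_size
    # Number of segments covered by one value of the high 32 bits.
    segments_per_xlog_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


print(wal_file_name("17/880001A8"))  # 000000010000001700000088
```

So the file the job waits for is the segment that holds the LSN reached after the WAL switch, and it is this segment that does not show up in wal_archive_dir within 180 s.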

The database is stored on an NVMe drive, and there are no performance bottlenecks (RAM, CPU).
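
To narrow this down the next time it happens, I could check where the missing segment is stuck. PostgreSQL keeps a "<segment>.ready" file in pg_wal/archive_status while a segment is waiting for archive_command, and renames it to "<segment>.done" once the command has succeeded. Below is a rough sketch of such a check. The function name archive_state and the returned strings are made up for illustration, and the pg_wal location has to be passed in because it depends on the installation.

```python
from pathlib import Path


def archive_state(segment: str, wal_archive_dir: str, pg_wal_dir: str) -> str:
    """Classify a WAL segment by looking at the archive directory and
    at PostgreSQL's archive_status marker files."""
    status_dir = Path(pg_wal_dir) / "archive_status"
    if (Path(wal_archive_dir) / segment).is_file():
        return "archived"
    if (status_dir / (segment + ".ready")).exists():
        # archive_command has not (yet) succeeded for this segment.
        return "waiting for archive_command"
    if (status_dir / (segment + ".done")).exists():
        # PostgreSQL considers it archived, but the file is not in the archive.
        return "marked done but missing from archive"
    return "no status file"
```

On this host the call would look something like archive_state("000000010000001700000088", "/var/lib/pgsql/wal_archive", "/var/lib/pgsql/data/pg_wal"), where the pg_wal path is a guess at the data directory and would need to be adjusted. It has to run as a user that can read both directories.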

Does anyone have an idea of how to get this fixed?

Thanks & kind regards,

Philippe
