Hi all,
I'm using the bareos-fd-postgresql plugin to back up the director's database.
The config is:
--%snip%--
Job {
  Name = backup-mydirector-postgres
  Client = mydirector
  JobDefs = postgres
  Storage = File-mystorage
  Maximum Concurrent Jobs = 1
}
JobDefs {
  Name = postgres
  JobDefs = DefaultJob
  FileSet = postgres
}
FileSet {
  Name = postgres
  Description = "Fileset for postgres"
  Include {
    Options {
      Signature = XXH128
      Compression = LZ4HC
    }
    Plugin = "python3"
             ":module_name=bareos-fd-postgresql"
             ":db_host=/run/postgresql"
             ":wal_archive_dir=/var/lib/pgsql/wal_archive"
             ":switch_wal_timeout=180"
  }
}
--%snip%--
The DBMS is configured as follows:
--%snip%--
max_wal_size = 1GB
min_wal_size = 80MB
archive_mode = on
archive_command = 'install -D %p /var/lib/pgsql/wal_archive/%f'
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_archive %r'
--%snip%--
There is no replication slave.
From time to time I get the following error:
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_backup_stop_time 1721215228 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_lsn 17/85000000 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got pg major version 13 from restore object of job 44528
18-Jul 11:20 mydirector JobId 44591: Using Device "File-mystorage" to write.
18-Jul 11:20 mydirector JobId 44591: Extended attribute support is enabled
18-Jul 11:20 mydirector JobId 44591: ACL support is enabled
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: python: 3.9.18 (main, May 16 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3.0.1)] | pg8000: 1.31.2
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Connected to PostgreSQL version 130014
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/87538B18, last LSN: 17/85000000
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: A difference was found, between current_lsn 17/87538B18 and last LSN: 17/85000000
18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/880001A8, last LSN: 17/85000000
18-Jul 11:23 mydirector JobId 44591: Fatal error: python3-fd-mod: Timeout waiting 180 sec. for wal file 000000010000001700000088 to be archived
18-Jul 11:23 mydirector JobId 44591: Fatal error: filed/fd_plugins.cc:673 PluginSave: Command plugin "python3:module_name=bareos-fd-postgresql:db_host=/run/postgresql:wal_archive_dir=/var/lib/pgsql/wal_archive:switch_wal_timeout=180" requested, but job is already cancelled.
18-Jul 11:23 mydirector JobId 44591: python3-fd-mod: Database connection closed.
18-Jul 11:20 mystorage JobId 44591: Connected File Daemon at 192.168.1.5:9102, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
18-Jul 11:23 mydirector JobId 44591: Fatal error: Director's comm line to SD dropped
As you can see, I have already increased switch_wal_timeout from its default
of 60s to 180s, but the error still shows up.
The database is stored on an NVMe drive, and there are no RAM or CPU
bottlenecks.
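For reference, the segment name in the timeout message is consistent with the
second "Current LSN" line in the log (17/880001A8). Below is a minimal Python
sketch of that mapping, plus a check for whether the segment has reached the
archive directory. It assumes the default 16 MB WAL segment size and timeline 1;
the function names wal_file_name and is_archived are made up for illustration
and are not part of the plugin.

```python
import os

# Assumption: PostgreSQL's default WAL segment size (16 MB) is in use.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(lsn, timeline=1, segment_size=WAL_SEGMENT_SIZE):
    """Return the name of the WAL segment containing the given LSN.

    Note: PostgreSQL's own pg_walfile_name() maps an LSN lying exactly on a
    segment boundary to the previous segment; this sketch does not do that.
    """
    high_hex, low_hex = lsn.split("/")
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segment_number = position // segment_size
    segments_per_log_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log_id,
        segment_number % segments_per_log_id,
    )


def is_archived(lsn, archive_dir, timeline=1):
    """Return True if the segment containing lsn exists in archive_dir."""
    return os.path.isfile(os.path.join(archive_dir, wal_file_name(lsn, timeline)))


if __name__ == "__main__":
    print(wal_file_name("17/880001A8"))
    print(is_archived("17/880001A8", "/var/lib/pgsql/wal_archive"))
```

Running is_archived by hand shortly after a failed job would show whether the
segment arrives late or never arrives at all.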
Does anyone have an idea of how to get this fixed?
Thanks & kind regards,
Philippe