Greetings,

I am migrating PostgreSQL/PostGIS databases from a physical server with these 
specs:
Host OS: Ubuntu 22.04.4 LTS x64 (32GB RAM, 8GB Swap, 1TB disk space, XEON 
E5-1660 CPU)
Postgres: 15.6.1-pgdg22.04+1
PostGIS: 3.4.2

To an Azure VM with these specs:
Host OS: Ubuntu 22.04.4 LTS x64 (32GB RAM, 8GB Swap, 512GB disk space, XEON 
8171M CPU)
Postgres: 15.6.1-pgdg22.04+1
PostGIS: 3.4.2


In the past, I have migrated the same data on the same physical hardware for 
changes in host O/S, Postgres version, and/or PostGIS version. I've also 
migrated from one physical host to another.  When doing so, I try to make sure 
that my software packages are up to date and that the packages in the 
destination environment match the source.  I did this for this planned 
environment as well.  Because I know that some of my dbs have several years of 
history in them, I export these dbs individually using pg_dump (the custom 
format when possible, but in some cases the plain text option) and restore 
them with pg_restore on the destination host.
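
For reference, a typical export looks roughly like this (db and file names 
here are just placeholders):

pg_dump --verbose --format=custom --file=/tmp/my_db.backup my_db
# or, for the dbs where I use the plain text option:
pg_dump --verbose --format=plain --file=/tmp/my_db.sql my_db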

During the process of migrating data in this manner, I observed that 
restoration of many of my dbs failed.  I did a lot of testing and have made 
these additional observations:

1. I am able to restore my export files to another physical host with the same 
software environment without error.  (The errors occur only when importing 
into the virtual environment.)
2. If I split my export/import into separate steps for the schema and the 
data, it is the schema import that produces the error, not the data import 
(see the example commands after this list).
3. When the error occurs, the db process terminates with a segmentation fault 
(see log output below), and the db process restarts.
4. With logging of all statements enabled, I can see that the segmentation 
fault occurs at the absolute end of the import, when it seems that all 
commands have completed.  As long as I wait for the server process to recover, 
I have no errors when I import the data following the error.
5. The db imports that produce these errors seem to be those with more history 
(e.g., it's more likely that these were created with an older version of 
PostGIS and subsequently upgraded to the versions they have now).
6. The errors do seem related to the installation of the PostGIS extension.  
If I purposely drop the extension from the source database before I export (as 
sketched after this list), the resulting file can be imported in the target 
environment without error.  This is obviously undesirable, as it results in 
losing all of the geometry info from tables, views, etc. in the db; I only did 
this as an experiment.
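
The split export/import from observation 2 looks roughly like this (again, 
names are placeholders):

# on the source host: dump schema and data separately
pg_dump --format=custom --schema-only --file=/tmp/my_db_schema_only.backup my_db
pg_dump --format=custom --data-only --file=/tmp/my_db_data_only.backup my_db

# on the target host: the schema step is the one that triggers the crash
pg_restore --verbose --format=custom --dbname=my_db /tmp/my_db_schema_only.backup
# after waiting for the server to recover, the data step completes cleanly
pg_restore --verbose --format=custom --dbname=my_db /tmp/my_db_data_only.backup

And the experiment from observation 6 was essentially just this, run against 
the source database before exporting (it destroys the geometry columns, so it 
was for testing only):

psql --dbname=my_db --command="DROP EXTENSION postgis CASCADE;"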

My typical restore command looks like this:
pg_restore --verbose --format=custom --dbname=my_db 
/tmp/my_db_schema_only.backup
(plus some options to capture the output)
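
The output capture is just ordinary shell redirection, along the lines of:

pg_restore --verbose --format=custom --dbname=my_db \
    /tmp/my_db_schema_only.backup > /tmp/my_db_restore.log 2>&1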

Here's what the command-line output looks like when this happens:

LOG:  server process (PID 52476) was terminated by signal 11: Segmentation fault
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2024-04-23 02:16:49 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 5/51381960
WARNING:  could not open directory "base/696372": No such file or directory
CONTEXT:  WAL redo at 5/51381A48 for Database/DROP: dir 1663/696372
WARNING:  some useless files may be left behind in old database directory 
"base/696372"
CONTEXT:  WAL redo at 5/51381A48 for Database/DROP: dir 1663/696372
LOG:  invalid record length at 5/52679558: expected at least 24, got 0
LOG:  redo done at 5/526794E0 system usage: CPU: user: 0.07 s, system: 0.01 s, 
elapsed: 0.10 s
LOG:  checkpoint starting: end-of-recovery immediate wait
LOG:  checkpoint complete: wrote 2699 buffers (16.5%); 0 WAL file(s) added, 1 
removed, 0 recycled;
B; lsn=5/52679558, redo lsn=5/52679558
LOG:  database system is ready to accept connections

In the log file, the section written as the error occurs looks something like 
this:

[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.813 UTC] [0]: LOG:  statement: 
GRANT ALL ON TABLE my_schema.my_table TO my_user_name;
[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.815 UTC] [0]: LOG:  statement: -- 
Completed on 2024-04-23 13:50:54 UTC
[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.815 UTC] [0]: LOG:  statement: --
        -- PostgreSQL database dump complete
        --
[59940] [6627b39f.ea24] [2024-04-23 13:50:54.943 UTC] [0]: LOG:  server process 
(PID 60419) was terminated by signal 11: Segmentation fault
[59940] [6627b39f.ea24] [2024-04-23 13:50:54.943 UTC] [0]: LOG:  terminating 
any other active server processes
[59940] [6627b39f.ea24] [2024-04-23 13:50:54.945 UTC] [0]: LOG:  all server 
processes terminated; reinitializing
[60423] [6627bcbf.ec07] [2024-04-23 13:50:55.007 UTC] [0]: LOG:  database 
system was interrupted; last known up at 2024-04-23 13:50:48 UTC
[60423] [6627bcbf.ec07] [2024-04-23 13:51:05.511 UTC] [0]: LOG:  syncing data 
directory (fsync), elapsed time: 10.00 s, current path: ./base/897556/1249
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.554 UTC] [0]: LOG:  database 
system was not properly shut down; automatic recovery in progress
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.606 UTC] [0]: LOG:  redo starts at 
E/3A8785A8
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: WARNING:  could not 
open directory "base/562219": No such file or directory
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: CONTEXT:  WAL redo 
at E/3A878690 for Database/DROP: dir 1663/562219
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: WARNING:  some 
useless files may be left behind in old database directory "base/562219"
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: CONTEXT:  WAL redo 
at E/3A878690 for Database/DROP: dir 1663/562219
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.704 UTC] [0]: LOG:  invalid record 
length at E/3BACDAF8: expected at least 24, got 0
[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.704 UTC] [0]: LOG:  redo done at 
E/3BACDA80 system usage: CPU: user: 0.07 s, system: 0.01 s, elapsed: 0.09 s
[60424] [6627bcbf.ec08] [2024-04-23 13:51:12.804 UTC] [0]: LOG:  checkpoint 
starting: end-of-recovery immediate wait
[60424] [6627bcbf.ec08] [2024-04-23 13:51:13.143 UTC] [0]: LOG:  checkpoint 
complete: wrote 2592 buffers (15.8%); 0 WAL file(s) added, 1 removed, 0 
recycled; writ
9 s, average=0.001 s; distance=18773 kB, estimate=18773 kB; lsn=E/3BACDAF8, 
redo lsn=E/3BACDAF8
[59940] [6627b39f.ea24] [2024-04-23 13:51:13.156 UTC] [0]: LOG:  database 
system is ready to accept connections


I searched the mail archive and performed some additional web searches based 
on my observations.  Unfortunately, I haven't found a similar-enough incident 
from which I might get suggestions for further troubleshooting.  In over 10 
years of using PostgreSQL+PostGIS, I've never had to dig this deeply to 
determine the source of an error.  I'm looking for advice on further 
troubleshooting, or for shared experiences in similar situations.

Thank you for your time!


Jeffrey Durrence

McLean Engineering Company

www.mcleanengineering.com

Office: 229-985-1148

Mobile: 229-798-0480

