--- Begin Message ---
Hi Muris
We sometimes see that the admin table or its indexes can get very large,
even though there is not much data in it. This is due to index bloat in
Postgres, i.e. fragmentation of on-disk pages.
Since all your errors seem to come from admin, I'd try clearing out the
table and recreating the indexes as shown here:
<https://github.com/netdisco/netdisco/wiki/Database-Tips#polling-mysteriously-stuck-or-very-large-admindevice_skip-tables-and-indexes>
No worries, the table just holds job queue records and clearing it out
does not lose any data you care about.
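In practice it boils down to something like this (a sketch only: stop
netdisco-backend first, and treat the wiki page above as the authoritative
procedure):

  -- run via: netdisco-do psql, with netdisco-backend stopped
  DELETE FROM admin;   -- drop all job-queue records (safe, per the note above)
  VACUUM FULL admin;   -- rewrite the table and rebuild its indexes, returning the bloated pages
  -- device_skip can be handled the same way if it has also grown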
If you're still having trouble, you can also look at index bloat in
other tables:
<https://github.com/netdisco/netdisco/wiki/Database-Tips#unreasonable-database-size-index-bloat>
and try one of the proposed solutions there, e.g.
netdisco-do psql -e 'reindex database concurrently netdisco'
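If you first want to see where the space is actually going, a generic size
check like this (not the wiki's exact query) lists the ten biggest tables
together with their indexes:

  SELECT relname,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;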
For general Postgres memory tuning, I like using
<https://pgtune.leopard.in.ua/#/about> since it just asks a few questions
about the system and then outputs the ~20 relevant parameters in sensible
proportion to each other.
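To give a rough idea, for a 16 GB machine like yours it tends to land
somewhere around these values (illustrative only, not a recommendation for
your specific workload):

  -- shared_buffers only takes effect after a server restart
  ALTER SYSTEM SET shared_buffers = '4GB';
  ALTER SYSTEM SET effective_cache_size = '12GB';
  ALTER SYSTEM SET work_mem = '64MB';
  ALTER SYSTEM SET maintenance_work_mem = '1GB';
  SELECT pg_reload_conf();  -- applies the settings that do not need a restart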
Cheers
Christian
On 07.03.23 11:20, Muris wrote:
Hey guys,
I seem to be having some out-of-memory errors after upgrading to netdisco
2.60.7, the latest version... not sure what's going on.
I followed this guide:
https://severalnines.com/blog/architecture-and-tuning-memory-postgresql-databases/
and increased work_mem to 32MB, which seems to have helped a bit, but I'm
still noticing errors on the backend from time to time, and messages like
this. Any ideas?
Running on an 8-core Xeon with 16 GB RAM.
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR:out of memory
DETAIL:Failed on request of size 32800 in memory context
"HashBatchContext". [for Statement "SELECT me.job, me.entered,
me.started, me.finished, me.device, me.port, me.action, me.subaction,
me.status, me.username, me.userip, me.log, me.debug, me.device_key,
me.job_priority FROM (WITH my_jobs AS
(SELECT admin.* FROM admin
LEFT OUTER JOIN device_skip ds
ON (ds.backend = ? AND admin.device = ds.device
AND admin.action = ANY (ds.actionset))
WHERE admin.status = 'queued'
AND ds.device IS NULL)
SELECT my_jobs.*,
CASE WHEN ( (my_jobs.username IS NOT NULL AND ((ds.deferrals IS NULL AND
ds.last_defer IS NULL)
OR my_jobs.entered > ds.last_defer))
OR (my_jobs.action = ANY (string_to_array(btrim(?, '{"}'), '","'))) )
THEN 100
ELSE 0
END AS job_priority
FROM my_jobs
LEFT OUTER JOIN device_skip ds
ON (ds.backend = ? AND ds.device = my_jobs.device)
WHERE ds.deferrals < ?
OR (my_jobs.username IS NOT NULL AND (ds.last_defer IS NULL
OR my_jobs.entered > ds.last_defer))
OR (ds.deferrals IS NULL AND ds.last_defer IS NULL)
OR ds.last_defer <= ( LOCALTIMESTAMP - ?::interval )
ORDER BY job_priority DESC,
ds.deferrals ASC NULLS FIRST,
ds.last_defer ASC NULLS LAST,
device_key DESC NULLS LAST,
random()
LIMIT ?
) me" with ParamValues: 1='netdisco',
2='{"contact","hook::exec","hook::http","location","portcontrol","portname","power","snapshot","vlan"}', 3=’netdisco’, 4='10', 5='7 days', 6='1'] at /home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 122
[5405] 2023-03-07 08:36:21 error bless( {'msg' =>
'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR:out of memory
DETAIL:Failed on request of size 72 in memory context "MessageContext".
[for Statement "UPDATE admin SET finished = ?, log = ?, started = ?,
status = ? WHERE ( job = ? )" with ParamValues: 1=\'Tue Mar7 19:06:21
2023\', 2=\'DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception:
DBD::Pg::st execute failed: ERROR:out of memory
DETAIL:Failed on request of size 16 in memory context "MessageContext".
[for Statement "SELECT me.ip, me.port, me.creation, me.descr, me.up,
me.up_admin, me.type, me.duplex, me.duplex_admin, me.speed,
me.speed_admin, me.name, me.mac, me.mtu, me.stp, me.remote_ip,
me.remote_port, me.remote_type, me.remote_id, me.has_subinterfaces,
me.is_master, me.slave_of, me.manual_topo, me.is_uplink, me.vlan,
me.pvid, me.lastchange, me.custom_fields, neighbor_alias.ip,
neighbor_alias.alias, neighbor_alias.subnet, neighbor_alias.port,
neighbor_alias.dns, neighbor_alias.creation, device.ip, device.creation,
device.dns, device.description, device.uptime, device.contact,
device.name, device.location, device.layers, device.num_ports,
device.mac, device.serial, device.chassis_id, device.model,
device.ps1_type, device.ps2_type, device.ps1_status, device.ps2_status,
device.fan, device.slots, dev...\', 3=\'Tue Mar7 19:06:17 2023\',
4=\'error\', 5=\'215666754\'] at
/home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
'}, 'DBIx::Class::Exception' )
[5427] 2023-03-07 08:36:28 error bless( {'msg' =>
'DBIx::Class::Storage::DBI::catch {...} (): DBI Connection failed: DBI
connect(\'dbname=netdisco\',\'netdisco\',...) failed: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request. at
/home/netdisco/perl5/lib/perl5/DBIx/Class/Storage/DBI.pm line 1639. at
/home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 292
'}, 'DBIx::Class::Exception' )
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st
execute failed: ERROR:out of memory
Use of uninitialized value $op in pattern match (m//) at
/home/netdisco/perl5/lib/perl5/SQL/Abstract/Classic.pm line 327.
[14006] 2023-03-07 10:05:49 error bless( {'msg' =>
'DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: Operator calls
in update must be in the form { -op => $arg } at
/home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
'}, 'DBIx::Class::Exception' )
Use of uninitialized value $op in pattern match (m//) at
/home/netdisco/perl5/lib/perl5/SQL/Abstract/Classic.pm line 327.
[14718] 2023-03-07 10:10:16 error bless( {'msg' =>
'DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: Operator calls
in update must be in the form { -op => $arg } at
/home/netdisco/perl5/lib/perl5/App/Netdisco/JobQueue/PostgreSQL.pm line 282
'}, 'DBIx::Class::Exception' )
--
Christian Ramseyer, netnea ag
Network Management. Security. OpenSource.
https://www.netnea.com
Phone: +41 79 644 77 64
--- End Message ---
--- Begin Message ---
Thanks Christian, I have done that and cleared out the admin table. I also
increased work_mem to 64MB, and now the process workers are running without
hitting out-of-memory errors.
I left it running and it was working fine for an hour or two. I just
checked it and all the nd2 processes seem to have stopped again, so I
checked netdisco-backend.log and it shows this:
[27082] 2023-03-07 22:31:36 error bless( {'msg' =>
'DBIx::Class::Row::update(): Can\'t update
App::Netdisco::DB::Result::DeviceSkip=HASH(0x5da4278): updated more than
one row at
/home/netdisco/perl5/lib/perl5/App/Netdisco/DB/Result/DeviceSkip.pm line 41
'}, 'DBIx::Class::Exception' )
It seems something in the device_skip table causes it to crash, and then
all the process workers stop working, so I have to restart netdisco-backend
manually.
I have cleared device_skip and this still comes up... any ideas about this one?
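One thing I was wondering about, assuming the "updated more than one row"
error means there are duplicate rows for the same backend/device pair in
device_skip, is whether a check like this would show anything:

  -- read-only check: list any (backend, device) pairs that appear more than once
  SELECT backend, device, count(*)
  FROM device_skip
  GROUP BY backend, device
  HAVING count(*) > 1;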
On Wed, Mar 8, 2023 at 8:33 AM Christian Ramseyer <[email protected]>
wrote:
> Hi Muris
>
> We sometimes see that the admin table or its indexes can get very large,
> even though there is not much data in it. This is due to index bloat in
> Postgres, i.e. fragmentation of on-disk pages.
> [...]
--- End Message ---