+1 from me
The bounceevent table also hit me during the migration to mailman3, but
there were instructions on how to skip that step and mark all events that
were already in the db as processed. I assume that, as a permanent
solution, processed bounce events should be deleted from the DB after
some time, say once per year or so.
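As a rough sketch of what such a periodic cleanup could look like: the
snippet below uses Python's sqlite3 module as a stand-in for the real
PostgreSQL database. The function name, the batch size and the ISO-8601
text timestamps are my assumptions for illustration; only the table and
column names come from the query quoted below. It is not how mailman
itself does (or would do) this.

```python
import sqlite3
from datetime import datetime, timezone


def purge_processed_bounces(conn, cutoff, batch_size=10_000):
    """Delete processed bounce events older than `cutoff`, in small batches.

    Unprocessed rows are never touched. Returns the number of rows removed.
    Assumes `timestamp` holds ISO-8601 text with second precision.
    """
    cutoff_text = cutoff.isoformat(timespec="seconds")
    total = 0
    while True:
        # Small batches keep each transaction short, so the cleanup does
        # not monopolize the database while it runs.
        cur = conn.execute(
            "DELETE FROM bounceevent WHERE id IN ("
            "  SELECT id FROM bounceevent"
            "  WHERE processed = 1 AND timestamp < ? LIMIT ?)",
            (cutoff_text, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Run from a yearly (or more frequent) timer, with the cutoff set to
"now minus one year", this would keep the table from growing without
bound.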
On 4/8/25 23:41, Kevin Fenzi via infrastructure wrote:
Greetings.
We have had several applications crashing (resultsdb,
resultsdb_ci_listener) or being slow (bodhi) of late.
I did some digging today and discovered that db01 is pretty saturated on
I/O. This means all the apps that use db01 are fighting over I/O and
returning things slower than they should.
On looking more, it was mailman that was using the vast majority of the
I/O. I of course thought at first that it was crawlers, but it is not.
Instead it seems to be the bounce processor.
This processor wakes up every few minutes and does a query for any
bounces in the bounceevent table that are processed = 'false'.
If it finds any, it processes them.
However, that table is now 50GB and contains 152167015 rows
(pretty much all of them processed = 'True').
From the logs (which log slow queries), an example:
2025-04-08 21:32:40.510 GMT [7073] LOG: duration: 267423.928 ms plan:
Query Text: SELECT bounceevent.id AS bounceevent_id,
    bounceevent.list_id AS bounceevent_list_id,
    bounceevent.email AS bounceevent_email,
    bounceevent.timestamp AS bounceevent_timestamp,
    bounceevent.message_id AS bounceevent_message_id,
    bounceevent.context AS bounceevent_context,
    bounceevent.processed AS bounceevent_processed
FROM bounceevent
WHERE bounceevent.processed = false
Gather (cost=1000.00..7441540.83 rows=1 width=137)
  Workers Planned: 2
  -> Parallel Seq Scan on bounceevent (cost=0.00..7440540.73 rows=1 width=137)
       Filter: (NOT processed)
Yes, that's 267 seconds to run that query, hammering I/O the whole time
because the table is too large to cache well.
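To illustrate why the sequential scan hurts, and one possible way around
it: PostgreSQL supports partial indexes (CREATE INDEX ... WHERE ...), and
an index that covers only the unprocessed rows would stay tiny no matter
how many processed rows pile up. The sketch below demonstrates the idea
with Python's sqlite3 module as a stand-in, since SQLite also supports
partial indexes. The index name, helper names and toy data are
assumptions for illustration; this has not been tried against the real
mailman schema, and building any index on a 50GB table has its own I/O
cost.

```python
import sqlite3

# Same filter as the slow query above, trimmed to one column for brevity.
QUERY = "SELECT id FROM bounceevent WHERE processed = 0"


def build_demo_db(total_rows=10_000, unprocessed_ids=(42, 4242)):
    """Create an in-memory table where almost every row is processed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bounceevent ("
        " id INTEGER PRIMARY KEY, email TEXT, processed BOOLEAN NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO bounceevent (id, email, processed) VALUES (?, ?, ?)",
        (
            (i, f"user{i}@example.com", 0 if i in unprocessed_ids else 1)
            for i in range(1, total_rows + 1)
        ),
    )
    conn.commit()
    return conn


def query_plan(conn):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[-1] for row in rows)


def add_partial_index(conn):
    """Index only the rows the bounce processor actually looks for."""
    conn.execute(
        "CREATE INDEX bounceevent_unprocessed_idx"
        " ON bounceevent (processed) WHERE processed = 0"
    )


if __name__ == "__main__":
    conn = build_demo_db()
    print("before:", query_plan(conn))  # full scan of the table
    add_partial_index(conn)
    print("after: ", query_plan(conn))  # lookup via the partial index
```

This would not shrink the table, so it complements rather than replaces
cleaning out old rows.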
This all led me to this 7-year-old bug report:
https://gitlab.com/mailman/mailman/-/issues/343
Hopefully abompard finds it a fun blast from the past. :)
Anyhow, a quick fix I think would be:
* Save a copy of the latest database dump, which should have that table
backed up.
* 'truncate bounceevent' to wipe it
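
Since only "pretty much" all rows are processed, one cautious variant of
the wipe is to refuse to run while any unprocessed bounce is still
pending. A minimal sketch, again using Python's sqlite3 module as a
stand-in (SQLite has no TRUNCATE, so an unqualified DELETE plays that
role here); the function name is hypothetical:

```python
import sqlite3


def wipe_if_all_processed(conn):
    """Empty bounceevent, but only if no unprocessed rows remain.

    Returns the number of rows removed; raises RuntimeError otherwise.
    """
    # The `with` block commits on success and rolls back on error.
    with conn:
        pending = conn.execute(
            "SELECT COUNT(*) FROM bounceevent WHERE processed = 0"
        ).fetchone()[0]
        if pending:
            raise RuntimeError(f"{pending} unprocessed bounce event(s) remain")
        total = conn.execute("SELECT COUNT(*) FROM bounceevent").fetchone()[0]
        conn.execute("DELETE FROM bounceevent")
        return total
```

On the real database the check and the truncate would need to happen
while the bounce processor is stopped, or inside one transaction, so that
nothing new arrives in between.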
Thoughts? +1s? Counter proposals?
I'd like to do this so the other db01 users stop having problems.
kevin