Re: Freeze Break request: save and then truncate the mailman bounceevent table

Michal Konecny via infrastructure Wed, 09 Apr 2025 01:08:01 -0700

+1 from me

The bounceevent table also hit me during the migration to mailman3, but there were instructions on how to skip that step and mark all the events already in the DB as processed. As a permanent solution, I assume processed bounce events should be deleted from the DB after some time, say once per year or so.
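A minimal sketch of that periodic cleanup, assuming a yearly retention window. SQLite stands in for PostgreSQL so the example runs anywhere; the column names come from the slow-query log quoted below, but the schema types, the `purge_processed_bounces` helper and the 365-day default are hypothetical. Only rows that are already processed and older than the cutoff are deleted, so pending bounces are never lost.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified schema: column names follow the slow-query log,
# types are illustrative only (SQLite standing in for PostgreSQL).
SCHEMA = """
CREATE TABLE bounceevent (
    id INTEGER PRIMARY KEY,
    list_id TEXT,
    email TEXT,
    timestamp TEXT,
    message_id TEXT,
    context INTEGER,
    processed INTEGER NOT NULL DEFAULT 0
)
"""


def purge_processed_bounces(conn, now, retention_days=365):
    """Delete processed bounce events older than the retention window.

    Unprocessed rows are never touched. Returns the number of rows deleted.
    """
    # Timestamps are stored as "YYYY-MM-DD HH:MM:SS" strings, which sort
    # chronologically, so a plain string comparison works here.
    cutoff = (now - timedelta(days=retention_days)).isoformat(sep=" ")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM bounceevent WHERE processed = 1 AND timestamp < ?",
            (cutoff,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = datetime(2025, 4, 9)
rows = [
    ("old@example.com", now - timedelta(days=400), 1),      # old + processed: purged
    ("new@example.com", now - timedelta(days=10), 1),       # too recent: kept
    ("pending@example.com", now - timedelta(days=400), 0),  # unprocessed: kept
]
conn.executemany(
    "INSERT INTO bounceevent (list_id, email, timestamp, processed) "
    "VALUES ('test.example.com', ?, ?, ?)",
    [(email, ts.isoformat(sep=" "), done) for email, ts, done in rows],
)
deleted = purge_processed_bounces(conn, now)
print(deleted)  # 1
```

In PostgreSQL the same filter would be along the lines of `DELETE FROM bounceevent WHERE processed AND bounceevent.timestamp < now() - interval '1 year'`, run from a periodic job.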


On 4/8/25 23:41, Kevin Fenzi via infrastructure wrote:

Greetings.

We have had several applications crashing (resultsdb,
resultsdb_ci_listener) or being slow (bodhi) of late.

I did some digging today and discovered that db01 is pretty saturated on
I/O. This means all the apps that use db01 are fighting for I/O and
returning things slower than they should.

On looking further, it was mailman that was using the vast majority of the
I/O. Of course I thought at first that it was crawlers, but it is not.

Instead it seems to be the bounce processor.
This processor wakes up every few minutes and does a query for any
bounces in the bounceevent table that are processed = 'false'.
If it finds any, it processes them.
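A hypothetical sketch of one wake-up of such a processor, not Mailman's actual implementation: SQLite stands in for PostgreSQL, and `process_pending_bounces`, its `handle` callback and the trimmed-down schema are illustrative. The point it shows is that the SELECT for unprocessed rows runs on every wake-up even when nothing is pending, so its cost is paid every few minutes regardless of how much work there is.

```python
import sqlite3


def process_pending_bounces(conn, handle):
    """One wake-up of a hypothetical bounce processor.

    Finds every row with processed = false, passes it to `handle`, then
    marks it processed. Returns how many rows were handled.
    """
    with conn:  # one transaction per wake-up
        # Without a suitable index this query must scan the whole table,
        # even when it ends up returning zero rows.
        pending = conn.execute(
            "SELECT id, list_id, email FROM bounceevent WHERE processed = 0"
        ).fetchall()
        for event_id, list_id, email in pending:
            handle(list_id, email)
            conn.execute(
                "UPDATE bounceevent SET processed = 1 WHERE id = ?", (event_id,)
            )
    return len(pending)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bounceevent (id INTEGER PRIMARY KEY, list_id TEXT, "
    "email TEXT, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO bounceevent (list_id, email, processed) VALUES (?, ?, ?)",
    [("devel", "a@example.com", 1), ("devel", "b@example.com", 0)],
)
seen = []
first = process_pending_bounces(conn, lambda lst, addr: seen.append(addr))
second = process_pending_bounces(conn, lambda lst, addr: seen.append(addr))
print(first, second, seen)  # 1 0 ['b@example.com']
```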

However, that table is now 50GB and contains 152,167,015 rows
(nearly all of them with processed = 'true').

From the logs (which record slow queries), here is an example:

2025-04-08 21:32:40.510 GMT [7073] LOG:  duration: 267423.928 ms  plan:
         Query Text: SELECT bounceevent.id AS bounceevent_id,
                bounceevent.list_id AS bounceevent_list_id,
                bounceevent.email AS bounceevent_email,
                bounceevent.timestamp AS bounceevent_timestamp,
                bounceevent.message_id AS bounceevent_message_id,
                bounceevent.context AS bounceevent_context,
                bounceevent.processed AS bounceevent_processed
         FROM bounceevent
         WHERE bounceevent.processed = false
         Gather  (cost=1000.00..7441540.83 rows=1 width=137)
           Workers Planned: 2
           ->  Parallel Seq Scan on bounceevent  (cost=0.00..7440540.73 rows=1 width=137)
                 Filter: (NOT processed)

Yes, that's 267 seconds (about four and a half minutes) to run that one
query, hammering I/O the whole time because the table is too large to
cache well.
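The plan above is a full (parallel) sequential scan: with no index that can answer `processed = false`, PostgreSQL reads the entire table to find the handful of unprocessed rows. One common mitigation, not proposed in this thread and not checked against the upstream issue, is a partial index covering only unprocessed rows (in PostgreSQL, something like `CREATE INDEX bounceevent_unprocessed ON bounceevent (id) WHERE NOT processed;`). This sketch uses SQLite's EXPLAIN QUERY PLAN to show the planner moving from a scan to an index search once such an index exists; the index name and the `query_plan` helper are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bounceevent (id INTEGER PRIMARY KEY, email TEXT, "
    "processed INTEGER NOT NULL DEFAULT 0)"
)
QUERY = "SELECT id, email FROM bounceevent WHERE processed = 0"

# Without an index on unprocessed rows the planner can only scan the table.
before = query_plan(conn, QUERY)

# Partial index: only rows matching the WHERE clause are stored in it.
conn.execute(
    "CREATE INDEX bounceevent_unprocessed ON bounceevent (processed) "
    "WHERE processed = 0"
)
after = query_plan(conn, QUERY)

print(before)
print(after)
```

The first plan should report a scan of `bounceevent` and the second should name `bounceevent_unprocessed`. Because only unprocessed rows are indexed, such an index stays small however large the table grows; the trade-off is a little extra work on each insert and update.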

All of this led me to this 7-year-old bug report:
https://gitlab.com/mailman/mailman/-/issues/343
Hopefully abompard finds it a fun blast from the past. :)

Anyhow, a quick fix I think would be:

* Save a copy of the latest database dump that should have that table
backed up.
* 'truncate bounceevent' to wipe it
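The two steps can be sketched as follows, with the same caveats: SQLite stands in for PostgreSQL, and where the plan above relies on the latest existing database dump, this hypothetical `archive_then_truncate` helper writes its own CSV copy first and refuses to empty the table unless the exported row count matches. SQLite has no TRUNCATE statement, so an unqualified DELETE plays that role here; in PostgreSQL the second step is simply `TRUNCATE bounceevent`.

```python
import csv
import os
import sqlite3
import tempfile


def archive_then_truncate(conn, path):
    """Write every bounceevent row to a CSV file, then empty the table.

    The table is emptied only if the number of rows written matches the
    table's row count. Single-connection sketch: rows inserted between the
    count and the delete are not protected.
    """
    cur = conn.execute("SELECT * FROM bounceevent ORDER BY id")
    header = [col[0] for col in cur.description]
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row in cur:
            writer.writerow(row)
            written += 1
    (expected,) = conn.execute("SELECT COUNT(*) FROM bounceevent").fetchone()
    if written != expected:
        raise RuntimeError(f"exported {written} rows, table has {expected}; not truncating")
    with conn:
        conn.execute("DELETE FROM bounceevent")  # SQLite's stand-in for TRUNCATE
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bounceevent (id INTEGER PRIMARY KEY, email TEXT, processed INTEGER)")
conn.executemany(
    "INSERT INTO bounceevent (email, processed) VALUES (?, 1)",
    [("a@example.com",), ("b@example.com",)],
)
path = os.path.join(tempfile.mkdtemp(), "bounceevent.csv")
saved = archive_then_truncate(conn, path)
print(saved)  # 2
```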

Thoughts? +1s? Counter-proposals?

I'd like to do this so the other db01 users stop having problems.

kevin


--
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
