Hi Sébastien,

On 08/06/26 at 21:37 +0200, Sebastien Bacher wrote:
> Hey Lucas,
> 
> Indeed, launchpad is still struggling with AI scrapers and similar and the
> number of requests UDD is making has led to the IP being blocked again. It
> is unblocked now,

Thanks!

> but I think we need to figure out a way for it to not hammer
> launchpad that hard on a regular basis (if my previous quick check is
> correct it does read 160k+ pages every run). Was there any technical reason
> to not use launchpad API (which would allow to filter on recent changes and
> only process bugs that changed recently instead of hammering every
> launchpad bug page at every run)?

Most of that code was written in 2008, so I don't remember the design
choices from back then. Maybe the launchpad API wasn't ready for that
at the time.

Still, the main reason for re-importing all bugs every time is that it's
easier to ensure data correctness that way. Once you try to refresh only
things that change, you increase your chances of running into corner
cases...

I would welcome a patch to re-implement the code using the launchpad
API, but I'm unlikely to find time+motivation to work on it myself.

In the meantime, if that helps, I can reduce the number of parallel
workers from 2 to 1. I reorganized the process to split it into a
download phase and an SQL INSERT phase, so it's no longer a problem if
the download phase takes several days (previously the long-running
transaction was a problem because it prevented VACUUMing).
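
To illustrate the split, here is a minimal sketch of the idea, not the
actual UDD code. It assumes a hypothetical fetch_bug() helper and a
made-up "bugs" table, and uses an in-memory SQLite database as a
stand-in for PostgreSQL so that it runs on its own. The point is only
that the slow download writes to a staging file with no transaction
open, and the database work happens afterwards in one short transaction.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def fetch_bug(bug_id):
    """Hypothetical stand-in for the slow per-bug download."""
    return {"id": bug_id, "title": f"bug {bug_id}"}


def download_phase(bug_ids, staging_path):
    """Slow phase: write one JSON record per line; no transaction is open."""
    with open(staging_path, "w", encoding="utf-8") as out:
        for bug_id in bug_ids:
            out.write(json.dumps(fetch_bug(bug_id)) + "\n")


def insert_phase(conn, staging_path):
    """Fast phase: replace the table contents in one short transaction."""
    with open(staging_path, encoding="utf-8") as src:
        rows = [(r["id"], r["title"]) for r in map(json.loads, src)]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM bugs")
        conn.executemany("INSERT INTO bugs (id, title) VALUES (?, ?)", rows)
    return len(rows)


def run_import(bug_ids):
    """Run both phases against a throwaway database and return its rows."""
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for PostgreSQL
    conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY, title TEXT)")
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "bugs.jsonl"
        download_phase(bug_ids, staging)
        insert_phase(conn, staging)
    return conn.execute("SELECT id, title FROM bugs ORDER BY id").fetchall()
```

With that layout, slowing down the download phase only affects the
staging file, not how long the transaction stays open.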

Lucas
