Re: Renovating debbugs (was Re: Interesting learnings about Guix contributor dynamics that apply to Debian?)

Colin Watson Thu, 29 May 2025 05:16:32 -0700

On Thu, May 29, 2025 at 01:03:56PM +0200, Julien Plissonneau Duquène wrote:

Right now we have an implementation that is dated but mostly works so I think that there is no need to rush a move. Working on it for a while and experimenting with the real data in there will certainly help with figuring out what could be done about the storage.
That data is somewhat transactional, moderately relational, also with relations like bugs that are merged with bugs, block bugs, affect packages which are dependencies of packages, are found in versions which have versions as successors ... and SQL is not that great at recursivity or working with graphs. Or handling large binary objects. Or doing finely tunable full-text indexing and search.

People underestimate PostgreSQL: I've done all these things in it or their very near equivalents, and it works very well in practice. Most of them aren't even hard. You have to bend your brain a bit to handle recursion and graphs, but it's totally doable and usually far faster than doing the equivalent on the application side. And even if you don't bother tuning its dictionaries or similar, its full-text indexing is a perfectly fine starting point - I doubt that debbugs would need more than very minor tuning.
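To illustrate the recursion point, here is a minimal sketch of a recursive common table expression that walks a "blocks" graph inside the database. It is not taken from debbugs: the `bug_blocks` table, its columns, and the bug numbers are invented for the example, and SQLite (via Python's standard library) stands in for PostgreSQL so that it runs without a server. The `WITH RECURSIVE` query itself uses syntax that PostgreSQL also accepts, although a PostgreSQL driver would use a different parameter placeholder than `?`.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'blocks' relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE bug_blocks (
            blocker INTEGER NOT NULL,
            blocked INTEGER NOT NULL,
            PRIMARY KEY (blocker, blocked)
        );
        -- Invented data; (4, 2) deliberately closes a cycle 2 -> 3 -> 4 -> 2.
        INSERT INTO bug_blocks (blocker, blocked) VALUES
            (1, 2), (2, 3), (3, 4), (5, 4), (4, 2);
        """
    )
    return conn


def transitively_blocked(conn, bug):
    """Return every bug that `bug` blocks, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE downstream(bug) AS (
            SELECT blocked FROM bug_blocks WHERE blocker = ?
            -- UNION (not UNION ALL) discards rows already seen,
            -- so the recursion terminates even when the graph has cycles.
            UNION
            SELECT b.blocked
            FROM bug_blocks AS b
            JOIN downstream AS d ON b.blocker = d.bug
        )
        SELECT bug FROM downstream ORDER BY bug
        """,
        (bug,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(transitively_blocked(make_db(), 1))
```

The whole traversal happens in one query, which is the usual reason this tends to beat fetching edges one hop at a time from the application.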

Admittedly for large binary objects I sometimes use some other kind of storage. That's likely unnecessary for bug messages, but I could imagine it being worthwhile for bug attachments. But honestly, the BTS doesn't have enough data or a high enough rate of data change to really be a problem; in my last job I ran a PostgreSQL database that was something like six times the size of bugs.debian.org's entire data set, even leaving aside its much bigger data store for large objects. At our scale, we could just stuff bug attachments into bytea columns in a separate table, maybe mark them "SET STORAGE EXTERNAL" for better streaming support, give bug messages a one-to-many relationship to them, and it'd be fine. Or we could even just start by storing the raw form of each bug message in the DB and leaving it up to the application to parse it for display purposes; that would be fine for what debbugs does today, although a bit less ideal for future full-text search since searches probably wouldn't want to match on most email headers.
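As a rough sketch of that layout, assuming nothing about the real debbugs schema: the example below keeps the raw message, a header-free body that a full-text index could cover, and a separate attachments table in a one-to-many relationship to messages. All table, column, and function names are hypothetical, as are the sample message and bug number. SQLite's BLOB type stands in for PostgreSQL's bytea so the code runs with only the Python standard library; the PostgreSQL-specific storage setting appears only as a comment.

```python
import sqlite3
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical schema. In PostgreSQL the BLOB columns would be bytea, and the
# attachment column could additionally be altered with:
#   ALTER TABLE attachments ALTER COLUMN content SET STORAGE EXTERNAL;
SCHEMA = """
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    bug       INTEGER NOT NULL,
    raw       BLOB NOT NULL,   -- the message exactly as received
    body_text TEXT NOT NULL    -- body only, without email headers
);
CREATE TABLE attachments (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    filename   TEXT,
    content    BLOB NOT NULL
);
"""


def store_message(conn, bug, raw):
    """Store one raw message plus its attachments; return the message id."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    body_text = body.get_content() if body is not None else ""
    cur = conn.execute(
        "INSERT INTO messages (bug, raw, body_text) VALUES (?, ?, ?)",
        (bug, raw, body_text),
    )
    message_id = cur.lastrowid
    for part in msg.iter_attachments():
        conn.execute(
            "INSERT INTO attachments (message_id, filename, content) "
            "VALUES (?, ?, ?)",
            (message_id, part.get_filename(), part.get_payload(decode=True)),
        )
    return message_id


def sample_message():
    """Build an invented bug mail with one small binary attachment."""
    m = EmailMessage()
    m["Subject"] = "Bug#12345: example"
    m["From"] = "submitter@example.org"
    m.set_content("The frobnicator crashes on startup.\n")
    m.add_attachment(b"\x00\x01\x02", maintype="application",
                     subtype="octet-stream", filename="core.bin")
    return m.as_bytes()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    store_message(conn, 12345, sample_message())
    print(conn.execute("SELECT filename FROM attachments").fetchall())
```

Keeping `raw` alongside the parsed body means the application can still re-parse the original for display, while search only has to look at `body_text`.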

(PostgreSQL limits bytea columns using TOAST to 1 GiB. The largest .log file for any bug in bugs.debian.org is for #599476, which is about 204 MiB - and that's for the entire bug, with several video files as attachments. So this isn't even close to being a limit of any concern.)

The BTS today has a bunch of difficult-to-follow workarounds that would completely evaporate if backed by a proper database, allowing much higher performance and leaving more effort available for useful things. While I don't want to put words in Don's mouth, from the fact that he was working on a PostgreSQL port a while ago I infer that he probably agrees with me on this.


--
Colin Watson (he/him)                              [cjwat...@debian.org]
