On 13/04/21 at 18:45 +0200, Mattia Rizzolo wrote:
> [ Adding lucas@ to CC since he is the main person behind UDD after all ]
>
> On Sun, Apr 11, 2021 at 12:45:14PM -0700, Felix Lechner wrote:
> > On Sat, May 9, 2020 at 5:33 PM Mattia Rizzolo <mat...@debian.org> wrote:
> > > have lintian decide on a nice machine-parsable (text!) format
> > > then udd will adapt its importer.
> >
> > As you know, both of these already happened several months ago.
>
> Indeed, I consider that done by now.
>
> > I have
> > not commented here because I am still chewing on a related, but much
> > harder problem:
>
> I'd have probably used a different bug, but guess we'll cope.
>
> > Lintian will soon cease to run blindly across the archive and instead
> > produce packaging hints on demand, as uploads are received by the
> > archive. There is no batch process anymore that will produce files for
> > the entire archive the way you expect. Instead, Lintian's new website
> > https://lintian.debian.*net* offers a JSON interface [1] to get up to
> > date information similar to DAKweb. [2]
>
> So, if we really go down this route, I think we need to:
>
>  * Have the importer able to run a full import of everything, which means
>    looping through all sources (which means running some ~30k HTTP GETs)
>    and storing them.
>  * Figure out a way for UDD to know it needs to check the status of a
>    package.  This likely means a job that compares the set of known
>    (package, version, suite) (is the tuple right?) with what is available
>    in the lintian table: if something is missing query the lintian
>    website for new data.
>  * perhaps have the lintian website *push* new data to udd.d.o.  I'm
>    conflicted if this should be just a trigger ("hey I've just processed
>    this, check it out yourself") or if it should carry the actual data as
>    well.  I'm sure you'd like a HTTP post or such, but I can tell you
>    that we'd likely prefer something through SSH.
>
> Since after all you did look at udd several times, I believe you should
> already be able to implement the first 2?
>
> All this said, I still don't understand why you wouldn't be able to
> provide a view of everything.  Since you set up that API, couldn't you
> have a endpoint with *all* packages and everything, like the current
> dump?  That sounds much more trivial than what you are proposing…

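For reference, the comparison job in the second point above boils down to
a set difference. A minimal sketch in Python, assuming both sides can be
listed as (package, version, suite) tuples. The names SourceKey and
missing_lintian_results are made up for illustration and are not part of
UDD or Lintian:

```python
from typing import Iterable, List, NamedTuple


class SourceKey(NamedTuple):
    """Hypothetical key; the thread itself asks whether this tuple is right."""
    package: str
    version: str
    suite: str


def missing_lintian_results(
    known: Iterable[SourceKey], imported: Iterable[SourceKey]
) -> List[SourceKey]:
    """Return keys known to the archive side but absent from the lintian table.

    The result is sorted as plain tuples of strings (not in Debian version
    order) so that repeated runs query in a stable order.
    """
    return sorted(set(known) - set(imported))


if __name__ == "__main__":
    # Illustrative data only, not real archive contents.
    known = [
        SourceKey("hello", "2.10-2", "sid"),
        SourceKey("zsh", "5.8-6", "sid"),
    ]
    imported = [SourceKey("hello", "2.10-2", "sid")]
    for key in missing_lintian_results(known, imported):
        print(key.package, key.version, key.suite)
```

Each key returned would then be one query against the lintian website.
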
From the UDD point of view, I would very much prefer to get a full dump,
something I can import every few hours, rather than having to deal with
a stream of updates or with querying a per-package API.

Currently the full import (which runs twice a day) takes about 10 minutes
(and I don't remember if it has been optimized, so there might be room
for improvement).

Lucas