On 13/04/21 at 18:45 +0200, Mattia Rizzolo wrote:
> [ Adding lucas@ to CC since he is the main person behind UDD after all ]
> 
> On Sun, Apr 11, 2021 at 12:45:14PM -0700, Felix Lechner wrote:
> > On Sat, May 9, 2020 at 5:33 PM Mattia Rizzolo <mat...@debian.org> wrote:
> > > have lintian decide on a nice machine-parsable (text!) format
> > > then udd will adapt its importer.
> > 
> > As you know, both of these already happened several months ago.
> 
> Indeed, I consider that done by now.
> 
> > I have
> > not commented here because I am still chewing on a related, but much
> > harder problem:
> 
> I'd have probably used a different bug, but guess we'll cope.
> 
> > Lintian will soon cease to run blindly across the archive and instead
> > produce packaging hints on demand, as uploads are received by the
> > archive. There is no batch process anymore that will produce files for
> > the entire archive the way you expect. Instead, Lintian's new website
> > https://lintian.debian.*net* offers a JSON interface [1] to get up to
> > date information similar to DAKweb. [2]
> 
> So, if we really go down this route, I think we need to:
> 
> * Have the importer able to run a full import of everything, which means
>   looping through all sources (which means running some ~30k HTTP GETs)
>   and storing them.
> * Figure out a way for UDD to know it needs to check the status of a
>   package.  This likely means a job that compares the set of known
>   (package, version, suite) (is the tuple right?) with what is available
>   in the lintian table: if something is missing query the lintian
>   website for new data.
> * Perhaps have the lintian website *push* new data to udd.d.o.  I'm
>   undecided whether this should be just a trigger ("hey I've just
>   processed this, check it out yourself") or whether it should carry the
>   actual data as well.  I'm sure you'd like an HTTP POST or such, but I
>   can tell you that we'd likely prefer something through SSH.
> 
> 
> Since after all you did look at udd several times, I believe you should
> already be able to implement the first 2?
> 
> 
> 
> All this said, I still don't understand why you wouldn't be able to
> provide a view of everything.  Since you set up that API, couldn't you
> have an endpoint with *all* packages and everything, like the current
> dump?  That sounds much more trivial than what you are proposing…

From the UDD point of view, I would very much prefer to get a full dump,
something I can import every few hours, rather than having to deal with a
stream of updates or with querying a per-package API.
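
To make the per-package bookkeeping concrete: the check described in the
quoted list comes down to a set difference between the (package, version,
suite) tuples UDD knows about and those already present in the lintian
table. A minimal Python sketch, where the Key type, the function name and
the sample data are all hypothetical and not taken from UDD:

```python
from typing import Iterable, List, NamedTuple


class Key(NamedTuple):
    """Hypothetical identifier for one lintian result set."""
    package: str
    version: str
    suite: str


def missing_keys(known: Iterable[Key], in_lintian: Iterable[Key]) -> List[Key]:
    """Return the keys UDD knows about that have no lintian data yet, sorted."""
    return sorted(set(known) - set(in_lintian))


# Made-up sample data, for illustration only.
known = [
    Key("hello", "2.10-2", "sid"),
    Key("hello", "2.10-2", "bullseye"),
    Key("zsh", "5.8-6", "sid"),
]
in_lintian = [Key("hello", "2.10-2", "sid")]

to_fetch = missing_keys(known, in_lintian)
```

Each key left in to_fetch would then need its own HTTP request, which is
the part a full dump avoids.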

Currently the full import (which runs twice a day) takes about 10 minutes
(and I don't remember whether it has been optimized, so there might be
room for improvement).
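
For illustration, a full import of this kind can be sketched as replacing
the table contents with a fresh dump inside one transaction. This is only
a sketch under assumptions: sqlite3 stands in for the real database, and
the table name, the columns and the sample row are hypothetical rather
than the actual UDD schema or importer:

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row layout: (package, version, suite, tag).
Row = Tuple[str, str, str, str]


def reload_lintian(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Replace the whole table with the rows of a fresh dump."""
    rows = list(rows)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM lintian_results")
        conn.executemany(
            "INSERT INTO lintian_results (package, version, suite, tag) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lintian_results "
    "(package TEXT, version TEXT, suite TEXT, tag TEXT)"
)

# Two successive imports: the second one fully replaces the first.
reload_lintian(conn, [("hello", "2.10-2", "sid", "example-tag")])
reload_lintian(
    conn,
    [
        ("hello", "2.10-3", "sid", "example-tag"),
        ("zsh", "5.8-6", "sid", "another-example-tag"),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM lintian_results").fetchone()[0]
```

The point of the design is that the importer needs no per-package state:
whatever the dump contains at import time is what ends up in the table.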

Lucas
