Hi Baptiste,

On 07/08/23 at 22:07 +0200, Baptiste Beauplat wrote:
> Hi Lucas,
> 
> On 2023-08-03 10:30, Lucas Nussbaum wrote:
> > duck-as-a-service (duck.debian.net) has been broken for a long time, and
> > the corresponding UDD importer is broken as well (see #949009, #963887).
> > In the meantime, duck continued evolving (was rewritten?) and is now
> > checking a lot more places for URLs.
> > 
> > It would probably be useful to re-create a way to provide duck results
> > as a service, based on UDD, similarly to what is done for upstream or
> > lintian data.
> > 
> > Ideally, this would be done in cooperation with the duck maintainer to
> > make the following changes:
> > - in duck, separate the logic to get URLs from sources from the logic
> >   to check those URLs (for example, allow dumping a list of URLs, and
> >   also using a list of URLs as source)
> > - in duck, provide machine-readable outputs (JSON?)
> 
> Currently duck has two features which can help us:
> 
> - The `-n` switch, which gets all URLs and prints them to stdout
> - The `-l filename` switch, which takes a file with one URL per line
> and checks them
> 
> Theoretically, what's missing is only a `--json` switch, which would
> change the output from console/text to JSON.
> 
> But, as I see it, the `-l` argument is limited in two aspects:
> 
> - It provides only the URL, losing the checker type which is used to
> select what kind of validation will be performed.
> 
>   For instance, a https://salsa.debian.org/rfrancoise/tmux.git of type
> VCS-Git would be tested as a standard URL in the `-l` context, instead
> of a git repository.
> 
> - It requires a file
> 
> I'm thinking of implementing a new JSON-specific input format
> (`--input-json`?) carrying both pieces of information, which would be
> read from stdin instead of a file.
> 
> The format would be as simple as:
> 
> ```json
> [
>    {"type": "VCS-Git",
>     "url": "https://salsa.debian.org/rfrancoise/tmux.git",
>     "filename": "debian/control",  # optional key
>     "line_number": 10},            # optional key
>    ...
> ]
> ```
> 
> Following this logic, the output format for checking URLs would be the
> same, so that `duck --json -n | duck --input-json` works.
> 
> The JSON result would hold an additional dictionary for each URL entry,
> named "result", described as follows:
> 
> ```json
> [
>    {"type": "VCS-Git",
>     "url": "https://salsa.debian.org/rfrancoise/tmux.git",
>     "filename": "debian/control",  # optional key
>     "line_number": 10,             # optional key
>     "result": {
>        "state": 0,  # 0 for OK, 1 for Error, 2 for Information
>        "detail": "Informative message",
>        "certainty": "possible"     # optional key
>    }},
>    ...
> ]
> ```
> 
> Let me know what you think of it.

That would be perfect!

In the context of UDD, I will probably implement that as two tables:
- one to store the mapping between source packages and URLs
  (source, version, url, type, filename, line_number),
  which would be updated when a new source version gets uploaded
- one to store the status of URLs
  (url, type, result, timestamp of last check),
  which would be updated with a retry policy to be defined
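
To make the two-table idea concrete, here is a rough sketch. It uses
SQLite from Python purely for illustration (UDD itself runs on
PostgreSQL), and the table names, column types and the helper functions
are placeholders I made up, not an existing UDD schema:

```python
import sqlite3


def create_schema(conn):
    """Create the two hypothetical tables described above."""
    conn.executescript(
        """
        -- Mapping between source package versions and the URLs they contain.
        CREATE TABLE duck_source_urls (
            source      TEXT NOT NULL,
            version     TEXT NOT NULL,
            url         TEXT NOT NULL,
            type        TEXT NOT NULL,
            filename    TEXT,
            line_number INTEGER
        );
        -- One row per (url, type), whatever the number of packages using it.
        CREATE TABLE duck_url_status (
            url        TEXT NOT NULL,
            type       TEXT NOT NULL,
            result     TEXT,
            last_check TEXT,
            PRIMARY KEY (url, type)
        );
        """
    )


def status_for_source(conn, source):
    """Return (version, url, type, result) for every version of a source."""
    return conn.execute(
        """
        SELECT m.version, m.url, m.type, s.result
        FROM duck_source_urls AS m
        LEFT JOIN duck_url_status AS s
               ON s.url = m.url AND s.type = m.type
        WHERE m.source = ?
        ORDER BY m.version
        """,
        (source,),
    ).fetchall()
```

The LEFT JOIN keeps URLs that have not been checked yet, with an empty
result, so a newly uploaded version shows up before its first check.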

I would not use (filename, line_number) in the input of the URL
testing part.
The reason for that design is that it makes it easy to gather the
status for several versions of a package (testing + unstable +
experimental, for example) without duplicating the URL checks.
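
As an illustration of that deduplication step, something along these
lines could turn the extracted entries into the checker input. This is
only a sketch assuming the JSON layout you proposed; the function name
is hypothetical:

```python
def unique_check_inputs(entries):
    """Reduce extracted entries to one item per (type, url) pair.

    filename and line_number are dropped on purpose: they belong to the
    package/URL mapping, not to the URL check itself.
    """
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["type"], entry["url"])
        if key not in seen:
            seen.add(key)
            unique.append({"type": entry["type"], "url": entry["url"]})
    return unique
```

The same URL seen in the unstable and experimental versions of a
package would then be checked only once, while the same URL with two
different types would still get one check per type.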

Lucas
