Re: Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-11 Thread Mattia Rizzolo
(note, this is a barely structured brain dump)

On Tue, Mar 10, 2020 at 08:10:55AM +0100, Niels Thykier wrote:
> >> Though, can you elaborate a bit on why the above approach would be
> >> better than a standard ENV variable a la AUTOPKGTEST_ARTIFACTS and some
> >> easy way to declare additional artifacts to be extracted?
> > 
> > Mainly, I'd prefer something declarative with glob patterns (a bit like
> > debian/clean or Gitlab-CI's artifacts:paths) rather than having to write
> > logic like these pseudo-patches:

Same.
Having such a directory hidden inside
/build/foo-1.2.3/debian/.build/artefacts/ is kind of a mouthful, but if
that gets standardized I think it would be awesome.  Builders (sbuild,
pbuilder, …) decide on the first '/build/foo-1.2.3/' part of the path
and they know it; package building happens with CWD in that place, so
build tools should just stick to the relative path
'./debian/.build/artefacts/' and everything should Just Work that way.

One thing that strikes me about this proposal is that you were trying to
"hide" that .build directory from the maintainer; having maintainers
write into it directly would go against that design decision.  This is
the only "concern" I have with the proposal.  It can probably be avoided
by providing a dh_ helper.

> Ack, I get the part of having a declarative mechanism for selecting files.

And then the builder could just take out the whole directory.  Whether
that gets (g|x|)zipped or not would be an implementation detail of the
builders (sbuild, pbuilder, …) and of whatever frontend (launchpad,
buildd + wanna-build, …) is used.
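
As a rough illustration of why the archive format can stay a builder-side
detail, here is a minimal Python sketch of a builder packing the whole
artefacts directory into a zip and a frontend later reading a single log
out of it (the seekability point raised in the quoted text further down).
The function names and the choice of zip are assumptions for illustration
only; nothing here is an existing sbuild or pbuilder interface.

```python
import zipfile
from pathlib import Path
from typing import List


def pack_artifacts(artifact_dir: Path, archive: Path) -> List[str]:
    """Store every regular file under artifact_dir in a zip archive.

    Member names are paths relative to artifact_dir. The archive itself
    should live outside artifact_dir so it is not packed into itself.
    """
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(artifact_dir.rglob("*")):
            if path.is_file():
                name = path.relative_to(artifact_dir).as_posix()
                zf.write(path, arcname=name)
                names.append(name)
    return names


def read_member(archive: Path, name: str) -> bytes:
    """Read one member through the zip index, without unpacking the rest."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)
```

A compressed tarball would work just as well for the "take out the whole
directory" step; the difference only shows up when a frontend wants to
serve individual files.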

> Just to clarify something related.  Should debhelper and other tools by
> default archive "certain files of possible interest" (e.g. config.log)?
> Or should we limit it to "on request only"?

That would be a nice bit of automation indeed, but I think it's
something for "later".  If you do, please consider these bits:
 * naming the files: you risk clashing with maintainer-set file names
 * deciding on whether to put those files there only on failure or all
   the time

> The former makes it simpler for people that are interested in the
> "default" parts but also bloats the archive for people that are
> interested in just one file.

"bloating" is indeed important.  If we start doing this, frontends need
to decide on a retention policy.  Do we want maintainers to have a say
in this?  For example, by adding a metadata file to the artifacts that
indicates how long those files are of interest (this is a successful
build: keep for x days, keep until the next successful build + y days,
etc.).

> > I don't have any particular opinion on whether artifacts should be
> > collected into debian/.build/artifacts/, into $DPKG_ARTIFACTS/, or
> > directly into some sort of archive (Gitlab and Jenkins seem to use zip
> > files, which have the advantage of being seekable, so web frontends
> > can presumably read individual logs directly out of the zip file if
> > that's desirable).
> 
> Thanks for clarifying.  This answered the question I was trying to write. :)


I think I took care of those thoughts above, but to reiterate:
 * IMHO ./debian/.build/artefacts/ (or artifacts? :P) is a cool and
   accessible place for all interested software
 * perhaps, you could consider using
   ${DPKG_ARTEFACTS:-$PWD/debian/.build/artefacts} so that some builders
   can override the directory if they find it more convenient for some
   reason, but otherwise I'd rather stick to a stable, non-changeable
   path.
 * I think any tarballing/compression should be left as a matter for
   the build driver (sbuild, pbuilder, …).
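
For tools that are not shell scripts, the
${DPKG_ARTEFACTS:-$PWD/debian/.build/artefacts} lookup could be mirrored
roughly as in the Python sketch below.  DPKG_ARTEFACTS is only the
variable name floated in this thread, not an existing dpkg interface, and
the helper name and its parameters are made up for the example.

```python
import os
from pathlib import Path

# Relative location proposed in this thread (spelling still undecided).
DEFAULT_RELATIVE_DIR = Path("debian/.build/artefacts")


def artefacts_dir(environ=None, cwd=None) -> Path:
    """Return the artefacts directory a build tool should write to.

    Mirrors the shell expansion ${DPKG_ARTEFACTS:-$PWD/debian/.build/artefacts}:
    the default is used when the variable is unset *or* empty.
    """
    env = os.environ if environ is None else environ
    base = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("DPKG_ARTEFACTS")  # hypothetical variable name
    if override:
        return Path(override)
    return base / DEFAULT_RELATIVE_DIR
```

With no override and the build running in /build/foo-1.2.3, this resolves
to /build/foo-1.2.3/debian/.build/artefacts, i.e. the stable path that
builders can rely on.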

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
More about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-




Re: Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-11 Thread Mattia Rizzolo
On Tue, Mar 10, 2020 at 09:07:57AM +0000, Simon McVittie wrote:
> On Tue, 10 Mar 2020 at 07:19:59 +0000, Paul Wise wrote:
> > On Tue, Mar 10, 2020 at 7:12 AM Niels Thykier wrote:
> > > Standardized way of extracting additional build-time artefacts
> > 
> > This reminds me of the BYHAND stuff, I forget how that works though.
[…]
> Similarly, we probably don't want to publish the build products to users
> if the build(-time tests) failed (because we can't be confident that any
> products that were already produced are good), although we might well
> want to make them available through a contributor-oriented interface to
> help to debug the failures; but we do want to publish build and test logs
> to contributors, regardless of success or failure.

And this highlights one important aspect of such an interface: these
artifacts would be collected even after a build failure.
That's not possible at all now.

> The .buildinfo file is arguably already in the same category as build
> and test logs. We currently capture it in the .changes file and upload
> it to ftp-master, but it isn't reproducible, and ftp-master doesn't
> republish it through our user-facing interface (the archive). Ideally,
> failed builds would capture their .buildinfo as well as their log for
> subsequent analysis, although I don't know whether they actually do.

That's somewhat of a tough argument, which I'd try to keep separate.  It
has to do with the semantic meaning of a .buildinfo: it tries to tie
*built artifacts* to the way they were built, so a .buildinfo without
any hashes would be quite meaningless compared to its original purpose.
Also, we do want it to be published, but we are still waiting for the
ftp-masters to tell us their distribution requirements...

-- 
regards,
Mattia Rizzolo



Re: Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-10 Thread Simon McVittie
On Tue, 10 Mar 2020 at 07:19:59 +0000, Paul Wise wrote:
> On Tue, Mar 10, 2020 at 7:12 AM Niels Thykier wrote:
> > Standardized way of extracting additional build-time artefacts
> 
> This reminds me of the BYHAND stuff, I forget how that works though.

I think how that works is that at the appropriate time during a successful
build, you run dpkg-distaddfile to insert extra entries that are not a
recognised file type (.deb, .udeb etc.) into debian/files?

The difference (as I understand it) is that BYHAND is for extra build
products that are listed in the .changes file and intended to be published
to Debian users via ftp.debian.org; whereas in this thread we're talking
about non-essential things that are produced as a side-effect of the
build, are potentially useful to Debian contributors for debugging and
analysis of the build process itself, but are not actually the "product".

Some important trade-offs are different. For example, for the build
products mentioned in the .changes file (whether .deb or BYHAND) we want
reproducible builds that don't capture unnecessary information like the
properties of the build system; whereas in build and test logs, we *do*
want to capture system-specific information in case it's relevant, for
example to help a Debian contributor to realise correlations like "this
test fails whenever we're building on a btrfs filesystem" that can help
them to find and fix bugs.

Similarly, we probably don't want to publish the build products to users
if the build(-time tests) failed (because we can't be confident that any
products that were already produced are good), although we might well
want to make them available through a contributor-oriented interface to
help to debug the failures; but we do want to publish build and test logs
to contributors, regardless of success or failure.

The .buildinfo file is arguably already in the same category as build
and test logs. We currently capture it in the .changes file and upload
it to ftp-master, but it isn't reproducible, and ftp-master doesn't
republish it through our user-facing interface (the archive). Ideally,
failed builds would capture their .buildinfo as well as their log for
subsequent analysis, although I don't know whether they actually do.

smcv



Re: Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-10 Thread Simon McVittie
On Tue, 10 Mar 2020 at 08:10:55 +0100, Niels Thykier wrote:
> Just to clarify something related.  Should debhelper and other tools by
> default archive "certain files of possible interest" (e.g. config.log)?
> Or should we limit it to "on request only"?

I think it would probably make most sense for dpkg (which doesn't know
about specific build systems) to not archive anything by default, or to
archive only things it produced itself.

For debhelper it might make sense for build system classes to archive
well-known logs like Autotools' ${builddir}/config.log and Meson's
${builddir}/meson-logs/ by default, but probably not logs that are in
an unpredictable location like Autotools' test logs.

If there's an exclusion mechanism for packages that know a particular
artifact is not useful and monstrously large ("!meson-logs/big.log"?) then
it doesn't necessarily matter much either way. If artifacts aren't kept
forever then the damage from archiving too much will be temporary.
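
To make the exclusion idea concrete, here is a small Python sketch of how
a pattern list with "!"-prefixed entries could be evaluated.  The "!"
syntax is only the one floated above, and the rule that an exclusion wins
regardless of its position in the list is an assumption of this sketch,
as is the function name.

```python
from fnmatch import fnmatchcase


def select_artifacts(candidates, patterns):
    """Filter candidate paths by include patterns and '!' exclusions.

    A path is kept if it matches at least one include pattern and no
    exclusion pattern. Note that fnmatch's '*' also matches '/', so
    'meson-logs/*' covers nested files too.
    """
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    selected = []
    for name in candidates:
        wanted = any(fnmatchcase(name, p) for p in includes)
        unwanted = any(fnmatchcase(name, p) for p in excludes)
        if wanted and not unwanted:
            selected.append(name)
    return selected
```

For example, with the patterns "config.log", "meson-logs/*" and
"!meson-logs/big.log", a default set of Meson logs would be archived
while the one oversized log is left out.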

> The former makes it simpler for people that are interested in the
> "default" parts but also bloats the archive for people that are
> interested in just one file.

Have you seen the UI Gitlab-CI and Jenkins provide for this? If you look
at a Gitlab-CI job like ,
there's a Download link that gives you the complete bundle of
artifacts in a zip file, but there's also a Browse link like

that lets you look at individual logs online (which is often enough to
debug an issue). Jenkins has a similar system.

On Salsa, ci.debian.net (for tests with the needs-build restriction),
and similar systems, I think it would make most sense to drop the
artifacts into somewhere the "larger" CI system will pick them up, and
let the "larger" CI system handle browsing and expiry. On
buildd.debian.org, I think build logs are kept forever(?) but artifacts
should probably have some sort of expiration mechanism, similar to the
way ci.debian.net remembers test results indefinitely but discards old
logs after a while.

smcv



Re: Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-10 Thread Paul Wise
On Tue, Mar 10, 2020 at 7:12 AM Niels Thykier wrote:
> Standardized way of extracting additional build-time artefacts

This reminds me of the BYHAND stuff, I forget how that works though.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Standardized way of extracting additional build-time artefacts (was: Re: RFC: Standardizing source package artifacts build paths)

2020-03-10 Thread Niels Thykier
Simon McVittie:
> On Mon, 09 Mar 2020 at 20:45:13 +0100, Niels Thykier wrote:
>> Simon McVittie:
>>> For example, dpkg-buildpackage could perhaps read one glob per
>>> line from debian/artifacts and hardlink matched files (if any) into
>>> debian/.build/artifacts for collection by a "larger" framework like
>>> sbuild or pbuilder, or individual packages could copy/link files into there
>>> as they go, or debhelper build-system classes like Autotools and Meson
>>> could know the names of common log files from their build system, or
>>> some combination of those.
>>
>> Though, can you elaborate a bit on why the above approach would be
>> better than a standard ENV variable a la AUTOPKGTEST_ARTIFACTS and some
>> easy way to declare additional artifacts to be extracted?
> 
> Mainly, I'd prefer something declarative with glob patterns (a bit like
> debian/clean or Gitlab-CI's artifacts:paths) rather than having to write
> logic like these pseudo-patches:
> 
> [...]
> 

Ack, I get the part of having a declarative mechanism for selecting files.
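
For reference, the declarative variant quoted above (one glob per line in
debian/artifacts, matches hardlinked into debian/.build/artifacts) could
look roughly like the Python sketch below.  This is not something
dpkg-buildpackage does today; the function name, the decision to keep each
file's relative path under the destination, and the skipping of blank
lines are all assumptions of the sketch.

```python
import os
from pathlib import Path


def collect_artifacts(source_root, manifest="debian/artifacts",
                      dest="debian/.build/artifacts"):
    """Hardlink files matching the manifest's globs into the dest directory.

    Patterns are taken relative to source_root, one per line. Globs that
    match nothing are silently ignored. Returns the collected relative paths.
    """
    root = Path(source_root)
    manifest_path = root / manifest
    dest_dir = root / dest
    collected = []
    if not manifest_path.is_file():
        return collected
    for line in manifest_path.read_text().splitlines():
        pattern = line.strip()
        if not pattern:
            continue  # skip blank lines
        for match in sorted(root.glob(pattern)):
            # Only regular files, and never re-collect already collected ones.
            if not match.is_file() or dest_dir in match.parents:
                continue
            rel = match.relative_to(root)
            target = dest_dir / rel
            if target.exists():
                continue  # matched by an earlier pattern
            target.parent.mkdir(parents=True, exist_ok=True)
            os.link(match, target)  # hardlink: no copy, same filesystem
            collected.append(rel.as_posix())
    return collected
```

Hardlinking keeps the step cheap even for large logs, at the cost of
requiring the destination to be on the same filesystem as the build tree,
which holds for a directory below debian/.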

Just to clarify something related.  Should debhelper and other tools by
default archive "certain files of possible interest" (e.g. config.log)?
Or should we limit it to "on request only"?

The former makes it simpler for people that are interested in the
"default" parts but also bloats the archive for people that are
interested in just one file.

> I don't have any particular opinion on whether artifacts should be
> collected into debian/.build/artifacts/, into $DPKG_ARTIFACTS/, or
> directly into some sort of archive (Gitlab and Jenkins seem to use zip
> files, which have the advantage of being seekable, so web frontends
> can presumably read individual logs directly out of the zip file if
> that's desirable).
> 
> smcv
> 

Thanks for clarifying.  This answered the question I was trying to write. :)

~Niels