On Thu, 2016-10-06 at 14:18:27 +0200, Andreas Beckmann wrote:
> On 2016-10-06 13:27, Andreas Tille wrote:
> > On Thu, Oct 06, 2016 at 01:15:43PM +0200, Andreas Beckmann wrote:
> >> From the attached log (scroll to the bottom...):
> >> 28m45.5s ERROR: FAIL: debsums reports modifications inside the chroot:
> >> debsums: missing file /var/lib/metaphlan2-data/markers.fasta (from
> >> metaphlan2-data package)
> >> (If I run it manually and don't generate the stuff in postinst,
> >> the file stays installed).
> >> A gut feeling says that I would rather expect the shipped file in
> >> /usr/share and the generated files in /var/lib ...
> >> Wrote 304203219 bytes to primary EBWT file:
> >> /usr/share/metaphlan2/db_v20/mpa_v20_m200.rev.1.bt2
> >> Wrote 177889404 bytes to secondary EBWT file:
> >> /usr/share/metaphlan2/db_v20/mpa_v20_m200.rev.2.bt2
> > I have to admit that removing the file from the user's hard disk is
> > intentional: its only purpose is to create the resulting files, and it
> > is not needed afterwards. Upstream actually ships the results, and
> > to save bandwidth the smaller (and editable text) FASTA format is used
> > for the Debian package. This compromise was discussed on debian-devel.
> > It was not discussed whether it is OK to remove the intermediate format
> > afterwards. Can you think of a solution that does not bloat the user's
> > hard disk with unused files and does not trip Debian's QA
> > tools?
> That's an interesting use case. Feel free to downgrade the severity.
> Guillem, do you have any suggestions on how to solve this? Viewed
> abstractly, the package uses a custom compression format and a custom
> decompressor for (some of) the files it ships.
This usage would fall partially under:
Having played a bit with the generated files, I found that they do not
compress very well, so I don't see any other option. In the end this is
a trade-off between downloaded data and computation time on each and
every system. Personally I'd favor a bigger file and less time spent on
each and every installed system, because the data will end up occupying
that much space on disk anyway, it's not something
downloaded often (I'd assume), and being arch:all it is shared for all
architectures.
If you are still going to favor rebuilding at install time, a couple
of possible slight improvements would be to exclude the removed files
from the md5sums file generated at package build time (though this will
probably still trigger QA tooling alarms), and to get a more accurate
installed size for the package by setting the Extra-Size substvar to
compensate for the difference (man deb-substvars).
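
To make the Extra-Size arithmetic concrete, here is a minimal sketch of
how the value could be computed. It is only an illustration, not
something taken from the package: the helper name extra_size_kib is
made up, the value is assumed to be in KiB like Installed-Size, and the
size of the removed markers.fasta is not given in this thread, so it is
left as a placeholder of 0. Only the two index files shown in the log
above are counted; the other generated files would have to be added.

```python
def extra_size_kib(generated_bytes, removed_bytes=0):
    """Net extra installed size in KiB, rounded up and never negative.

    generated_bytes: sizes (in bytes) of the files created at install time.
    removed_bytes:   size (in bytes) of shipped files deleted afterwards.
    """
    net = sum(generated_bytes) - removed_bytes
    if net <= 0:
        return 0
    # Integer ceiling division, so a partial KiB counts as a full one.
    return (net + 1023) // 1024


# Sizes of the two EBWT files reported in the log quoted above.
generated = [304203219, 177889404]

# Placeholder: the real size of the removed markers.fasta is not known here.
removed = 0

# Emit a name=value line, the form used in substvars files.
print(f"Extra-Size={extra_size_kib(generated, removed)}")
```

With the two logged sizes and the placeholder of 0 this prints
Extra-Size=470794. The line would presumably be appended to the
package's substvars file during the build, before dpkg-gencontrol runs.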