On Fri, Aug 6, 2010 at 9:53 AM, Alexander Sack <a...@linaro.org> wrote:
> Hi,
>
> On Fri, Aug 6, 2010 at 3:28 AM, Christian Robottom Reis <k...@linaro.org>
> wrote:
>>
>> Hi there!
>>
>> I unpacked our minimal release image and ran an xdiskusage on it,
>> mostly to see what we're shipping -- and I was surprised to see that a
>> fourth of the image is actually apt package caches and lists. Can we
>> put into the image generation script something to strip them out before
>> generating the image?
>
> if there are really .deb's shipped in the tarball then this is definitely
> waste and a bug.
>
> However, if it's just the lists and pkg cache then I am not so convinced
> unless we say we remove apt (and dpkg) from our images (e.g. don't allow
> easy install/upgrade etc.).
>
> Those files would come back when running apt-get update etc., so the only
> thing we would win is smaller initial download bandwidth, while I think we
> are really after general/lasting disk footprint savings.

We could remove these files, but I agree it may be a false optimisation:
in that case the size of the release filesystem is no longer
representative of the steady-state size of the filesystem once it's in
use.

Out of interest, does anyone know why dpkg/apt never migrated from the
"massive sequential text file" approach to something more
database-oriented? I've often thought that the current system's
scalability has been under pressure for a long time, and that there is
potential for substantial improvements in footprint and performance -
though the Debian and Ubuntu communities would need to give their support
for such an approach, unless we wanted to switch to a different packaging
system.

> One thing we could do is remove universe from our default apt line. this
> probably would reduce the size of that directory by > 50% ...
>
> Long term we could have our own archive with fewer packages ... this could
> reduce size of those indexes etc. even further.
>
>>
>> The untarring also suggests a number of places where we could further
>> trim the image, some of which are probably pretty hard to do:
>>
>> * stripping /usr/share/doc out (but everybody knew that)
>
> ack. we plan to do that using pitti's dpkg improvements; last time I
> checked they hadn't landed in the archive yet, but I will check the
> status again soon.

It's interesting to note that because /usr/share/doc contains mostly
nearly-empty directories and tiny files, the filesystem overhead may be a
significant part of the overall consumption here - I estimate about 20-30%
of the overall space, assuming a typical filesystem with a 4KB block size.

If we have to keep /usr/share/doc/ (for copyright notices and so on),
maybe it would be feasible to replace each /usr/share/doc/<package>/ with
a tarball? This would eliminate most of the overhead as well as making
the actual data smaller. Since /usr/share/doc/ is not accessed often, and
not accessed by many automated tools, this might not cause much
disruption.
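
For anyone who wants to check the overhead estimate on a real image, here
is a rough sketch of how it could be measured. The function name
block_overhead is made up for illustration, and the model is an
assumption: every regular file is rounded up to whole 4KB blocks and every
directory is charged exactly one block. Real filesystems differ (inode
tables, inline data, larger directories), so treat the result as a ballpark
figure only.

```python
import os


def block_overhead(root, block_size=4096):
    """Return (apparent_bytes, allocated_bytes) for the tree under root.

    Simplified model (an assumption, not how any particular filesystem
    works): each regular file occupies a whole number of blocks and each
    directory occupies exactly one block. Symlinks are ignored.
    """
    apparent = 0
    allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        allocated += block_size  # one block charged per directory
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.lstat(path).st_size
            apparent += size
            # Round the file size up to a whole number of blocks.
            allocated += -(-size // block_size) * block_size
    return apparent, allocated
```

Calling block_overhead("/usr/share/doc") on an unpacked image and comparing
the two numbers gives the share of space lost to block rounding under
these assumptions.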

[...]

>>
>> * stripping out modules for devices that won't ever be on
>> this ARM device
>
> yeah, this seems to make sense. However, I am not sure how to draw the
> line. Maybe this is something the kernel WG can take a look at and come
> up with a reduced list of modules?

Classifying drivers by bus, and throwing out anything that can't be
physically connected, such as PCI/AGP/ISA, might be an approach here.
Also, peripherals which can only be connected to on-SoC buses, but are not
present in a given platform's silicon, could be excluded. We would still
have to keep a lot though... anything which can be connected via USB, for
example.

A more ambitious solution might be to allow for dynamic installation of
missing modules, but that's probably a separate project since it would
affect the way the kernel is packaged. Currently we have no choice but to
install absolutely everything "just in case" (much like the way /dev used
to contain 1000s of device nodes that were never used).

Cheers
---Dave

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev