Hi guys,

After some discussions with Steve (Sledge) and Thomas Schmitt (xorriso upstream), a new approach to the image size estimation task within debian-cd has come up (the idea was originally suggested by Thomas).

Currently it all boils down to forking genisoimage -print-size multiple times (as many tries as it takes) as the addition algorithm approaches the media end (within 5-10% of the configured media size). As already known, spawning genisoimage from scratch for such size estimations on relatively large data sets is quite expensive. This could be alleviated by starting a single xorriso session in dialog mode and talking to it via stdin and stdout (forked just once, and kept running until the real end of media is reached). File objects could be mapped into the ISO model and several estimation algorithms applied, followed eventually by final -print_size calls to make the estimated image size exact.

To explore that, a proof of concept tool has been started at:

  http://git.debian.org/?p=users/danchev/medistimator.git;a=summary
  (see the 'algo' branch)

Some (perhaps naive) timing results can be found at:

  http://people.debian.org/~danchev/medistimator/log/

Whether this tool will be released or not is not of great importance; its main objective is to give an impression of:

* The impact of aggressively using (or, respectively, avoiding) the -print_size command, compared to alternative estimation techniques which do not rely on expensive operations on the back-end side (xorriso in this case).
* How reliable the interaction between xorriso dialog mode and Perl's IPC::Run is.

My findings so far are positive, or at least I think so. Three estimation algorithms are explored:

* swift - xorriso is only used to calculate the ISO image overhead; the rest is a self-made size estimation of the input data objects (files, directories, etc). Approximate (never overruns), but fastest.
* psize - relies solely on the xorriso -print_size command to perform size calculations, which is expensive and slow. It is included mainly for comparison purposes. Accurate, but very slow.
* mixed - employs both of the above algorithms, swift for speed and psize for exactness. The general idea is to use 'swift' as long as we are not close to the media end, and fall back to 'psize' when it is time to be precise. Accurate and fast (default).

If such an approach is found to be beneficial for the debian-cd [1] job, then we can start discussing how to transplant code blocks from the proof of concept tool into the debian-cd scripts. This is mainly a set of fewer than ten routines implementing the xorriso communication layer, which performs the queries and processes the returned results. It may be worth diverging from the debian-cd Makefile by creating an alternative target which calls a modified make_disc_trees.pl, which in turn relies on xorriso communication for image size estimations, so that the old approach remains available too. My idea is only to offer an alternative to the size measuring approach, not to change the core logic behind how the Debian image trees are laid out.

[1] I'm not aware of any other vendor here on Earth producing such an insane amount of images on a weekly basis as Debian does.

-- 
pub 4096R/0E4BD0AB <people.fccf.net/danchev/key pgp.mit.edu>
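
For illustration, the 'mixed' strategy from the algorithm list above could be sketched roughly as follows. This is Python, used purely for illustration: the proof of concept itself is Perl, and nothing below is taken from it. The names fill_media, round_up and make_stub_backend, the 10% margin default and the "stop at the first item that does not fit" rule are all assumptions. The exact_size callable stands in for whatever would ask xorriso for a -print_size result over the dialog session; here it is a stub, not a model of real ISO 9660 sizes.

```python
from typing import Callable, Iterable, List, Tuple

BLOCK = 2048  # ISO 9660 logical sector size in bytes

Item = Tuple[str, int]  # (name, size in bytes); hypothetical input shape


def round_up(size: int) -> int:
    """Round a byte count up to a whole number of blocks."""
    return (size + BLOCK - 1) // BLOCK * BLOCK


def make_stub_backend(overhead: int) -> Callable[[List[Item]], int]:
    """Stand-in for an exact query to the back end (assumption, not xorriso)."""
    def exact_size(items: List[Item]) -> int:
        return overhead + sum(round_up(size) for _, size in items)
    return exact_size


def fill_media(
    candidates: Iterable[Item],
    media_size: int,
    overhead: int,
    exact_size: Callable[[List[Item]], int],
    margin_percent: int = 10,
) -> Tuple[List[Item], int]:
    """Accept items until the media is full.

    Returns the accepted items and the number of expensive exact_size calls.
    """
    # Below this size the cheap running total is trusted on its own.
    threshold = media_size - media_size * margin_percent // 100
    accepted: List[Item] = []
    cheap_total = overhead
    exact_calls = 0
    for name, size in candidates:
        trial_total = cheap_total + round_up(size)
        if trial_total > threshold:
            # Close to the media end: pay for an exact answer.
            exact_calls += 1
            if exact_size(accepted + [(name, size)]) > media_size:
                break  # simplification: stop at the first item that does not fit
        accepted.append((name, size))
        cheap_total = trial_total
    return accepted, exact_calls


if __name__ == "__main__":
    items = [("pkg%d" % i, 20000) for i in range(12)]
    backend = make_stub_backend(10 * BLOCK)
    accepted, calls = fill_media(items, 100 * BLOCK, 10 * BLOCK, backend)
    print(len(accepted), "items accepted,", calls, "exact queries")
```

Setting margin_percent to 100 makes every step go through exact_size, which corresponds to the 'psize' behaviour and is convenient for comparing the number of expensive queries.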

