On 2013-04-19 at 09:30 +0100, Nigel Metheringham wrote:
> Was wondering about our mechanisms for distributing docs.  It makes
> sense to have a tarball of the HTML in the "ftp" area (quoted, because I
> suspect most distribution is other than by ftp nowadays) - as the HTML
> is a pile of small files all connected.  PDFs, and for that matter ebook
> versions etc, are single files (OK, spec & filter) and reasonably
> compressed to start with.  Would we do better to just unpack them?
> In that case the website links become links direct to the distribution
> area PDFs.
I think the reason for the compression of single files is that it was set
up when PostScript was still dominant.  With PDF, I agree that it makes
much less sense.

epub files are zip files with a different extension and a common file
hierarchy, so if they can be compressed further it means we're using a
bad zip compressor.  I'm definitely in favour of not having to .bz2/.gz
the epubs.

> Maybe we would need to change directory structure a little - a directory
> per version with all the files in rather than flat, although we prune
> the online versions into an old directory every so often anyhow... so
> maybe this doesn't need doing.
>
> Hopefully that makes things simpler rather than more complex (OK, the
> scripting needs fixing, but it needs fixing now).
>
> Comments?

With links to the files from the web area, we stop being able to move old
versions into sub-directories without breaking the links from the old
versions of the files, so we're better off having something like:

  eximdoc/pdf/spec-4.80.pdf
  eximdoc/pdf/spec-4.80.1.pdf
  eximdoc/pdf/spec-current.pdf -> spec-4.80.1.pdf
  eximdoc/epub/spec-4.80.epub
  eximdoc/epub/spec-4.80.1.epub
  eximdoc/epub/spec-current.epub -> spec-4.80.1.epub

For the source files, it makes sense to move old versions aside, so that
those who want infrastructure locked to a particular version are
responsible for providing access to it, and _we_ can move versions with
security holes into old/ ASAP and break direct download links.

For documentation, absent a developer-machine compromise resulting in a
trojanned document with an exploit in it (which should be handled as an
exceptional clean-up case), we don't want this: we should just make sure
that a versioned link is good for all time.

Something else to consider, far less urgent, is metadata in HTTP
responses, such as "Link: <...>; rel=latest-version".
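A hypothetical server-side sketch of emitting such a header, assuming Apache with mod_headers enabled; the URL and filename pattern are illustrative only, not the real exim.org layout:

```apache
# Sketch only: attach an RFC 5829 "latest-version" Link header to every
# versioned copy of the spec PDF.  Requires mod_headers; the target URL
# below is a made-up example.
<FilesMatch "^spec-[0-9.]+\.pdf$">
    Header set Link "<https://www.exim.org/eximdoc/pdf/spec-current.pdf>; rel=\"latest-version\""
</FilesMatch>
```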
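The versioned-plus-current layout above could be maintained by a short publish script along these lines; a sketch only, with made-up paths and version number:

```shell
#!/bin/sh
# Sketch: publish a new versioned spec and flip the "-current" symlink.
# DOCROOT and VERSION are hypothetical examples, not exim.org's real
# layout or release process.
set -e

DOCROOT="${DOCROOT:-/tmp/eximdoc-demo}"
VERSION="4.80.1"

mkdir -p "$DOCROOT/pdf" "$DOCROOT/epub"

# Stand-in files; in real use these would come from the doc build.
: > "$DOCROOT/pdf/spec-$VERSION.pdf"
: > "$DOCROOT/epub/spec-$VERSION.epub"

# ln -sfn replaces any existing spec-current link without dereferencing
# it, so the versioned files themselves are never overwritten.
ln -sfn "spec-$VERSION.pdf"  "$DOCROOT/pdf/spec-current.pdf"
ln -sfn "spec-$VERSION.epub" "$DOCROOT/epub/spec-current.epub"
```

Because the versioned files are never touched after publication, a link to spec-4.80.pdf stays good for all time, while spec-current.pdf always tracks the newest release.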
It seems feasible that search engines would detect such a Link header and
use it as input, similarly to how canonical links in web pages help
de-duplication, and so ensure that folks who search for "exim spec.pdf"
see link ranking pointing them to the latest version, with a version
number, updating fairly promptly without losing ranking.  Eg:

  curl -I http://people.spodhuis.org/phil.pennock/software/sieve-connect-0.85.tar.bz2

(and old versions 0.81, 0.83, 0.84 exist for comparing/contrasting).

-Phil

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-dev
## Exim details at http://www.exim.org/ ##
