On 2013-04-19 at 09:30 +0100, Nigel Metheringham wrote:
> Was wondering about our mechanisms for distributing docs.  It makes 
> sense to have a tarball of the HTML in the "ftp" area (quoted, because I 
> suspect most distribution is other than by ftp nowadays) - as the HTML 
> is a pile of small files all connected.  PDFs, and for that matter ebook 
> versions etc, are single files (OK, spec & filter) and reasonably 
> compressed to start with.  Would we do better to just unpack them?
> In that case the website links become links direct to the distribution 
> area PDFs.

I think the single files were compressed because the scheme was set up
when PostScript, which compresses well, was still dominant.  PDFs are
usually compressed internally already, so I agree that it makes much
less sense now.

epub files are zip files with a different extension and a common file
hierarchy, so if a .bz2/.gz wrapper can shrink them noticeably, that
only means the zip compressor we use is doing a poor job.  I'm
definitely in favour of not having to .bz2/.gz the epubs.
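If an epub turns out to be poorly compressed, the fix is to repack the zip rather than wrap it in .bz2/.gz.  A minimal sketch in Python, assuming the build produces an ordinary epub; `repack_epub` is a hypothetical helper, not part of any existing Exim doc tooling.  It relies on the EPUB container rule that the `mimetype` entry comes first and is stored uncompressed:

```python
import zipfile


def repack_epub(src: str, dst: str, level: int = 9) -> None:
    """Hypothetical helper: rewrite an epub with maximum Deflate compression.

    The "mimetype" entry is written first and stored uncompressed, as the
    EPUB container format requires; every other entry is deflated at `level`.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # mimetype must be the first entry and must not be compressed
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            # everything else gets the strongest Deflate level requested
            zout.writestr(name, zin.read(name),
                          compress_type=zipfile.ZIP_DEFLATED,
                          compresslevel=level)
```

Because the entries are already deflated, an outer .gz or .bz2 around the repacked file should gain little, which is the point being made above.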

> Maybe we would need to change directory structure a little - a directory 
> per version with all the files in rather than flat, although we prune 
> the online versions into an old directory every so often anyhow... so 
> maybe this doesn't need doing.
> 
> Hopefully that makes things simpler rather than more complex (OK, the 
> scripting needs fixing, but it needs fixing now).
> 
> Comments?

Once the web area links directly to these files, we can no longer move
old versions into sub-directories without breaking the links from older
versions of the documentation, so we're better off with something like:

  eximdoc/pdf/spec-4.80.pdf
  eximdoc/pdf/spec-4.80.1.pdf
  eximdoc/pdf/spec-current.pdf -> spec-4.80.1.pdf

  eximdoc/epub/spec-4.80.epub
  eximdoc/epub/spec-4.80.1.epub
  eximdoc/epub/spec-current.epub -> spec-4.80.1.epub
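The layout above could be maintained by a small publishing step.  A minimal sketch in Python, assuming a POSIX filesystem with symlinks; `publish` and its parameters are hypothetical, not the existing release scripting.  It refuses to overwrite a versioned file (so a versioned link stays good for all time) and repoints the `-current` symlink by renaming a temporary link over it, so there is never a moment without one:

```python
import os
import shutil


def publish(built_file: str, dest_dir: str, name: str,
            version: str, ext: str) -> str:
    """Hypothetical helper: copy a built document to
    <dest_dir>/<name>-<version>.<ext> and repoint <name>-current.<ext>.
    Returns the path of the versioned file."""
    os.makedirs(dest_dir, exist_ok=True)
    versioned = f"{name}-{version}.{ext}"
    target = os.path.join(dest_dir, versioned)
    if os.path.exists(target):
        # a published versioned file must never change underneath its links
        raise FileExistsError(target)
    shutil.copy2(built_file, target)

    current = os.path.join(dest_dir, f"{name}-current.{ext}")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted earlier run
    # relative link target, matching the layout above
    os.symlink(versioned, tmp)
    # rename(2) replaces the old symlink atomically on POSIX
    os.replace(tmp, current)
    return target
```

For example, `publish("spec.pdf", "eximdoc/pdf", "spec", "4.80.1", "pdf")` would create `eximdoc/pdf/spec-4.80.1.pdf` and point `spec-current.pdf` at it.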

For the source files, it makes sense to move old versions aside, so that
anyone whose infrastructure locks in a particular version is responsible
for providing access to it, and _we_ can move versions with security
holes into old/ ASAP and break direct download links to them.

For documentation we don't want this: we should just make sure that a
versioned link is good for all time.  The one exception would be a
developer machine compromise that produced a trojanned document with an
exploit in it, and that should be handled as a special case for
clean-up.

Something else to consider, far less urgent, is metadata in HTTP
responses such as "Link: <...>; rel=latest-version".  It seems feasible
that search engines would detect that and use it as input, much as
canonical links in web pages help them de-duplicate.  That would ensure
that folks who search for "exim spec.pdf" get ranked links pointing them
at the latest versioned file, with the ranking moving to each new
release fairly promptly instead of being lost.

E.g.:
  curl -I http://people.spodhuis.org/phil.pennock/software/sieve-connect-0.85.tar.bz2
(old versions 0.81, 0.83 and 0.84 also exist, for comparing and contrasting).
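On the server side, the header value could be computed from the files in the directory.  A minimal sketch in Python, assuming files are named `<stem>-<dotted.numeric.version>.<ext>` as in the layout suggested earlier; `latest_version_link` and the `base_url` parameter are hypothetical, and this is not the code behind the example server.  The "latest-version" relation itself is registered by RFC 5829:

```python
import re
from typing import Iterable, Optional

# <stem>-<numeric version>.<extension>, e.g. spec-4.80.1.pdf
_VERSIONED = re.compile(
    r"^(?P<stem>.+)-(?P<ver>\d+(?:\.\d+)*)\.(?P<ext>[A-Za-z0-9.]+)$")


def version_key(ver: str) -> tuple:
    # compare numerically, so 4.80 sorts after 4.9
    return tuple(int(part) for part in ver.split("."))


def latest_version_link(requested: str, available: Iterable[str],
                        base_url: str) -> Optional[str]:
    """Hypothetical helper: build a Link header value pointing at the newest
    file sharing the requested file's stem and extension."""
    m = _VERSIONED.match(requested)
    if not m:
        return None  # e.g. spec-current.pdf carries no version
    candidates = []
    for name in available:
        c = _VERSIONED.match(name)
        if c and c["stem"] == m["stem"] and c["ext"] == m["ext"]:
            candidates.append((version_key(c["ver"]), name))
    if not candidates:
        return None
    _, newest = max(candidates)
    return f'<{base_url}{newest}>; rel="latest-version"'
```

Comparing versions as integer tuples rather than strings matters here: as strings, "4.9" would sort above "4.80".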

-Phil

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
