Since we are discussing how to handle data, including many documents, why not use something similar to FreeBSD's ports? That is, we provide a utility that downloads the data from its source (using a link that we provide to an FTP archive somewhere), checks its md5sum, extracts the data, and installs it in the appropriate place in the filesystem. In addition, as a service to our users, we can mirror much of this data on a Debian FTP server (and this data can also be distributed on CDs), but such a mirror is not required for the scheme to work.
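To make the idea concrete, here is a rough sketch in Python of the four steps (download, check md5sum, extract, install). It is only an illustration under some assumptions: the data is shipped as a tar archive, the expected checksum is supplied alongside the link, and the names `md5_of_file` and `fetch_and_install` are made up for this example rather than taken from any existing tool.

```python
import hashlib
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_install(url, expected_md5, dest_dir):
    """Download a tar archive from `url`, verify it, and unpack it into `dest_dir`.

    Hypothetical sketch: raises ValueError if the checksum does not match
    or if the archive tries to write outside `dest_dir`.
    """
    dest_root = Path(dest_dir).resolve()
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "download.tar"

        # Step 1: fetch from the upstream source (urllib handles ftp://,
        # http:// and file:// URLs).
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)

        # Step 2: compare against the checksum we ship with the link.
        actual = md5_of_file(archive)
        if actual != expected_md5:
            raise ValueError(f"md5 mismatch: expected {expected_md5}, got {actual}")

        # Steps 3 and 4: extract straight into the install location,
        # refusing any member whose path would escape it.
        with tarfile.open(archive) as tar:
            for member in tar.getmembers():
                target = (dest_root / member.name).resolve()
                if target != dest_root and dest_root not in target.parents:
                    raise ValueError(f"unsafe path in archive: {member.name}")
            dest_root.mkdir(parents=True, exist_ok=True)
            tar.extractall(dest_root)
    return dest_root
```

A per-dataset description would then need little more than a URL, a checksum, and a destination directory, which is what keeps a reference so much cheaper to maintain than a repackaged copy.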
Such a mechanism has two advantages: (1) we can provide support for much more data than we can mirror ourselves, thereby saving space in our data collection for the really important stuff, and (2) it is far easier to maintain a reference (link) to something than to repackage and mirror it, especially in terms of network and disk resources.

Brian

