On 13 Nov 2004 at 15:53, Gary Griswold wrote:
> A university in a remote location has a 33kbps UUCP connection to the
> Internet.  Because their connection to the internet includes one step that
> is UUCP, they are unable to use HTTP, but use accmail services, such as
> www4mail, agora, emailweb, pagegetter.  For curriculum content purposes they
> would like to obtain a local copy of some specific websites.  Getting pages
> one at a time is very slow, because of their 33kbps connection.  If they had
> a local copy of specific websites they wished to use in course content, they
> would be able to obtain a good response.

I tried to do something similar a few years ago while working at a
university in a country with very poor telecom infrastructure. We had access
to the web, but in practice it was unusable, especially in the rainy season.
ACCMAIL methods were the only reliable way to get web pages with important
content, but it is not possible to reconstruct entire usable websites that
way.

Solution:
Find an external collaborator with a shell account. Use the shell account to
grab websites, then package them for email to the remote university.

Most of it can be done with a Unix shell script ...
1. Use wget with flags -rkp to download a browsable website,
   eg: wget -rkp http://domain.org/
2. Archive the website with tar + gzip (or zip),
   eg: zip -r domain.org.zip domain.org
3. Use split with flag -b to break the archive into email-able chunks, each
   with an identifiable name and sequence suffix,
   eg: split -b 32k domain.org.zip domain.org_20041116_
4. Use mpack to email each chunk separately as an attachment
5. Reassemble the chunks at the remote university. The method used depends
   on who (or what) receives the emails. Ideally they should be filtered
   to a local script, but a human being can also do it.
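
The steps above could be sketched roughly as the shell functions below. This
is only an illustration, not the script we actually used: the function names,
the example address and the date-stamped prefix are hypothetical, and it
assumes wget, zip, split and mpack are installed on the shell account. The
mpack call follows the form "mpack -s subject file address" from its man page;
check your local version before relying on it.

```shell
#!/bin/sh
# Sketch of steps 1-5. All function names here are hypothetical.

# Steps 1 and 2: mirror a site into ./<site>/ and zip the result.
fetch_and_archive() {
    site=$1                          # eg: domain.org
    wget -rkp "http://$site/"        # may exit non-zero if some links are dead
    zip -r "$site.zip" "$site"
}

# Step 3: break the archive into 32k chunks named <prefix>aa, <prefix>ab, ...
make_chunks() {
    archive=$1
    prefix=$2                        # eg: domain.org_20041116_
    split -b 32k "$archive" "$prefix"
}

# Step 4: mail every chunk as a separate attachment.
# The subject is the chunk name, so the receiver can sort the mails.
send_chunks() {
    prefix=$1
    recipient=$2                     # hypothetical, eg: someone@remote.example
    for chunk in "$prefix"*; do
        mpack -s "$chunk" "$chunk" "$recipient"
    done
}

# Step 5 (receiving side): once all attachments are saved in one directory,
# concatenate them in name order to rebuild the archive.
join_chunks() {
    prefix=$1
    output=$2                        # must not itself start with the prefix
    cat "$prefix"* > "$output"
}
```

On the sending side this might be run as fetch_and_archive domain.org, then
make_chunks domain.org.zip domain.org_20041116_, then send_chunks with the
same prefix and the recipient's address. On the receiving side, join_chunks
domain.org_20041116_ domain.org.zip followed by unzip restores the site. The
shell expands the wildcard in sorted order, which matches the aa, ab, ac ...
suffixes that split generates, so no explicit sorting is needed.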

The main problems are those which would be problems for any other ACCMAIL
method -- cookies, broken dynamic content, browser sniffing, etc. However,
on the whole, wget does a good job.

We never tried to automate the process, eg: to accept email requests for new
websites or to refresh existing sites, but it would not have been very
difficult.

I would recommend that you inform the owners or administrators of your target
websites. Tell them what you are doing and why. Some of them might package
their website for you.


--
szs `at` szs `dot` net

