Herold Heiko <[EMAIL PROTECTED]> writes:

> I think wget sometimes (often) needs to reread what it wrote to the
> disk (html conversion). This means something like that wouldn't
> work, or rather, would be too specialized.

In the long run, I hope to fix that.  The first step has already been
done -- Wget is traversing the links breadth-first, which means that
it only needs to read the HTML file once.
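To illustrate why breadth-first traversal removes the need to reread a
document, here is a rough sketch (this is not Wget's actual code;
retrieve_url() and extract_links() are hypothetical stand-ins for the
real retrieval and HTML-parsing routines).  Each page is parsed exactly
once, immediately after download, and its links are simply appended to
a queue:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-ins; only their shape matters here.  The
       links returned by extract_links are assumed to be malloc'ed. */
    char *retrieve_url (const char *url);             /* body or NULL */
    char **extract_links (const char *body, const char *base); /* NULL-terminated */

    struct queued_url {
      char *url;
      int depth;
      struct queued_url *next;
    };

    void
    traverse_breadth_first (const char *start, int max_depth)
    {
      struct queued_url *head, *tail;

      head = tail = malloc (sizeof *head);
      head->url = strdup (start);
      head->depth = 0;
      head->next = NULL;

      while (head)
        {
          struct queued_url *cur = head;
          char *body = retrieve_url (cur->url);

          /* Parse the document once, right after download; all links
             of this depth are queued before anything deeper is
             visited, so the file never has to be read back later. */
          if (body && cur->depth < max_depth)
            {
              char **links = extract_links (body, cur->url);
              int i;
              for (i = 0; links[i]; i++)
                {
                  struct queued_url *n = malloc (sizeof *n);
                  n->url = links[i];
                  n->depth = cur->depth + 1;
                  n->next = NULL;
                  tail->next = n;
                  tail = n;
                }
              free (links);
            }

          free (body);
          head = cur->next;
          free (cur->url);
          free (cur);
        }
    }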

The next step would be to allow Wget's reader to read directly into
memory, or to read both into memory and print to stdout.  This way,
things like `wget --spider -r URL' or `wget -O foo -r URL' would work
perfectly.  Alternatively, Wget could write into a temporary file, read
the HTML, and discard the file.
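As an illustration (just a sketch, not an interface that exists in Wget
today), the retrieval code could write into a sink like the one below,
which keeps the document in memory and can optionally copy it to
stdout; the link extractor would then read from the buffer instead of
reopening the file:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct sink {
      char *data;       /* downloaded bytes so far */
      size_t len, cap;  /* used / allocated */
      int echo_stdout;  /* also copy the data to stdout (e.g. -O -) */
    };

    /* Append a chunk of downloaded data to the in-memory buffer and,
       if requested, echo it to stdout as well. */
    void
    sink_write (struct sink *s, const char *buf, size_t n)
    {
      if (s->len + n > s->cap)
        {
          char *p;
          s->cap = (s->len + n) * 2;
          p = realloc (s->data, s->cap);
          if (!p)
            abort ();   /* a real implementation would report the error */
          s->data = p;
        }
      memcpy (s->data + s->len, buf, n);
      s->len += n;

      if (s->echo_stdout)
        fwrite (buf, 1, n, stdout);
    }

With something like this, `wget -O - -r URL' could send everything to
stdout and still feed the HTML parser from the buffer, and
`--spider -r' would not need to keep any file around at all.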

I don't see much use for adding the `--tar' functionality to Wget
because Wget should preferably do one thing (download stuff off the
web), and do it well -- post-processing of the output, such as
serializing it into a stream, should be done by a separate utility --
in this case, `tar'.

On technical grounds, it might be hard to shoehorn Wget's mode of
operation into what `tar' expects.  For example, Wget might need to
revisit directories in random order.  I'm not sure if a tar stream is
allowed to do that.

<vision>
However, it might be cool to create a simple output format for
serializing the result of a Wget run.  Then a converter could be
provided that converts this to tar, cpio, whatever.  The downside to
this is that we would invent Yet Another Format, but the upside would
be that Wget proper would not depend on external libraries to support
`tar' and whatnot.
</vision>
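Purely to illustrate the idea (none of this exists, and the header
names are made up), such a format could be as simple as a sequence of
records, each carrying the URL, a couple of headers, and the raw bytes:

    URL: http://www.example.com/index.html
    Content-Type: text/html
    Length: 1362

    <1362 bytes of body follow, then the next record>

A small converter could then walk these records and emit tar or cpio
entries, so Wget itself would never need to link against a tar library.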
