Re: add tar option

2002-04-24 Thread Hrvoje Niksic

Herold Heiko [EMAIL PROTECTED] writes:

 But if I understand this correctly (sorry, sources not checked, foot
 in mouth etc.) with -k wget still needs to correct the html files
 later, when it knows what has been downloaded and what not. So it
 can't print the file as soon as downloaded, only at the end.

You are correct.  `-k' would be at odds with any kind of streaming
simply because it needs to process all the HTMLs after the fact.
Regardless of other things, that breaks the streaming.

 On technical grounds, it might be hard to shoehorn Wget's mode of
 operation into what `tar' expects.  For example, Wget might need to
 revisit directories in random order.  I'm not sure if a tar stream
 is allowed to do that.

 A simple |sort should fix that,

You're misunderstanding me.  In my thought experiment, I meant that
Wget output might be a tar stream itself, not a list of file names to
feed to `tar'.

 I agree with the idea of "do one thing and do it well"; after all, we
 are not talking about a windows gui try-to-do-everything program
 here. Either I did not understand you correctly, or a simple list of
 files should be enough for every case.

You didn't understand me, but you proposed something far better.  Yes,
something equivalent to `find''s `-print'/`-print0' would actually
help the original poster.
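
As a rough illustration of the `-print'/`-print0' idea, here is a minimal Python sketch of writing and reading a list of saved file names. Wget has no such option in this thread; `write_file_list`, `read_file_list`, and the example paths are hypothetical names used only for illustration. It shows why a NUL terminator is safer than a newline when a name itself contains a newline.

```python
import io

def write_file_list(paths, out, null_terminated=False):
    """Write one saved path per record, in the spirit of -print / -print0."""
    terminator = b"\0" if null_terminated else b"\n"
    for path in paths:
        out.write(path.encode("utf-8") + terminator)

def read_file_list(data, null_terminated=False):
    """Split the raw list back into path names, skipping empty records."""
    terminator = b"\0" if null_terminated else b"\n"
    return [rec.decode("utf-8") for rec in data.split(terminator) if rec]

# Hypothetical result of a download run; the second name contains a newline.
saved = ["www.example.com/index.html", "www.example.com/odd\nname.html"]

nul_buf = io.BytesIO()
write_file_list(saved, nul_buf, null_terminated=True)
parsed_nul = read_file_list(nul_buf.getvalue(), null_terminated=True)

nl_buf = io.BytesIO()
write_file_list(saved, nl_buf, null_terminated=False)
parsed_nl = read_file_list(nl_buf.getvalue(), null_terminated=False)
```

With NUL terminators the two names survive intact, while the newline-terminated list splits the odd name into two records. A NUL-terminated list of this kind is what tools such as `xargs -0' expect on input.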

 Or did you mean something else with serialization of the result ?

A tar stream is an example of serialization of a set of files and
directories.  It turns an on-disk structure into a stream of bytes
that can be transferred over a pipe or a network in order to re-create
something resembling the original structure.
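
To make the notion of serialization concrete, the following sketch packs a small set of in-memory files into a tar byte stream and rebuilds them from it, using Python's standard `tarfile` module. The file names and contents are made up for the example, and this is not how Wget works; it only shows the round trip from a structure to a byte stream and back.

```python
import io
import tarfile

def serialize(files):
    """Pack a {path: bytes} mapping into a single tar byte stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)          # tar headers carry the size up front
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def deserialize(stream):
    """Rebuild the {path: bytes} mapping from a tar byte stream."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as tar:
        for member in tar:
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files

# Made-up stand-in for a downloaded tree.
site = {"index.html": b"<html>home</html>", "img/logo.gif": b"GIF89a"}
stream = serialize(site)      # bytes that could travel over a pipe or socket
restored = deserialize(stream)
```

Here `stream` is an ordinary bytes object, and `restored` holds the same paths and contents as `site`.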



RE: add tar option

2002-04-23 Thread Herold Heiko

I think wget needs sometimes (often) to reread what it wrote to the disk
(html conversion). This means something like that wouldn't work, or better,
would be too specialized.

What would work better is a (sometimes requested in the past) switch to
output to a file a list of everything retrieved (or better everything saved
to disk), then you could use that (for example as input to cpio or whatever
you prefer).

Heiko

-- 
-- PREVINET S.p.A. [EMAIL PROTECTED]
-- Via Ferretto, 1 ph x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907472
-- ITALY

 -Original Message-
 From: Max Waterman [mailto:[EMAIL PROTECTED]]
 Sent: Monday, April 22, 2002 10:54 PM
 To: [EMAIL PROTECTED]
 Subject: RFE:add tar option
 
 
 Hi,
 
 I recently had need to pipe what wget retrieved through a command before
 writing to disk. There was no way I could do this with the version I had.
 
 What I would like wget to do is to create a tar stream of the files
 and directories it is downloading and send that to stdout, kind of like:
 
 tar -cvvf - files...
 
 then I could pipe that into whatever I wanted, for example:
 
 $ wget -r -l 3 --tar 'http://www.sgi.com/' | other commands | tar -xvvf -
 
 Anyone think this is a good idea?
 
 Please 'cc' me, since I am not on the email list.
 
 Thanks.
 
 Max.
 



Re: add tar option

2002-04-23 Thread Hrvoje Niksic

Herold Heiko [EMAIL PROTECTED] writes:

 I think wget needs sometimes (often) to reread what it wrote to the
 disk (html conversion). This means something like that wouldn't
 work, or better, would be too specialized.

In the long run, I hope to fix that.  The first step has already been
done -- Wget is traversing the links breadth-first, which means that
it only needs to read the HTML file once.
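
The effect of breadth-first traversal can be sketched on a toy, in-memory link graph. This is not Wget's actual code: `SITE`, `crawl_breadth_first`, and the URLs are assumptions made for illustration. The point is that each page is taken off the queue, read, and parsed exactly once, at the moment it is fetched.

```python
from collections import deque

# Hypothetical site: each page maps to the links found in it.
SITE = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/b.html", "/c.html"],
    "/b.html": ["/"],
    "/c.html": [],
}

def crawl_breadth_first(start, max_depth):
    """Return pages in the order they are fetched and parsed."""
    parsed = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        parsed.append(url)            # the only place a page is read
        if depth == max_depth:
            continue                  # do not follow links past the limit
        for link in SITE.get(url, []):
            if link not in seen:      # never queue the same page twice
                seen.add(link)
                queue.append((link, depth + 1))
    return parsed

order = crawl_breadth_first("/", max_depth=3)
```

Because links are queued as soon as a page is parsed, nothing in this loop needs to go back to a page that was already handled.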

The next step would be to allow Wget's reader to read directly into
memory, or to read both into memory and print to stdout.  This way,
things like `wget --spider -r URL' or `wget -O foo -r URL' would work
perfectly.  Alternately, Wget could write into a temporary file, read
the HTML, and discard the file.
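
The "read both into memory and print to stdout" variant might look roughly like the sketch below. `read_and_tee` is a hypothetical helper, not a Wget function, and the `BytesIO` objects stand in for a network response and for standard output.

```python
import io

def read_and_tee(source, sink, chunk_size=8192):
    """Copy source to sink chunk by chunk while keeping a copy in memory."""
    kept = io.BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)   # streamed out immediately
        kept.write(chunk)   # retained so the HTML can be parsed afterwards
    return kept.getvalue()

body = b'<html><a href="next.html">next</a></html>'
sink = io.BytesIO()         # stand-in for sys.stdout.buffer
in_memory = read_and_tee(io.BytesIO(body), sink, chunk_size=16)
```

After the call, `in_memory` can be handed to a link extractor without anything having been written to disk.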

I don't see much use for adding the `--tar' functionality to Wget
because Wget should preferably do one thing (download stuff off the
web), and do it well -- post-processing of the output, such as
serializing it into a stream, should be done by a separate utility --
in this case, `tar'.

On technical grounds, it might be hard to shoehorn Wget's mode of
operation into what `tar' expects.  For example, Wget might need to
revisit directories in random order.  I'm not sure if a tar stream is
allowed to do that.

<vision>
However, it might be cool to create a simple output format for
serializing the result of a Wget run.  Then a converter could be
provided that converts this to tar, cpio, whatever.  The downside to
this is that we would invent Yet Another Format, but the upside would
be that Wget proper would not depend on external libraries to support
`tar' and whatnot.
</vision>
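
One possible shape for such a format and its converter is sketched below in Python. The record layout (a 2-byte name length, an 8-byte data length, then the name and the data) is invented here purely for illustration; nothing in this thread defines an actual format, and `write_record`, `read_records`, and `records_to_tar` are hypothetical names.

```python
import io
import struct
import tarfile

HEADER = ">HQ"  # assumed layout: name length (2 bytes), data length (8 bytes)

def write_record(out, path, data):
    """Append one file to the simple stream."""
    name = path.encode("utf-8")
    out.write(struct.pack(HEADER, len(name), len(data)))
    out.write(name)
    out.write(data)

def read_records(src):
    """Yield (path, data) pairs until the stream is exhausted."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = src.read(header_size)
        if not header:
            return
        name_len, data_len = struct.unpack(HEADER, header)
        yield src.read(name_len).decode("utf-8"), src.read(data_len)

def records_to_tar(src, out):
    """Separate converter: turn the simple stream into a tar archive."""
    with tarfile.open(fileobj=out, mode="w") as tar:
        for path, data in read_records(src):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

src = io.BytesIO()
write_record(src, "index.html", b"<html></html>")
write_record(src, "img/logo.gif", b"GIF89a")
src.seek(0)

out = io.BytesIO()
records_to_tar(src, out)
out.seek(0)
with tarfile.open(fileobj=out, mode="r") as tar:
    names = tar.getnames()
    logo = tar.extractfile("img/logo.gif").read()
```

In this arrangement only the converter needs to know about tar; the producer writes nothing more complicated than length-prefixed records.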



Re: add tar option

2002-04-23 Thread Ian Abbott

On 23 Apr 2002 at 18:19, Hrvoje Niksic wrote:

 On technical grounds, it might be hard to shoehorn Wget's mode of
 operation into what `tar' expects.  For example, Wget might need to
 revisit directories in random order.  I'm not sure if a tar stream is
 allowed to do that.

You can add stuff to a tar stream in a pretty much random order -
that's effectively what you get when you use tar's -r option to
append to the end of an existing archive.  (I used to use that with
tapes quite often, once upon a time.)
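
A small sketch with Python's standard `tarfile` module illustrates the same point: members from different directories can be interleaved, and more can be appended later, comparable to `tar -r'. The file names and contents are made up for the example.

```python
import io
import os
import tarfile
import tempfile

def add_member(tar, path, data):
    """Add one in-memory file to an open archive."""
    info = tarfile.TarInfo(name=path)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "site.tar")

    # First pass: entries from a/ and b/ in whatever order they arrive.
    with tarfile.open(archive, "w") as tar:
        add_member(tar, "a/index.html", b"first visit to a/")
        add_member(tar, "b/index.html", b"visit to b/")

    # Later: revisit a/ by appending to the end of the existing archive.
    with tarfile.open(archive, "a") as tar:
        add_member(tar, "a/page2.html", b"second visit to a/")

    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()
```

The resulting member list keeps the order of arrival, with the second entry for a/ coming after the one for b/.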