Re: add tar option
Herold Heiko [EMAIL PROTECTED] writes:

> But if I understand this correctly (sorry, sources not checked, foot in mouth etc.) with -k wget still needs to correct the html files later, when it knows what has been downloaded and what not. So it can't print the file as soon as downloaded, only at the end.

You are correct. `-k' would be at odds with any kind of streaming simply because it needs to process all the HTML files after the fact. Regardless of other things, that breaks the streaming.

>> On technical grounds, it might be hard to shoehorn Wget's mode of operation into what `tar' expects. For example, Wget might need to revisit directories in random order. I'm not sure if a tar stream is allowed to do that.
>
> A simple |sort should fix that,

You're misunderstanding me. In my thought experiment, I meant that Wget's output might be a tar stream itself, not a list of file names to feed to `tar'.

> I agree with the idea "do one thing and do it well"; after all, we are not talking about a windows gui try-to-do-everything program here. Either I did not understand you correctly, or a simple list of files should be enough for every case.

You didn't understand me, but you proposed something far better. Yes, something equivalent to `find''s `-print'/`-print0' would actually help the original poster.

> Or did you mean something else with serialization of the result ?

A tar stream is an example of serialization of a set of files and directories. It turns an on-disk structure into a stream of bytes that can be transferred over a pipe or a network in order to re-create something resembling the original structure.
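To make the last point concrete, here is a minimal sketch of that kind of serialization, using Python's standard `tarfile` module rather than anything in Wget. The helper names `serialize_tree` and `recreate_tree` are made up for illustration.

```python
import io
import os
import tarfile


def serialize_tree(root: str) -> bytes:
    """Turn the files and directories under `root` into a tar byte stream."""
    buf = io.BytesIO()
    # Mode "w|" writes a forward-only stream, as one would send down a pipe.
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        for name in sorted(os.listdir(root)):
            # arcname keeps the stored paths relative to `root`.
            tar.add(os.path.join(root, name), arcname=name)
    return buf.getvalue()


def recreate_tree(stream: bytes, dest: str) -> None:
    """Re-create the serialized structure under the existing directory `dest`."""
    # Mode "r|" reads the stream sequentially, without seeking.
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r|") as tar:
        tar.extractall(dest)
```

The bytes returned by `serialize_tree` could equally be written to stdout or a socket; the receiving side only needs to read them in order.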
RE: add tar option
I think wget sometimes (often) needs to reread what it wrote to the disk (html conversion). This means something like that wouldn't work, or rather, would be too specialized.

What would work better is a switch (sometimes requested in the past) that writes to a file a list of everything retrieved (or better, everything saved to disk); you could then use that list, for example as input to cpio or whatever you prefer.

Heiko

--
-- PREVINET S.p.A. [EMAIL PROTECTED]
-- Via Ferretto, 1 ph x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907472
-- ITALY

-----Original Message-----
From: Max Waterman [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 22, 2002 10:54 PM
To: [EMAIL PROTECTED]
Subject: RFE:add tar option

Hi,

I recently had need to pipe what wget retrieved through a command before writing to disk. There was no way I could do this with the version I had.

What I would like wget to do is to create a tar stream of the files and directories it is downloading and send that to stdout, kind of like:

    tar -cvvf - files...

then I could pipe that into whatever I wanted, for example:

    $ wget -r -l 3 --tar 'http://www.sgi.com/' | other commands | tar -xvvf -

Anyone think this is a good idea?

Please 'cc' me, since I am not on the email list.

Thanks.

Max.
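The list-of-files idea can be sketched as follows. This assumes a NUL-separated list of saved paths is already available from somewhere (Wget has no such switch in this thread; `find -print0` would produce the same shape of input). The names `parse_file_list` and `archive_listed_files` are hypothetical, and Python's `tarfile` stands in for cpio or tar.

```python
import io
import os
import tarfile
from typing import BinaryIO, Iterable, List


def parse_file_list(data: bytes) -> List[str]:
    """Split a NUL-separated list of paths, skipping empty entries."""
    return [os.fsdecode(p) for p in data.split(b"\0") if p]


def archive_listed_files(paths: Iterable[str], base: str, out: BinaryIO) -> None:
    """Write the listed files (relative to `base`) to `out` as a tar stream."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            # recursive=False: archive exactly what the list names, nothing more.
            tar.add(os.path.join(base, path), arcname=path, recursive=False)
```

Because the list is consumed only after the download has finished, this approach is unaffected by any rereading or rewriting of files during the run.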
Re: add tar option
Herold Heiko [EMAIL PROTECTED] writes:

> I think wget needs sometimes (often) to reread what it wrote to the disk (html conversion). This means something like that wouldn't work, or better, would be too specialized.

In the long run, I hope to fix that. The first step has already been done -- Wget now traverses the links breadth-first, which means that it only needs to read each HTML file once. The next step would be to allow Wget's reader to read directly into memory, or to read both into memory and print to stdout. This way, things like `wget --spider -r URL' or `wget -O foo -r URL' would work perfectly. Alternately, Wget could write into a temporary file, read the HTML, and discard the file.

I don't see much use for adding the `--tar' functionality to Wget because Wget should preferably do one thing (download stuff off the web), and do it well -- post-processing of the output, such as serializing it into a stream, should be done by a separate utility -- in this case, `tar'.

On technical grounds, it might be hard to shoehorn Wget's mode of operation into what `tar' expects. For example, Wget might need to revisit directories in random order. I'm not sure if a tar stream is allowed to do that.

<vision>
However, it might be cool to create a simple output format for serializing the result of a Wget run. Then a converter could be provided that converts this to tar, cpio, whatever. The downside to this is that we would invent Yet Another Format, but the upside would be that Wget proper would not depend on external libraries to support `tar' and whatnot.
</vision>
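As a rough illustration of the "simple output format plus converter" idea, here is a sketch with an entirely invented record layout: a 4-byte big-endian path length, the UTF-8 path, an 8-byte big-endian data length, then the data. Nothing here is an actual or proposed Wget format; the function names are hypothetical, and the converter uses Python's `tarfile` module.

```python
import io
import struct
import tarfile
from typing import BinaryIO, Iterator, Tuple


def write_record(out: BinaryIO, path: str, data: bytes) -> None:
    """Append one (path, data) record in the invented format."""
    name = path.encode("utf-8")
    out.write(struct.pack(">I", len(name)) + name)
    out.write(struct.pack(">Q", len(data)) + data)


def read_records(src: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, data) pairs until the stream is exhausted."""
    while True:
        head = src.read(4)
        if not head:
            return
        (name_len,) = struct.unpack(">I", head)
        path = src.read(name_len).decode("utf-8")
        (size,) = struct.unpack(">Q", src.read(8))
        yield path, src.read(size)


def records_to_tar(src: BinaryIO, out: BinaryIO) -> None:
    """Convert a record stream into a tar stream, one member per record."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path, data in read_records(src):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

The point of the split is that only the converter needs to know about tar; the producer writes trivial records and carries no archive-library dependency.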
Re: add tar option
On 23 Apr 2002 at 18:19, Hrvoje Niksic wrote:

> On technical grounds, it might be hard to shoehorn Wget's mode of operation into what `tar' expects. For example, Wget might need to revisit directories in random order. I'm not sure if a tar stream is allowed to do that.

You can add stuff to a tar stream in a pretty much random order - that's effectively what you get when you use tar's -r option to append to the end of an existing archive. (I used to use that with tapes quite often, once upon a time.)
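A small sketch of that behaviour, using the append mode of Python's `tarfile` module as a stand-in for `tar -r`. The helper `append_member` and the member names used with it are made up for illustration.

```python
import io
import tarfile


def append_member(archive_path: str, name: str, data: bytes) -> None:
    """Append one member to an uncompressed tar archive, creating it if needed."""
    # Mode "a" adds to the end of an existing archive, much like `tar -r`.
    with tarfile.open(archive_path, mode="a") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
```

Appending `a/1.html`, then `b/1.html`, then `a/2.html` leaves the members stored in exactly that order, so entries for directory `a` need not be contiguous in the archive.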