Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
L Walsh wrote:
> Micah Cowan wrote:
>> I'm not sure what you mean about the linux thing; there are many
>> instances of runtime loadable modules on Linux. dlopen() and friends
>> are the standard way of doing this on any Unix kernel flavor.
>
> I _thought_ so, but when I asked a distro why they didn't use this,
> they said it would require rewriting nearly all currently existing
> applications.
>
> My specific complaint was against a SuSE distro: in order to load
> one.rpm, it depended on two.rpm, which depended on three.rpm, and that
> on four.rpm, etc. The functionality in "two.rpm" was to load a library
> to handle "active directories" which, in my non-MS, small setup, I
> didn't need -- and I didn't want to load the 5-7 supporting packages
> for AD, since I didn't use them. BUT, because of static run-time
> loading, one.rpm would fail if two.rpm wasn't loaded... and so on and
> so forth. AFAIK, the same problem exists on nearly every distro --
> because no one bothers to think that they might not want to load every
> package on the CD, just to support local host lookup using... say,
> nscd. G.

Ah, well, that's a different situation. dlopen() is the standard way to decide at runtime whether or not to load a library. However, if the application was designed to make that decision at build time rather than at runtime, then switching does require code rewriting.

In this case, though, we're specifically talking about loadable modules. We might choose to allow some of them to be linked at build time, but we'd definitely have to at least support conditional linking at runtime.

>> Keeping a single Wget and using runtime libraries (which we were
>> terming "plugins") was actually the original concept (there's mention
>> of this in the first post of this thread, actually);
>
> Sounds good to me!

:-)

>> the issue is that there are core bits of functionality (such as the
>> multi-stream support) that are too intrinsic to separate into
>> loadable modules, and that, to be done properly (and with a minimum
>> of maintenance commitment) would also depend on other libraries (that
>> is, doing asynchronous I/O wouldn't technically require the use of
>> other libraries, but it can be a lot of work to do efficiently and
>> portably across OSes, and there are already Free libraries to do that
>> for us).
>
> And perhaps that is the problem. In order to re-use existing parts of
> code, rather than adapting them to a "load-if-necessary" type
> structure -- everyone prefers to just use them "as is"; thus one lib
> references another, and another... and so on. Like, I think you pull
> in "cat", and you get all of the gnu-language libs and tools, which
> pulls in alternate character set support, which requires certain font
> rendering packages -- and of course, if you are displaying alternate
> characters, let's not forget the corresponding foreign input methods,
> and the asian-char specific terminal emulators... etc.

That's absurd. Native Language Support for a terminal program shouldn't pull in font-rendering packages: displaying the characters properly is the terminal's responsibility. I have some trouble believing that any packagers would actually have such dependencies, but if they do, it's absurd. A program like "cat" should depend only on the system library, and (if NLS is supported) gettext (which shouldn't depend on anything else).

> Can I jump off a cliff yet?... ARG! I hack around such problems, at
> times, by extracting the 1 run-time library I need, and not the rest
> of the package, but then my rpm-verify checks turn up supposed
> "errors" because I'm missing package dependencies. Sigh...

Frustrating experiences with RedHat's package management are why I'm now a Debian/Ubuntu user. :)

> If one wanted to add multi-stream support, couldn't the "small wget"
> have a check to see if the multi-stream support lib was present (or
> not), and if so, set max-streams equal to one that might yield the
> basic behavior one might want for the small wget?

Well, but the actual support for having any sort of multi-stream is a major rewrite of the entire I/O code. Much better to use a separate library for that, if we can get it. In that case, it stops being something we can simply check for and use if it's available, and becomes something that the code would absolutely require.

> Not pushing a particular solution -- I, like you, am just throwing out
> ideas to consider... if they've already covered the points I've
> raised, feel free to just ignore my ramblings and "carry on"... :-)

Well, and fortunately we've got plenty of time to talk about these things: my focus right now is on getting 1.11 out the door, after which there are _plenty_ of things to keep me busy for 1.12 (still a "lite" release) for quite some time.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Micah Cowan wrote:
> I'm not sure what you mean about the linux thing; there are many
> instances of runtime loadable modules on Linux. dlopen() and friends
> are the standard way of doing this on any Unix kernel flavor.

I _thought_ so, but when I asked a distro why they didn't use this, they said it would require rewriting nearly all currently existing applications.

My specific complaint was against a SuSE distro: in order to load one.rpm, it depended on two.rpm, which depended on three.rpm, and that on four.rpm, etc. The functionality in "two.rpm" was to load a library to handle "active directories" which, in my non-MS, small setup, I didn't need -- and I didn't want to load the 5-7 supporting packages for AD, since I didn't use them. BUT, because of static run-time loading, one.rpm would fail if two.rpm wasn't loaded... and so on and so forth. AFAIK, the same problem exists on nearly every distro -- because no one bothers to think that they might not want to load every package on the CD, just to support local host lookup using... say, nscd. G.

> Keeping a single Wget and using runtime libraries (which we were
> terming "plugins") was actually the original concept (there's mention
> of this in the first post of this thread, actually);

Sounds good to me! :-)

> the issue is that there are core bits of functionality (such as the
> multi-stream support) that are too intrinsic to separate into loadable
> modules, and that, to be done properly (and with a minimum of
> maintenance commitment) would also depend on other libraries (that is,
> doing asynchronous I/O wouldn't technically require the use of other
> libraries, but it can be a lot of work to do efficiently and portably
> across OSes, and there are already Free libraries to do that for us).

And perhaps that is the problem. In order to re-use existing parts of code, rather than adapting them to a "load-if-necessary" type structure -- everyone prefers to just use them "as is"; thus one lib references another, and another... and so on. Like, I think you pull in "cat", and you get all of the gnu-language libs and tools, which pulls in alternate character set support, which requires certain font rendering packages -- and of course, if you are displaying alternate characters, let's not forget the corresponding foreign input methods, and the asian-char specific terminal emulators... etc.

Can I jump off a cliff yet?... ARG! I hack around such problems, at times, by extracting the 1 run-time library I need, and not the rest of the package, but then my rpm-verify checks turn up supposed "errors" because I'm missing package dependencies. Sigh...

If one wanted to add multi-stream support, couldn't the "small wget" have a check to see if the multi-stream support lib was present (or not), and if so, set max-streams equal to one that might yield the basic behavior one might want for the small wget?

Not pushing a particular solution -- I, like you, am just throwing out ideas to consider... if they've already covered the points I've raised, feel free to just ignore my ramblings and "carry on"... :-)

Linda
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Micah Cowan wrote:
> Tony Lewis wrote:
>> Perhaps both versions can include multi-threaded support in their
>> core version, but the lite version would never invoke
>> multi-threading.
>
> I mentioned this in the first post as well. The main problem I offered
> for this was that async I/O tends to make for much more

I should point out, too, that I'm talking about asynchronous I/O support, and not multithreaded support, as I'm not really keen on introducing threads to Wget. Especially since, AFAICT, threads sort of suck on Linux, which happens to be the kernel I actively use. This may be somewhat unfortunate, as multithreading code tends not to introduce the code complexity that async I/O does (though IMO it introduces complexities of a different sort).

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Tony Lewis wrote:
> Micah Cowan wrote:
>
>> Keeping a single Wget and using runtime libraries (which we were
>> terming "plugins") was actually the original concept (there's mention
>> of this in the first post of this thread, actually); the issue is
>> that there are core bits of functionality (such as the multi-stream
>> support) that are too intrinsic to separate into loadable modules,
>> and that, to be done properly (and with a minimum of maintenance
>> commitment) would also depend on other libraries (that is, doing
>> asynchronous I/O wouldn't technically require the use of other
>> libraries, but it can be a lot of work to do efficiently and portably
>> across OSes, and there are already Free libraries to do that for us).
>
> Perhaps both versions can include multi-threaded support in their core
> version, but the lite version would never invoke multi-threading.

I mentioned this in the first post as well. The main problem I offered for this was that async I/O tends to make for much more complicated/hard-to-follow code, which will make the "lite" Wget (even more) difficult to read, without reaping the actual benefits gained from such complications. Of course, whether this is sufficient justification to maintain two different versions of Wget is another question...

There's also the fact that libcurl starts looking _very_ attractive to handle the async I/O web comm stuff, so that ideally we don't actually have to rewrite any of the I/O and HTTP logic, but just replace it wholesale. If we decide to use that for the async stuff, then it seems to me that having two separate programs suddenly becomes more-or-less a foregone conclusion, as I don't really want to introduce a dependency on libcurl for the "lite" Wget (though Hrvoje's response on the thread that Daniel Stenberg posted suggests I'd have an excuse to do so).

Note that in any case, having two separate command-line interfaces is pretty much unavoidable IMO, as the current CLI is fast becoming unwieldy, and certain aspects are fairly confusing, so I don't really want to use it as the basis on which to build some of the newer configuration features; at the same time, I want to keep the current interface around for current Wget usage, so I don't break people's scripts, etc.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 11/2/07, Tony Lewis <[EMAIL PROTECTED]> wrote:
> Micah Cowan wrote:
>
>> Keeping a single Wget and using runtime libraries (which we were
>> terming "plugins") was actually the original concept (there's mention
>> of this in the first post of this thread, actually); the issue is
>> that there are core bits of functionality (such as the multi-stream
>> support) that are too intrinsic to separate into loadable modules,
>> and that, to be done properly (and with a minimum of maintenance
>> commitment) would also depend on other libraries (that is, doing
>> asynchronous I/O wouldn't technically require the use of other
>> libraries, but it can be a lot of work to do efficiently and portably
>> across OSes, and there are already Free libraries to do that for us).
>
> Perhaps both versions can include multi-threaded support in their core
> version, but the lite version would never invoke multi-threading.
>
> Tony

Yes! Some features of the current wget do their own sorta-multithreading, and would be simplified, and their modularity enhanced, by this approach (e.g. the progress reporting has several options, which could become plug-ins).

Tony G
-- 
Best Regards.
Please keep in touch.
RE: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Micah Cowan wrote:
> Keeping a single Wget and using runtime libraries (which we were
> terming "plugins") was actually the original concept (there's mention
> of this in the first post of this thread, actually); the issue is that
> there are core bits of functionality (such as the multi-stream
> support) that are too intrinsic to separate into loadable modules, and
> that, to be done properly (and with a minimum of maintenance
> commitment) would also depend on other libraries (that is, doing
> asynchronous I/O wouldn't technically require the use of other
> libraries, but it can be a lot of work to do efficiently and portably
> across OSes, and there are already Free libraries to do that for us).

Perhaps both versions can include multi-threaded support in their core version, but the lite version would never invoke multi-threading.

Tony
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
L Walsh wrote:
> Honest -- I hadn't read all the threads before my post...
>
> Great ideas Micah! :-)
>
> On the idea of 2 wgets -- there is a "clever" way to get by with 1.
> Put the "optional" functionality into separate run-time loadable
> files. SGI's Unix (and MS Windows) do this. The "small wget" then
> checks to see which libraries are "accessible" -- those that aren't
> simply mean the features for those libs are disabled. In a way, it's
> like how 'vim' can optionally load perllib or python-lib at runtime
> (at least under windows) if they are present. If they are not present,
> those features are disabled. Too bad linux didn't take this route with
> its libraries (have asked; it is possible, but there's no "framework"
> for it, and that might need work as well).

I'm not sure what you mean about the linux thing; there are many instances of runtime loadable modules on Linux. dlopen() and friends are the standard way of doing this on any Unix kernel flavor.

Keeping a single Wget and using runtime libraries (which we were terming "plugins") was actually the original concept (there's mention of this in the first post of this thread, actually); the issue is that there are core bits of functionality (such as the multi-stream support) that are too intrinsic to separate into loadable modules, and that, to be done properly (and with a minimum of maintenance commitment) would also depend on other libraries (that is, doing asynchronous I/O wouldn't technically require the use of other libraries, but it can be a lot of work to do efficiently and portably across OSes, and there are already Free libraries to do that for us).

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Honest -- I hadn't read all the threads before my post...

Great ideas Micah! :-)

On the idea of 2 wgets -- there is a "clever" way to get by with 1. Put the "optional" functionality into separate run-time loadable files. SGI's Unix (and MS Windows) do this. The "small wget" then checks to see which libraries are "accessible" -- those that aren't simply mean the features for those libs are disabled. In a way, it's like how 'vim' can optionally load perllib or python-lib at runtime (at least under windows) if they are present. If they are not present, those features are disabled. Too bad linux didn't take this route with its libraries (have asked; it is possible, but there's no "framework" for it, and that might need work as well).

My 2 cents,
Linda
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/31/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Tony Godshall wrote:
> > On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> >> Tony Godshall wrote:
> >>> Perhaps the little wget could be called "wg". A quick google and
> >>> wikipedia search shows no real namespace collisions.
> >> To reduce confusion/upgrade problems, I would think we would want to
> >> ensure that the "traditional"/little Wget keeps the current name,
> >> and any snazzified version gets a new one.
> >
> > Please not another -ng. How about wget2 (since we're on 1.x)? And
> > the current one remains in 1.x.
>
> I agree that -ng would not be appropriate. But since we're really
> talking about two separate beasts, I'd prefer not to limit what we can
> do with Wget (original)'s versioning. Who's to say a 2.0 release of
> the "light" version will not be warranted someday?
>
> At any rate, the "snazzy" one looks to be diverging from classic Wget
> in some rather significant ways, in which case, I'd kind of prefer to
> part names a bit more severely than just "wget-ng" or "wget2".
> "Reget", perhaps: that name could be both "Recursive Get" (describing
> what's still its primary feature), or "Revised/Re-envisioned Wget". :)
>
> I think, too, that names such as "wget2" are more often things that
> packagers (say, Debian) do, when they want to include
> backwards-incompatible, significantly new versions of software, but
> don't want to break people's usage of older stuff. Or, when they just
> want to offer both versions. Cf "apache2" in Debian.
>
> > And then eventually everyone's gotten used to and can't live without
> > the new bittorrent-like almost-multithreaded features. ;-)
>
> :)

Pget. Parallel get. Tget. Torrent-like get. Bget. Bigger get. BBWget. Bigger Better Wget. OK, ok, sorry.
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Tony Godshall wrote:
> On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Tony Godshall wrote:
>>> Perhaps the little wget could be called "wg". A quick google and
>>> wikipedia search shows no real namespace collisions.
>> To reduce confusion/upgrade problems, I would think we would want to
>> ensure that the "traditional"/little Wget keeps the current name, and
>> any snazzified version gets a new one.
>
> Please not another -ng. How about wget2 (since we're on 1.x)? And
> the current one remains in 1.x.

I agree that -ng would not be appropriate. But since we're really talking about two separate beasts, I'd prefer not to limit what we can do with Wget (original)'s versioning. Who's to say a 2.0 release of the "light" version will not be warranted someday?

At any rate, the "snazzy" one looks to be diverging from classic Wget in some rather significant ways, in which case, I'd kind of prefer to part names a bit more severely than just "wget-ng" or "wget2". "Reget", perhaps: that name could be both "Recursive Get" (describing what's still its primary feature), or "Revised/Re-envisioned Wget". :)

I think, too, that names such as "wget2" are more often things that packagers (say, Debian) do, when they want to include backwards-incompatible, significantly new versions of software, but don't want to break people's usage of older stuff. Or, when they just want to offer both versions. Cf "apache2" in Debian.

> And then eventually everyone's gotten used to and can't live without
> the new bittorrent-like almost-multithreaded features. ;-)

:)

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> Tony Godshall wrote:
>> Perhaps the little wget could be called "wg". A quick google and
>> wikipedia search shows no real namespace collisions.
>
> To reduce confusion/upgrade problems, I would think we would want to
> ensure that the "traditional"/little Wget keeps the current name, and
> any snazzified version gets a new one.

Please not another -ng. How about wget2 (since we're on 1.x)? And the current one remains in 1.x.

And then eventually everyone's gotten used to and can't live without the new bittorrent-like almost-multithreaded features. ;-)

Tony
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Josh Williams wrote:
> Although the code might suck for those trying to read it, I think it
> could be very great with a little regular maintenance.

Oh, I think it's probably already earned a reputation for greatness at this point. But yeah, it needs some maintenance work. Which is, of course, what I volunteered for in the first place :)

> There still remains the question, though, of whether version 2 will
> require a complete rewrite. Considering how fundamental these changes
> are, I don't think we would have much of a choice.

Right. The idea I... thought I had settled on was to refactor what we have, until it is sufficiently pliable to start adding some of the version 2 features. If, OTOH, we're going to have two separate projects, there's less motivation to try to slowly rework everything under the sun; though there are obviously still sections that would benefit from refactoring (gethttp and http_loop are currently still right in my crosshairs).

> You mentioned that they could share code for recursion, but I don't
> see how. IIRC, the code for recursion in the current version is very
> dependent on the current methods of operation. It would probably have
> to be rewritten to be shared.

Yeah, the shared codebase would probably be pretty small. But the actual logic about how to parse HTML, or whether or not to descend, or comparing Web timestamps to local ones, should be sharable. But yes, after a rewrite of the relevant code. I don't think we'd have to "make" it happen, in particular; as we discover common logic that can be factored, we'll just... do it.

> As for libcurl, I see no reason why not. Also, would these be two
> separate GNU projects? Would they be packaged in the same source code,
> like finch and pidgin?

Probably not packaged together. People who want the traditional Wget are not gonna want to download the JavaScript and MetaLink support code. :\ We should keep it as tight as possible.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Daniel Stenberg wrote:
> I guess I'm not the man to ask nor comment this a lot, but look what I
> found:
>
> http://www.mail-archive.com/[email protected]/msg01129.html
>
> I've always thought and I still believe that wget's power and most
> appreciated abilities are in the features it adds on top of the
> transfer, like HTML parsing, ftp list parsing and the other things you
> mentioned.

Of course, in this case, we'd be talking more about linking with libcurl for Wget2, rather than incorporating it, so we wouldn't have to worry about copyright disclaimers. Besides which, according to the maintainers document, we only need to get those for files that do not include a license statement.

> Of course, going with one single unified transfer library is perhaps
> not the best thing from a software eco-system perspective, as
> competition tends to drive innovation and development, but the more
> users of a free software/open source project we get, the better it
> will become.

Well, in the first place, ours isn't a library, so for the most part it isn't really usable by other folks. :) And there's still libwww from the W3C, at least (and probably others). Besides, the great thing about the _free_ software eco-system is that even when there is only a single, unified library, as long as it is free it can easily be forked to move in a new direction to meet differing requirements. :)

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Tony Godshall wrote:
> Perhaps the little wget could be called "wg". A quick google and
> wikipedia search shows no real namespace collisions.

To reduce confusion/upgrade problems, I would think we would want to ensure that the "traditional"/little Wget keeps the current name, and any snazzified version gets a new one.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/26/07, Josh Williams <[EMAIL PROTECTED]> wrote:
> On 10/26/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> > And, of course, when I say "there would be two Wgets", what I really
> > mean by that is that the more exotic-featured one would be something
> > else entirely than a Wget, and would have a separate name.
>
> I think the idea of having two Wgets is good. I too have been
> concerned about the resources required in creating the all-out version
> 2.0. The current code for Wget is a bit mangled, but I think the basic
> concepts surrounding it are very good ones. Although the code might
> suck for those trying to read it, I think it could be very great with
> a little regular maintenance.

Perhaps the little wget could be called "wg". A quick google and wikipedia search shows no real namespace collisions.

> There still remains the question, though, of whether version 2 will
> require a complete rewrite. Considering how fundamental these changes
> are, I don't think we would have much of a choice. You mentioned that
> they could share code for recursion, but I don't see how. IIRC, the
> code for recursion in the current version is very dependent on the
> current methods of operation. It would probably have to be rewritten
> to be shared.
>
> As for libcurl, I see no reason why not. Also, would these be two
> separate GNU projects? Would they be packaged in the same source code,
> like finch and pidgin?
>
> I do believe the next question at hand is what version 2's official
> mascot will be. I propose Lenny the tortoise ;)

Oooh -- confusion with Debian testing

>           _  ..
> Lenny -> (_\/ \_,
>          'uuuu~'

-- 
Best Regards.
Please keep in touch.
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On Fri, 26 Oct 2007, Micah Cowan wrote:

> The obvious solution to that is to use c-ares, which does exactly
> that: handle DNS queries asynchronously. Actually, I didn't know this
> until just now, but c-ares was split off from ares to meet the needs
> of the curl developers. :)

We needed an asynch name resolver for libcurl, so c-ares started out that way, but perhaps mostly because the original author didn't care much for our improvements and bug fixes. ADNS is a known alternative, but we couldn't use that due to license restrictions. You (wget) don't have that same problem with it. I'm not able to compare them though, as I never used ADNS...

> Of course, if we're doing asynchronous net I/O stuff, rather than
> reinvent the wheel and try to maintain portability for new stuff,
> we're better off using a prepackaged deal, if one exists. Luckily, one
> does; a friend of mine (William Ahern) wrote a package called libevnet
> that handles all of that;

When I made libcurl grok a vast number of simultaneous connections, I went straight with libevent for my test and example code. It's solid and fairly easy to use... Perhaps libevnet makes it even easier, I don't know.

> Plus, there is the following thought. While I've talked about not
> reinventing the wheel, using existing packages to save us the trouble
> of having to maintain portable async code, higher-level buffered-IO
> and network comm code, etc, I've been neglecting one more package
> choice. There is, after all, already a Free Software package that goes
> beyond handling asynchronous network operations, to specifically
> handle asynchronous _web_ operations; I'm speaking, of course, of
> libcurl.

I guess I'm not the man to ask nor comment this a lot, but look what I found:

http://www.mail-archive.com/[email protected]/msg01129.html

I've always thought, and I still believe, that wget's power and most appreciated abilities are in the features it adds on top of the transfer, like HTML parsing, ftp list parsing and the other things you mentioned.

Of course, going with one single unified transfer library is perhaps not the best thing from a software eco-system perspective, as competition tends to drive innovation and development, but the more users of a free software/open source project we get, the better it will become.
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/26/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> And, of course, when I say "there would be two Wgets", what I really
> mean by that is that the more exotic-featured one would be something
> else entirely than a Wget, and would have a separate name.

I think the idea of having two Wgets is good. I too have been concerned about the resources required in creating the all-out version 2.0. The current code for Wget is a bit mangled, but I think the basic concepts surrounding it are very good ones. Although the code might suck for those trying to read it, I think it could be very great with a little regular maintenance.

There still remains the question, though, of whether version 2 will require a complete rewrite. Considering how fundamental these changes are, I don't think we would have much of a choice. You mentioned that they could share code for recursion, but I don't see how. IIRC, the code for recursion in the current version is very dependent on the current methods of operation. It would probably have to be rewritten to be shared.

As for libcurl, I see no reason why not. Also, would these be two separate GNU projects? Would they be packaged in the same source code, like finch and pidgin?

I do believe the next question at hand is what version 2's official mascot will be. I propose Lenny the tortoise ;)

          _  ..
Lenny -> (_\/ \_,
         'uuuu~'
Thoughts on Wget 1.x, 2.0 (*LONG!*)
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

With talk of supporting multiple simultaneous connections in a next-generation version of Wget, various things have been tumbling around in my mind.

First off is that I would not wish to do such a thing with threads. Threads introduce too many problems of their own, including portability and debuggability. I'd much prefer to do asynchronous I/O.

With the use of asynchronous I/O, a (possibly) better way to do --timeout presents itself: we can do the appropriate timeouts in our calls to select(). The main advantage to this is that we don't have to muck around with signals, signal handling, various portability issues, etc. We can do one --timeout and be done. The primary downside to this is that potentially blocking, not-directly-I/O things don't get timed out anymore. The only thing that currently comes to mind is gethostbyname(), which obviously can block, but can't be select()ed or set to some sort of non-blocking mode. Also, even aside from --timeout, having all other traffic sit around and wait until a name is resolved is not really desirable.

The obvious solution to that is to use c-ares, which does exactly that: handle DNS queries asynchronously. Actually, I didn't know this until just now, but c-ares was split off from ares to meet the needs of the curl developers. :)

Of course, if we're doing asynchronous net I/O stuff, rather than reinvent the wheel and try to maintain portability for new stuff, we're better off using a prepackaged deal, if one exists. Luckily, one does; a friend of mine (William Ahern) wrote a package called libevnet that handles all of that; it wraps libevent (by Niels Provos, for handling async I/O very portably, using the best available interfaces on the given system) with higher-level socket and buffer I/O facilities, and provides a wrapper around c-ares that makes it convenient to use with liblookup.
If we're going to do async I/O, using libevent and c-ares, or something very like them, is far too convenient not to do, and after that decision is made, libevnet becomes a clear win too. So, the obvious win is that using libevnet, libevent and c-ares gives us a "shortest path" to using async I/O, having multiple simultaneous connections and async DNS queries, and a potentially better way to manage timeouts. The obvious loss, and one which I'm positive many of you are already screaming at me about, is that we just added 3 library dependencies to Wget in one go. Not freaking cool. Not freaking cool AT ALL.

- -= Wget's Strongest Points =-

I absolutely do not want to require a bunch of libraries in order for people to build Wget. AFAICT, the vast majority of Wget's user base, which is probably system packagers and distributors, use it for just the following reasons:

1. It's pretty small. Its only dependency is OpenSSL, which isn't even required, but of course in general nobody really doesn't want SSL. (Ooh looky! Double negatives!)

2. It's robust. Connection dropped? No prob, try again.

3. It avoids mucking with preexisting files. Downloading a file named "foo", but you already _have_ a "foo"? No prob, let's call it "foo.1".

To my mind, these are the core values that have led to so many different distributions and large software packages relying on Wget. Messing with any one of these is likely to lose Wget "customers" in our largest "target market". (DISCLAIMER: naturally I have nothing whatsoever to back these claims up. It's conjecture. But it seems pretty credible to me.)

Another major "market" for Wget is the typical command-line "power user", who uses Wget not only to grab off a quick file, but also to grab whole sections of sites recursively, and perhaps with occasional quirky needs like only-visit-these-domains or only-download-these-file-types.
For these people, point #1 above probably holds relatively little value, its place being taken primarily by Wget's HTML-crawling functionality. In addition to it, points that I believe are highly desirable to such users are:

- Being able to tell Wget precisely which files to download and which to skip. The more expressive power we have to accomplish this, the better. Wget already has remarkable flexibility in this area; but there are many more things that are desirable, and some of the existing interface is not up to the task of really powerful expression in this area.

- Being able to parse and "recursively descend" CSS is really, really important.

- Being able to do multiple connections, potentially accelerating the total download time (mainly for multi-host sessions), would be a win.

- Being able to extend Wget, to grok new filetypes for recursive descent (such as non-HTML XML files, or JavaScript), or to extend the power of expression of "what to grab" even further.

- -= The Two Wgets =-

It seems to me, then, that what's really required may in fact be two different "Wgets". One that is lightweight but packs a punch: basically Wget as it a
