Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

L Walsh wrote:
> Micah Cowan wrote:
>> I'm not sure what you mean about the linux thing; there are many
>> instances of runtime loadable modules on Linux. dlopen() and friends are
>> the standard way of doing this on any Unix kernel flavor.
> 
> I _thought_ so, but when I asked a distro why they didn't
> use this, they said it would require rewriting nearly all currently
> existing applications.
> 
> My specific complaint was against a SuSE distro where, in
> order to load one.rpm, it depended on two.rpm, which depended on
> three.rpm, and that on four.rpm, etc. The functionality in "two.rpm"
> was to load a library to handle "active directories" which, in my
> non-MS, small setup, I didn't need -- and I didn't want to load
> the 5-7 supporting packages for AD, since I didn't use them.
> BUT, because of static run-time loading, one.rpm would fail if two.rpm
> wasn't loaded...and so on and so forth.  AFAIK, the same problem
> exists on nearly every distro -- because no one bothers to think
> that they might not want to load every package on the CD, just
> to support local host lookup using...say nscd.  G.

Ah, well, that's a different situation. dlopen() is the standard way to
decide at runtime whether or not to load a library. However, if the
application was designed to make that decision at build time rather than
at runtime, then retrofitting it does require rewriting code.
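A minimal sketch of that runtime decision, just for illustration (the library names below are stand-ins, not anything Wget actually loads; on older glibc you'd link with -ldl):

```c
#include <dlfcn.h>

/* Probe for an optional shared library at runtime.  If it's absent,
 * the corresponding feature is simply disabled, instead of the whole
 * program refusing to start the way static dependencies force. */
int have_optional_lib(const char *soname)
{
    void *h = dlopen(soname, RTLD_LAZY);
    if (!h)
        return 0;        /* library missing: feature off */
    dlclose(h);
    return 1;
}
```

An application built this way calls have_optional_lib() (or dlsym() for the specific entry points) once at startup and flips feature flags accordingly -- which is exactly the decide-at-runtime design that's expensive to retrofit onto code that linked everything at build time.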

In this case, though, we're specifically talking about loadable modules.
We might choose to allow some of them to be linked at build time, but
we'd definitely have to at least support conditional linking at runtime.

>> Keeping a single Wget and using runtime libraries (which we were terming
>> "plugins") was actually the original concept (there's mention of this in
>> the first post of this thread, actually); 
> ---
> Sounds good to me! :-)
> 
>> the issue is that there are
>> core bits of functionality (such as the multi-stream support) that are
>> too intrinsic to separate into loadable modules, and that, to be done
>> properly (and with a minimum of maintenance commitment) would also
>> depend on other libraries (that is, doing asynchronous I/O wouldn't
>> technically require the use of other libraries, but it can be a lot of
>> work to do efficiently and portably across OSes, and there are already
>> Free libraries to do that for us).
> -
> And perhaps that is the problem.  In order to re-use existing
> parts of code, rather than adapting them to a "load-if-necessary" type
> structure -- everyone prefers to just use them "as is", thus one lib
> references another, and another...and so on.  Like I think you pull
> in "cat", and you get all of the gnu-language libs and tools, which
> pulls in alternate character set support, which requires certain
> font rendering packages -- and of course, if you are displaying
> alternate characters, let's not forget the corresponding foreign
> input methods, and the asian-char specific terminal emulators...etc.

That's retarded. Native Language Support for a terminal program
shouldn't pull in font-rendering packages: displaying the characters
properly is the terminal's responsibility. I have some trouble believing
that any packagers would actually have such dependencies, but if they
do, it's retarded. A program like "cat" should depend only on the system
library, and (if NLS is supported) gettext (which shouldn't depend on
anything else).

> Can I jump off a cliff yet?...ARG!  I hack around such problems,
> at times, by extracting the 1 run-time library I need, and not the
> rest of the package, but then my rpm-verify checks turn up supposed
> "errors" because I'm missing package dependencies.  Sigh...

Frustrating experiences with RedHat's package management are why I'm now
a Debian/Ubuntu user. :)

> If one wanted to add multi-stream support, couldn't the
> "small wget" have a check to see if the multi-stream support lib
> was present (or not), and if so, set max-streams equal to one that
> might yield the basic behavior one might want for the small wget?

Well, but actual support for any sort of multi-stream operation means a
major rewrite of the entire I/O code. Much better to use a separate
library for that, if we can get it. In that case, though, it stops being
something we can simply check for and use if it's available, and becomes
something the code absolutely requires.
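For a feel of why it cuts so deep: multi-stream means one loop servicing many descriptors at once, as in this toy select() sketch (illustrative only, not Wget code; a real implementation would use nonblocking sockets and an event library):

```c
#include <sys/select.h>
#include <unistd.h>

/* Toy sketch of the single-threaded multiplexing that real multi-stream
 * support implies: watch several descriptors at once and read from the
 * first one that is ready.  Returns bytes read, or -1 if none ready. */
int read_any_ready(const int *fds, int count, char *buf, int buflen)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* poll; don't block */
    int i, maxfd = -1;

    FD_ZERO(&rfds);
    for (i = 0; i < count; i++) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return -1;                   /* nothing ready (or error) */
    for (i = 0; i < count; i++)
        if (FD_ISSET(fds[i], &rfds))
            return (int) read(fds[i], buf, buflen);
    return -1;
}
```

Every read, write, and retry path in the existing code assumes one blocking stream; threading this kind of loop through it is the rewrite in question.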

> Not pushing a particular solution -- I, like you, am just throwing
> out ideas to consider...if they've already covered the points I've
> raised, feel free to just ignore my ramblings and "carry on"...:-)

Well, and fortunately we've got plenty of time to talk about these
things: my focus right now is on getting 1.11 out the door, after which
there are _plenty_ of things to keep me busy for 1.12 (still a "lite"
release) for quite some time.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://mic

Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread L Walsh



Micah Cowan wrote:

> I'm not sure what you mean about the linux thing; there are many
> instances of runtime loadable modules on Linux. dlopen() and friends are
> the standard way of doing this on any Unix kernel flavor.


I _thought_ so, but when I asked a distro why they didn't
use this, they said it would require rewriting nearly all currently
existing applications.

My specific complaint was against a SuSE distro where, in
order to load one.rpm, it depended on two.rpm, which depended on
three.rpm, and that on four.rpm, etc. The functionality in "two.rpm"
was to load a library to handle "active directories" which, in my
non-MS, small setup, I didn't need -- and I didn't want to load
the 5-7 supporting packages for AD, since I didn't use them.
BUT, because of static run-time loading, one.rpm would fail if two.rpm
wasn't loaded...and so on and so forth.  AFAIK, the same problem
exists on nearly every distro -- because no one bothers to think
that they might not want to load every package on the CD, just
to support local host lookup using...say nscd.  G.



> Keeping a single Wget and using runtime libraries (which we were terming
> "plugins") was actually the original concept (there's mention of this in
> the first post of this thread, actually);

---
Sounds good to me! :-)


> the issue is that there are
> core bits of functionality (such as the multi-stream support) that are
> too intrinsic to separate into loadable modules, and that, to be done
> properly (and with a minimum of maintenance commitment) would also
> depend on other libraries (that is, doing asynchronous I/O wouldn't
> technically require the use of other libraries, but it can be a lot of
> work to do efficiently and portably across OSes, and there are already
> Free libraries to do that for us).

-
And perhaps that is the problem.  In order to re-use existing
parts of code, rather than adapting them to a "load-if-necessary" type
structure -- everyone prefers to just use them "as is", thus one lib
references another, and another...and so on.  Like I think you pull
in "cat", and you get all of the gnu-language libs and tools, which
pulls in alternate character set support, which requires certain
font rendering packages -- and of course, if you are displaying
alternate characters, let's not forget the corresponding foreign
input methods, and the asian-char specific terminal emulators...etc.

Can I jump off a cliff yet?...ARG!  I hack around such problems,
at times, by extracting the 1 run-time library I need, and not the
rest of the package, but then my rpm-verify checks turn up supposed
"errors" because I'm missing package dependencies.  Sigh...

If one wanted to add multi-stream support, couldn't the
"small wget" have a check to see if the multi-stream support lib
was present (or not), and if so, set max-streams equal to one that
might yield the basic behavior one might want for the small wget?

Not pushing a particular solution -- I, like you, am just throwing
out ideas to consider...if they've already covered the points I've
raised, feel free to just ignore my ramblings and "carry on"...:-)
Linda


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Micah Cowan wrote:
> Tony Lewis wrote:
>> Perhaps both versions can include multi-threaded support in their
>> core version, but the lite version would never invoke
>> multi-threading.
> 
> I mentioned this in the first post as well. The main problem I offered
> for this was that async I/O tends to make for much more

I should point out, too, that I'm talking about asynchronous I/O
support, and not multithreaded support, as I'm not really keen on
introducing threads to Wget. Especially since, AFAICT, threads sort of
suck on Linux, which happens to be the kernel I actively use. This may
be somewhat unfortunate, as multithreading code tends not to introduce
the code complexity that async I/O does (though IMO it introduces
complexities of a different sort).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHK3uA7M8hyUobTrERCM38AJ9BhohEVNuRl2P1rnsjWO/gEgFxCACgjIf3
9hyCb8WHZIFQLZ1UCCaqK5A=
=siMR
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Tony Lewis wrote:
> Micah Cowan wrote:
> 
>> Keeping a single Wget and using runtime libraries (which we were
>> terming "plugins") was actually the original concept (there's
>> mention of this in the first post of this thread, actually); the
>> issue is that there are core bits of functionality (such as the
>> multi-stream support) that are too intrinsic to separate into
>> loadable modules, and that, to be done properly (and with a minimum
>> of maintenance commitment) would also depend on other libraries
>> (that is, doing asynchronous I/O wouldn't technically require the
>> use of other libraries, but it can be a lot of work to do
>> efficiently and portably across OSes, and there are already Free
>> libraries to do that for us).
> 
> Perhaps both versions can include multi-threaded support in their
> core version, but the lite version would never invoke
> multi-threading.

I mentioned this in the first post as well. The main problem I offered
for this was that async I/O tends to make for much more
complicated/hard-to-follow code, which will make the "lite" Wget (even
more) difficult to read, without reaping the actual benefits gained from
such complications. Of course, whether this is a sufficient
justification to maintain two different versions of Wget is another
question...

There's also the fact that libcurl starts looking _very_ attractive to
handle the async I/O web comm stuff, so that ideally we don't actually
have to rewrite any of the I/O and HTTP logic, but just replace it
wholesale. If we decide to use that for the async stuff, then it seems
to me that having two separate programs suddenly becomes more-or-less a
foregone conclusion, as I don't really want to introduce a dependency on
libcurl for the "lite" Wget (though Hrvoje's response on the thread that
Daniel Stenberg posted suggests I'd have an excuse to do so).

Note that in any case, having two separate command-line interfaces is
pretty much unavoidable IMO, as the current CLI is fast becoming
unwieldy, and certain aspects are fairly confusing, so that I don't
really want to use it as the basis on which to build some of the newer
configuration features; at the same time, I want to keep the current
interface around for the current Wget usage, so I don't break people's
scripts, etc.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHK3kf7M8hyUobTrERCCe6AJ93sxZkba5yDcaTF1asibpHZdjkzgCgiH0T
9xed5XQH/CEbZmknLpUtRPo=
=L3hf
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread Tony Godshall
On 11/2/07, Tony Lewis <[EMAIL PROTECTED]> wrote:
> Micah Cowan wrote:
>
> > Keeping a single Wget and using runtime libraries (which we were terming
> > "plugins") was actually the original concept (there's mention of this in
> > the first post of this thread, actually); the issue is that there are
> > core bits of functionality (such as the multi-stream support) that are
> > too intrinsic to separate into loadable modules, and that, to be done
> > properly (and with a minimum of maintenance commitment) would also
> > depend on other libraries (that is, doing asynchronous I/O wouldn't
> > technically require the use of other libraries, but it can be a lot of
> > work to do efficiently and portably across OSes, and there are already
> > Free libraries to do that for us).
>
> Perhaps both versions can include multi-threaded support in their core 
> version, but the lite version would never invoke multi-threading.
>
> Tony

Yes!

Some features of the current wget do their own sorta-multithreading
and would be simplified by this approach, and their modularity enhanced
(e.g. the progress reporting has several options, which could become plug-ins).

Tony G



-- 
Best Regards.
Please keep in touch.


RE: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-02 Thread Tony Lewis
Micah Cowan wrote:

> Keeping a single Wget and using runtime libraries (which we were terming
> "plugins") was actually the original concept (there's mention of this in
> the first post of this thread, actually); the issue is that there are
> core bits of functionality (such as the multi-stream support) that are
> too intrinsic to separate into loadable modules, and that, to be done
> properly (and with a minimum of maintenance commitment) would also
> depend on other libraries (that is, doing asynchronous I/O wouldn't
> technically require the use of other libraries, but it can be a lot of
> work to do efficiently and portably across OSes, and there are already
> Free libraries to do that for us).

Perhaps both versions can include multi-threaded support in their core version, 
but the lite version would never invoke multi-threading.

Tony



Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

L Walsh wrote:
> Honest -- I hadn't read all the threads before my post...
> 
> Great ideas Micah! :-)
> 
> On the idea of 2 wgets -- there is a "clever" way to get
> by with 1.  Put the "optional" functionality into separate
> run-time loadable files.  SGI's Unix (and MS Windows) do this.
> The "small wget" then checks to see which libraries are
> "accessible" -- those that aren't simply mean the features
> for those libs are disabled.  In a way, it's like how
> 'vim' can optionally load perllib or python-lib at runtime
> (at least under windows) if they are present.  If they are
> not present, those features are disabled.  Too bad linux
> didn't take this route with its libraries (have asked,
> it is possible, but there's no "framework" for it, and
> that might need work as well).

I'm not sure what you mean about the linux thing; there are many
instances of runtime loadable modules on Linux. dlopen() and friends are
the standard way of doing this on any Unix kernel flavor.

Keeping a single Wget and using runtime libraries (which we were terming
"plugins") was actually the original concept (there's mention of this in
the first post of this thread, actually); the issue is that there are
core bits of functionality (such as the multi-stream support) that are
too intrinsic to separate into loadable modules, and that, to be done
properly (and with a minimum of maintenance commitment) would also
depend on other libraries (that is, doing asynchronous I/O wouldn't
technically require the use of other libraries, but it can be a lot of
work to do efficiently and portably across OSes, and there are already
Free libraries to do that for us).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHKo867M8hyUobTrERCBxGAJ44coJN48fRGhORfYv+uN2J6RVz7gCePxva
UYeGYTW0sfY+QRcGkpSB9Ls=
=wOVv
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread L Walsh

Honest -- I hadn't read all the threads before my post...

Great ideas Micah! :-)

On the idea of 2 wgets -- there is a "clever" way to get
by with 1.  Put the "optional" functionality into separate
run-time loadable files.  SGI's Unix (and MS Windows) do this.
The "small wget" then checks to see which libraries are
"accessible" -- those that aren't simply mean the features
for those libs are disabled.  In a way, it's like how
'vim' can optionally load perllib or python-lib at runtime
(at least under windows) if they are present.  If they are
not present, those features are disabled.  Too bad linux
didn't take this route with its libraries (have asked,
it is possible, but there's no "framework" for it, and
that might need work as well).

My 2 cents,
Linda



Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread Tony Godshall
On 10/31/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Tony Godshall wrote:
> > On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> >> -BEGIN PGP SIGNED MESSAGE-
> >> Hash: SHA256
> >>
> >> Tony Godshall wrote:
> >>> Perhaps the little wget could be called "wg".  A quick google and
> >>> wikipedia search shows no real namespace collisions.
> >> To reduce confusion/upgrade problems, I would think we would want to
> >> ensure that the "traditional"/little Wget keeps the current name, and
> >> any snazzified version gets a new one.
> >
> > Please not another -ng.  How about wget2 (since we're on 1.x).  And
> > the current one remains in 1.x.
>
> I agree that -ng would not be appropriate. But since we're really
> talking about two separate beasts, I'd prefer not to limit what we can
> do with Wget (original)'s versioning. Who's to say a 2.0 release of the
> "light" version will not be warranted someday?
>
> At any rate, the "snazzy" one looks to be diverging from classic Wget in
> some rather significant ways, in which case, I'd kind of prefer to part
> names a bit more severely than just "wget-ng" or "wget2". "Reget",
> perhaps: that name could be both "Recursive Get" (describing what's
> still its primary feature), or "Revised/Re-envisioned Wget". :)
>
> I think, too, that names such as "wget2" are more often things that
> packagers (say, Debian) do, when they want to include
> backwards-incompatible, significantly new versions of software, but
> don't want to break people's usage of older stuff. Or, when they just
> want to offer both versions. Cf "apache2" in Debian.
>
> > And then eventually everyone's gotten used to it and can't live
> > without the new bittorrent-like almost-multithreaded features. ;-)
>
> :)

Pget.

Parallel get.

Tget.

Torrent-like-get.

Bget.

Bigger get.

BBWget.

Bigger Better wget.

OK, ok sorry.


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-31 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Tony Godshall wrote:
> On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Tony Godshall wrote:
>>> Perhaps the little wget could be called "wg".  A quick google and
>>> wikipedia search shows no real namespace collisions.
>> To reduce confusion/upgrade problems, I would think we would want to
>> ensure that the "traditional"/little Wget keeps the current name, and
>> any snazzified version gets a new one.
> 
> Please not another -ng.  How about wget2 (since we're on 1.x).  And
> the current one remains in 1.x.

I agree that -ng would not be appropriate. But since we're really
talking about two separate beasts, I'd prefer not to limit what we can
do with Wget (original)'s versioning. Who's to say a 2.0 release of the
"light" version will not be warranted someday?

At any rate, the "snazzy" one looks to be diverging from classic Wget in
some rather significant ways, in which case, I'd kind of prefer to part
names a bit more severely than just "wget-ng" or "wget2". "Reget",
perhaps: that name could be both "Recursive Get" (describing what's
still its primary feature), or "Revised/Re-envisioned Wget". :)

I think, too, that names such as "wget2" are more often things that
packagers (say, Debian) do, when they want to include
backwards-incompatible, significantly new versions of software, but
don't want to break people's usage of older stuff. Or, when they just
want to offer both versions. Cf "apache2" in Debian.

> And then eventually everyone's gotten used to it and can't live
> without the new bittorrent-like almost-multithreaded features. ;-)

:)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHKQ7o7M8hyUobTrERCIE+AKCHz5e2y8aSjWu7r9B+JTzB+fZEmwCfboCx
wg3829rXUh1Bj+WqzPCKt9I=
=/UKq
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-31 Thread Tony Godshall
On 10/30/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Tony Godshall wrote:
> > Perhaps the little wget could be called "wg".  A quick google and
> > wikipedia search shows no real namespace collisions.
>
> To reduce confusion/upgrade problems, I would think we would want to
> ensure that the "traditional"/little Wget keeps the current name, and
> any snazzified version gets a new one.

Please not another -ng.  How about wget2 (since we're on 1.x).  And
the current one remains in 1.x.

And then eventually everyone's gotten used to it and can't live
without the new bittorrent-like almost-multithreaded features. ;-)

Tony


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-30 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Josh Williams wrote:
> Although the code might
> suck for those trying to read it, I think it could be very great with
> a little regular maintenance.

Oh, I think it's probably already earned a reputation for greatness at
this point. But yeah, it needs some maintenance work. Which is, of
course, what I volunteered for in the first place :)

> There still remains the question, though, of whether version 2 will
> require a complete rewrite. Considering how fundamental these changes
> are, I don't think we would have much of a choice.

Right. The idea I... thought I had settled on, was to refactor what we
have, until it is sufficiently pliable to start adding some of the
version 2 features. If, OTOH, we're going to have two separate projects,
there's less motivation to try to slowly rework everything under the
sun; though there are obviously still sections that would benefit from
refactoring (gethttp and http_loop are currently still right in my
crosshairs).

> You mentioned that
> they could share code for recursion, but I don't see how. IIRC, the
> code for recursion in the current version is very dependent on the
> current methods of operation. It would probably have to be rewritten
> to be shared.

Yeah, the shared codebase would probably be pretty small. But the actual
logic about how to parse HTML, or whether or not to descend, or
comparing Web timestamps to local ones, should be sharable. But yes,
after a rewrite of the relevant code.
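As a concrete example of sharable logic, the Web-vs-local timestamp comparison amounts to something like the following sketch (hypothetical helper names, built on glibc's strptime()/timegm(); not the actual shared code):

```c
#define _GNU_SOURCE          /* for strptime() and timegm() on glibc */
#include <time.h>
#include <string.h>

/* Parse an HTTP Last-Modified header (RFC 1123 date) into a UTC time_t;
 * returns (time_t)-1 on parse failure. */
time_t parse_http_date(const char *s)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    if (!strptime(s, "%a, %d %b %Y %H:%M:%S GMT", &tm))
        return (time_t) -1;
    return timegm(&tm);      /* interpret the broken-down time as UTC */
}

/* Decide whether a local copy with mtime `local` is stale. */
int remote_is_newer(const char *last_modified, time_t local)
{
    time_t remote = parse_http_date(last_modified);
    return remote != (time_t) -1 && remote > local;
}
```

Logic of this kind has no dependency on how the bytes were fetched, which is what makes it a candidate for factoring out as common code.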

I don't think we'd have to "make" it happen, in particular; as we
discover common logic that can be factored, we'll just... do it.

> As for libcurl, I see no reason why not. Also, would these be two
> separate GNU projects? Would they be packaged in the same source code,
> like finch and pidgin?

Probably not packaged together. People who want the traditional Wget are
not gonna want to download the JavaScript and MetaLink support code. :\
We should keep it as tight as possible.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHKARG7M8hyUobTrERCGHUAJ9a8KP5QV05mZqy1PHhNU0WEjkp7wCbBiG1
qohy2y3OjJZnPT1ErfkkVHw=
=XXre
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-30 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Daniel Stenberg wrote:
> I guess I'm not the man to ask nor comment this a lot, but look what I
> found:
> 
>   http://www.mail-archive.com/[email protected]/msg01129.html
> 
> I've always thought and I still believe that wget's power and most
> appreciated abilities are in the features it adds on top of the
> transfer, like HTML parsing, ftp list parsing and the other things you
> mentioned.

Of course, in this case, we'd be talking more about linking with libcurl
for Wget2, rather than incorporating it, so we wouldn't have to worry
about copyright disclaimers. Besides which, according to the maintainers
document, we only need to get those for files that do not include a
license statement.

> Of course, going one single unified transfer library is perhaps not the
> best thing from a software eco-system perspective, as competition tends
> to drive innovation and development, but the more users of a free
> software/open source project we get the better it will become.

Well, in the first place, ours isn't a library, so for the most part it
isn't really usable by other folks. :)

And there's still libwww from the W3C, at least (and probably others).

Besides, the great thing about the _free_ software eco-system, is that
even when there is only a single, unified library, as long as it is free
it can easily be forked to move in a new direction to meet differing
requirements. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHKAQ47M8hyUobTrERCNZgAJ4rsG9ZlZuoHmvZBssE5oPGKY6yOACfRkc0
HEKiQEEbbs9IZWg3AwfyNII=
=kiF5
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-30 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Tony Godshall wrote:
> Perhaps the little wget could be called "wg".  A quick google and
> wikipedia search shows no real namespace collisions.

To reduce confusion/upgrade problems, I would think we would want to
ensure that the "traditional"/little Wget keeps the current name, and
any snazzified version gets a new one.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHJ59b7M8hyUobTrERCLs9AJ478M50hIs4hMegAGYhKEXL5tCaAgCdGR+e
5A6mtbAq2iX6Azvcfbd10cI=
=SXun
-END PGP SIGNATURE-


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-30 Thread Tony Godshall
On 10/26/07, Josh Williams <[EMAIL PROTECTED]> wrote:
> On 10/26/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> > And, of course, when I say "there would be two Wgets", what I really
> > mean by that is that the more exotic-featured one would be something
> > else entirely than a Wget, and would have a separate name.
>
> I think the idea of having two Wgets is good. I too have been
> concerned about the resources required in creating the all-out version
> 2.0. The current code for Wget is a bit mangled, but I think the basic
> concepts surrounding it are very good ones. Although the code might
> suck for those trying to read it, I think it could be very great with
> a little regular maintenance.

Perhaps the little wget could be called "wg".  A quick google and
wikipedia search shows no real namespace collisions.

> There still remains the question, though, of whether version 2 will
> require a complete rewrite. Considering how fundamental these changes
> are, I don't think we would have much of a choice. You mentioned that
> they could share code for recursion, but I don't see how. IIRC, the
> code for recursion in the current version is very dependent on the
> current methods of operation. It would probably have to be rewritten
> to be shared.
>
> As for libcurl, I see no reason why not. Also, would these be two
> separate GNU projects? Would they be packaged in the same source code,
> like finch and pidgin?
>
> I do believe the next question at hand is what version 2's official
> mascot will be. I propose Lenny the tortoise ;)

Oooh- confusion with Debian testing

>_  ..
> Lenny ->  (_\/  \_,
> 'uuuu~'
>


-- 
Best Regards.
Please keep in touch.


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-27 Thread Daniel Stenberg

On Fri, 26 Oct 2007, Micah Cowan wrote:

> The obvious solution to that is to use c-ares, which does exactly that:
> handle DNS queries asynchronously. Actually, I didn't know this until just
> now, but c-ares was split off from ares to meet the needs of the curl
> developers. :)


We needed an asynch name resolver for libcurl so c-ares started out that way, 
but perhaps mostly because the original author didn't care much for our 
improvements and bug fixes. ADNS is a known alternative, but we couldn't use 
that due to license restrictions. You (wget) don't have that same problem with 
it. I'm not able to compare them though, as I never used ADNS...


> Of course, if we're doing asynchronous net I/O stuff, rather than reinvent
> the wheel and try to maintain portability for new stuff, we're better off
> using a prepackaged deal, if one exists. Luckily, one does; a friend of mine
> (William Ahern) wrote a package called libevnet that handles all of that;


When I made libcurl grok a vast number of simultaneous connections, I went 
straight with libevent for my test and example code. It's solid and fairly 
easy to use... Perhaps libevnet makes it even easier, I don't know.


> Plus, there is the following thought. While I've talked about not
> reinventing the wheel, using existing packages to save us the trouble of
> having to maintain portable async code, higher-level buffered-IO and network
> comm code, etc, I've been neglecting one more package choice. There is,
> after all, already a Free Software package that goes beyond handling
> asynchronous network operations, to specifically handle asynchronous _web_
> operations; I'm speaking, of course, of libcurl.


I guess I'm not the man to ask nor comment this a lot, but look what I found:

  http://www.mail-archive.com/[email protected]/msg01129.html

I've always thought and I still believe that wget's power and most appreciated 
abilities are in the features it adds on top of the transfer, like HTML 
parsing, ftp list parsing and the other things you mentioned.


Of course, going one single unified transfer library is perhaps not the best 
thing from a software eco-system perspective, as competition tends to drive 
innovation and development, but the more users of a free software/open source 
project we get the better it will become.


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-26 Thread Josh Williams
On 10/26/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
> And, of course, when I say "there would be two Wgets", what I really
> mean by that is that the more exotic-featured one would be something
> else entirely than a Wget, and would have a separate name.

I think the idea of having two Wgets is good. I too have been
concerned about the resources required in creating the all-out version
2.0. The current code for Wget is a bit mangled, but I think the basic
concepts surrounding it are very good ones. Although the code might
suck for those trying to read it, I think it could be very great with
a little regular maintenance.

There still remains the question, though, of whether version 2 will
require a complete rewrite. Considering how fundamental these changes
are, I don't think we would have much of a choice. You mentioned that
they could share code for recursion, but I don't see how. IIRC, the
code for recursion in the current version is very dependent on the
current methods of operation. It would probably have to be rewritten
to be shared.

As for libcurl, I see no reason why not. Also, would these be two
separate GNU projects? Would they be packaged in the same source code,
like finch and pidgin?

I do believe the next question at hand is what version 2's official
mascot will be. I propose Lenny the tortoise ;)

   _  ..
Lenny ->  (_\/  \_,
'uuuu~'


Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-26 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

With talk of supporting multiple simultaneous connections in a
next-generation version of Wget, various things have been tumbling
around in my mind.

First off is that I would not wish to do such a thing with threads.
Threads introduce too many problems of their own, including portability
and debuggability. I'd much prefer to do asynchronous I/O.

With the use of asynchronous I/O, a (possibly) better way to do
- --timeout presents itself: we can do the appropriate timeouts in our
calls to select(). The main advantage to this is that we don't have to
muck around with signals, signal handling, various portability issues,
etc. We can do one --timeout and be done.

The primary downside to this is that potentially blocking operations that
aren't directly I/O no longer get timed out. The only such thing that
currently comes to mind is gethostbyname(), which obviously can block, but can't
be select()ed or set to some sort of non-blocking mode. Also, even aside
from --timeout, having all other traffic sit around and wait until a
name is resolved is not really desirable.

The obvious solution to that is to use c-ares, which does exactly that:
handle DNS queries asynchronously. Actually, I didn't know this until
just now, but c-ares was split off from ares to meet the needs of the
curl developers. :)

Of course, if we're doing asynchronous net I/O stuff, rather than
reinvent the wheel and try to maintain portability for new stuff, we're
better off using a prepackaged deal, if one exists. Luckily, one does; a
friend of mine (William Ahern) wrote a package called libevnet that
handles all of that; it wraps libevent (by Niels Provos, for handling
async I/O very portably and using the best available interfaces on the
given system) with higher-level socket and buffer I/O facilities, and
provides, with liblookup, a wrapper around c-ares that makes it
convenient to use. If we're going to do async I/O, using libevent and
c-ares, or something very like them, is far too convenient to pass up,
and after that decision is made, libevnet becomes a clear win too.

So, the obvious win is that using libevnet, libevent and c-ares gives us
a "shortest path" to using async I/O, having multiple simultaneous
connections and async DNS queries, and a potentially better way to
manage timeouts.

The obvious loss, and one which I'm positive many of you are already
screaming at me about, is that we just added 3 library dependencies to
Wget in one go. Not freaking cool. Not freaking cool AT ALL.

- -= Wget's Strongest Points =-

I absolutely do not want to require a bunch of libraries in order for
people to build Wget. AFAICT, the vast majority of Wget's user base,
which is probably system packagers and distributors, use it for just the
following reasons:

  1. It's pretty small. Only dependency is OpenSSL, which isn't even
required, but of course in general nobody really doesn't want SSL. (Ooh
looky! Double negatives!)
  2. It's robust. Connection dropped? No prob, try again.
  3. It avoids mucking with preexisting files. Downloading a file named
"foo", but you already _have_ a "foo"? No prob, let's call it "foo.1".

To my mind, these are the core values that have led to so many different
distributions and large software packages relying on Wget. Messing with
any one of these is likely to lose Wget "customers", and in our largest
"target market". (DISCLAIMER: naturally I have nothing whatsoever to
back these claims up. It's conjecture. But it seems pretty credible to me.)

Another major "market" for Wget is the typical command-line "power
user", who uses Wget not only to grab off a quick file, but also to grab
whole sections of sites recursively, and perhaps with occasional quirky
needs like only-visit-these-domains or only-download-these-file-types.
For these people, point #1 above probably holds relatively little
value, its place being taken primarily by Wget's HTML-crawling
functionality. In addition, points that I believe are highly
desirable to such users are:

  - Being able to tell Wget precisely which files to download and which
to skip. The more expressive power we have to accomplish this the
better. Wget already has remarkable flexibility in this area; but there
are many more things that are desirable, and some of the existing
interface is not up to the task of really powerful expression in this area.
  - Being able to parse and "recursively descend" CSS is really, really
important.
  - Being able to do multiple connections, potentially accelerating the
total download time (mainly for multi-host sessions), would be a win.
  - Being able to extend Wget, to grok new filetypes for recursive
descent (such as non-HTML XML files, or JavaScript), or extend the power
of expression of "what to grab" even further.

- -= The Two Wgets =-

It seems to me, then, that what's really required may in fact be two
different "Wgets".

One that is lightweight but packs a punch: basically Wget as it a