On 08/07/2012 02:42 PM, Fernando Cassia wrote:
> On Tue, Aug 7, 2012 at 3:08 PM, Micah Cowan <[email protected]> wrote:
>> I think the maintainer is aware that Wget's code quality is poor, and
>> would welcome sweeping architectural changes; I know I would have, when
>> I was maintainer.
>
> Just an idea... why not "fork" it, call it "wget-NG" (Next Generation
> ;), and develop it in parallel. When/if the "brand new, nifty, easier
> to maintain, completely cool design" next-generation turns out to be
> as stable and a drop-in replacement for the older -and judged as such
> by the community- then the community itself will switch to 'wget-ng'
> (or 'wgetr2'), and at that point the old code base can stop being
> maintained...
That's actually what I'm basically doing right now, though I've had scant time for it just recently. http://niwt.addictivecode.org/. Many of the ideas I'm including (or will include) in Niwt were originally framed specifically as ideas for a "Wget 2.0" or what have you.

However, it takes those ideas in a different enough direction that it's not a simple "competing project X is better, so use that instead of wget" decision. Wget is monolithic, portable to non-Unix systems, written entirely in C, and can be built with few dependencies. My "Niwt" project specifically aims to be as hackable and behavior-changeable as possible, and to be modular in the traditional Unix style (a composition of many smaller utilities, each of which does "one thing well"), at the cost of resource consumption and efficiency (especially), and of being tied inextricably to Unix. Also, since it's essentially a big pipeline of many parts, more moving parts generally means more things that "can go wrong". There are definite trade-offs, and which project is better for a user depends very greatly on what their requirements are.

My next big sweep in Niwt is meant to rewrite the core engine to be significantly more efficient (Niwt basically constructs a shell pipeline and then evaluates it; the pipeline will always be shell, or at the very least shell-like, but the bit that CONSTRUCTS the pipeline doesn't have to be shell, and currently is). But efficiency will always suffer from the fact that data gets copied from one process to another (as it does in any pipeline), and that forking can happen for every HTTP request/response (depending on what options are used).

The plus side is that this sort of extremely modular architecture lets you plug in whatever functionality you want. Imagine rendering image content as it's being downloaded to disk, so you can preview what's going on. Or recursing through HTTP links found in PDF files as well as HTML files. Or transforming JPGs to PNGs on the fly, before saving. The possibilities are endless.

> And by the way, thanks for the response Micah. I don't want to know
> who's behind every email, as long as the FSF knows who it's dealing
> with. I wasn't aware that paperwork was required. Then I guess it's
> OK.
>
> I was just concerned since wget is too ubiquitous and becomes an easy
> target for nefarious sources to inject vulnerabilities into it...

Well, IMO, no project ought to be so lax in its code review policies that such a thing is possible. The best means of avoiding security vulnerabilities, IMO, is (A) to actually look at the code that comes in, to see that it is as it should be, and (B) to make sure all the code is readable enough that (A) is possible (and hopefully easy). illusionoflife's proposal is to help us with (B), which would be to some degree counterproductive if malicious intent were harbored. :)

-mjc
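
P.S. To make the "plug in whatever functionality you want" point a bit more concrete, here is a rough, purely hypothetical sketch of that kind of pipeline composition. The stage names (niwt-fetch, niwt-extract-links) are made up for illustration and are not Niwt's actual interface; the plugged-in stage is ImageMagick's "convert", doing the JPG-to-PNG conversion on the fly:

  # Hypothetical sketch only: "niwt-fetch" and "niwt-extract-links" are
  # invented names standing in for whatever the real tools end up being.
  # The point is the shape: every stage is a small filter, and new
  # behavior is added by splicing another filter into the pipeline.
  niwt-fetch 'http://example.com/gallery/' |
  niwt-extract-links --type=img |
  while read -r url; do
      # Plugged-in stage: convert each JPG to PNG before it hits disk.
      name=$(basename "$url" .jpg).png
      niwt-fetch "$url" | convert jpg:- png:- > "$name"
  done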
