On Fri, Aug 02, 2013 at 11:53:24AM +0200, Tim Ruehsen wrote:
> Hi Dagobert,
>
> > All this added complexity seems highly overengineered for a feature
> > that is not in the core functionality of the tool and that only a
> > fraction of the users use. Keep in mind: a good tool is one that does
> > a single job right.
>
> Andrew already answered you mail with a bunch of arguments.
> I am very conform with his writing, your posting puzzled me as well.
>
> Your above sentence let me want to say:
> - The new option is not complex.

But it is additional complexity relative to just using sh (which is a
common Unix practice, to my mind for good reason), and it seems to me
likely to get more so. What's stopping us from adding a new specialized
tag for every pet transformation language? We have Perl; will we add
Ruby and Python and (...) when people request it? And if not, why not?
It's (mild) bloat that lends itself toward further increased bloat. I
view GNU screen's decision to provide language bindings, versus tmux's
decision to be highly sh-scriptable, in a similar vein: lots of extra
things to maintain that all do the same thing in different ways (though
in their case, the situation is significantly more work than here).

> - It is straight forward, and not 'over engineered' or even 'highly over
>   engineered'.

Straightforward doesn't mean "not over-engineered". I wouldn't claim
"highly", but it's still "over-engineered", as I complained in my last
mail. "Over-engineered", to me, means expending more effort on working
around or solving a problem than the problem itself warrants. In this
case, yes, the implementation is simple. But it is still more complex
than it needs to be, and it invites still further complexity.

> - The added code doesn't interfere with the existing code in a way that
>   you would experience side-effects, if you do not use it.
> - There is no incidence that Wget is doing it's job worse than before.
> - The new option adds value to Wget by making a core functionality tunable.

I don't know why these points are being made; no one's arguing them. In
particular, the middle point isn't really useful, because ANY new
feature you add to Wget, no matter how you implement it, or even no
matter how buggy it is, is always an improvement over previous versions
of Wget, which lacked the new feature in any form whatever.
And all these statements apply at least as readily to the "just use sh"
proposal. No one's arguing this feature versus no feature at all. The
discussion so far is this feature versus a simpler (trivial, and also
trivial-to-maintain) version of it, and one much more common to the
Unix idiom at that.

I'll respond to a couple of points Andrew made else-thread:

(Andrew wrote):
> Different systems have different shells. When you have to try to escape
> for the system shell, you run into portability problems, and general
> confusion regarding double-escaping. If you sit in freenode #openssh
> for a while, you can see these problems routinely, resulting from the
> fact that ssh remote commands are executed through the remote system
> shell.

Different UNIX systems have different system shells, all of which are
sh-compatible. The quoting/escaping rules do not change, unless of
course you are using an extended syntax such as bash's $'...', in which
case you know what you're doing. The only system shell we might have to
deal with that has a truly different quoting syntax would be the
Windows command shell, in the event that we port this feature to the
Windows version (I imagine the implementation for piping to shell
processes would necessarily be different, so it wouldn't be immediately
supported, if ever). (And as Dagobert pointed out, unlike with openssh,
you're always using your local system shell, with which you are
presumably familiar.)

> > > With sed, you still need -u, or else there is a deadlock. This
> > > knowledge should be embedded into wget because most people don't
> > > have it.
> >
> > You are talking about GNU sed, please keep in mind that wget is
> > portable to systems without or just a subset of the GNU userland.
>
> Yes, I know. But those other sed implementations will probably not
> work. They will just deadlock.
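On the quoting point: the portable-sh claim is easy to demonstrate. In
any POSIX sh, one trick suffices for arbitrary strings: wrap the string
in single quotes and rewrite each embedded single quote as '\''. A
minimal sketch (shquote is an illustrative name, not anything in Wget):

```shell
# Hypothetical sketch: quote an arbitrary string so it survives one
# round of sh parsing. Portable to any POSIX sh: wrap in single quotes
# and rewrite each embedded ' as '\''.
shquote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

url="http://example.com/it's here?a=1&b=2"
quoted=$(shquote "$url")

# Round trip: sh parses $quoted back into the original string.
eval "printf '%s\n' $quoted"
```

The same recipe works identically whether the system sh is dash, bash,
ksh, or another sh-compatible shell, which is the point above: the
escaping rules don't vary across UNIX system shells.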
To me, all of this is a strong argument that the default for any sh or
sed protocol should be to fork a new process for each name (regardless
of which solution we go with). I'd far rather that than exclude those
with non-GNUish seds, or require embedding unportable constructs on the
part of either Wget or Wget users. The tiny efficiency bonus you might
see by streaming constantly to a single process pales in comparison to
the potential issues in supporting such a protocol.

And it's worth noting that, AFAICT, there is no efficiency difference
in the average case. If you fork and pipe and write and read with the
process while the download is in operation (after you've obtained the
final network name from redirections, of course), the average page
download is going to take much longer than that whole operation, which
can happen mostly in parallel while waiting for more network data to
arrive.

In solving the buffering problem, an alternative to forking/execing on
every name, but one I personally like less, is to allocate a pty around
the program to force it to use line buffering even if it doesn't have
an explicit option to do so. Such an approach is obviously not
implementable on Windows, if we do port this option there.

Yet another alternative, sort of a compromise between the streamed and
one-process-per-name approaches, would be to batch several names after
we've collected them, send them all through and close the write end,
and then collect the transformations from the program.

It's worth pointing out that if the alternative approach currently
under discussion - the CGIish, content-type-aware method - is adopted,
it would be necessary in all cases to fork/exec a new process for every
transform, since each transformation would take place within a unique
environment.

-mjc
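To make the one-process-per-name idea concrete: because the write end
of the pipe is closed after each name, the child sees EOF on stdin, and
even a fully block-buffered sed flushes its output on exit - no
GNU-only 'sed -u' needed, and no deadlock. A rough sketch of the
protocol in sh terms (transform_name is illustrative, not Wget's actual
code or interface):

```shell
# Sketch of the one-process-per-name protocol (illustrative only):
# run the user's command once per name, feed the name on stdin, close
# the pipe, and read the transformed name back. Since stdin reaches
# EOF, even a block-buffered sed flushes its output at exit, so this
# works with any POSIX sed, not just GNU sed with -u.
transform_name() {
    user_cmd=$1
    name=$2
    printf '%s\n' "$name" | sh -c "$user_cmd"
}

# Example: a user-supplied transform that flattens query strings.
transform_name "sed 's/[?&=]/_/g'" 'http://example.com/a?b=1&c=2'
# -> http://example.com/a_b_1_c_2
```

The batched compromise above would look the same except that several
names are written before the close, amortizing the fork/exec cost while
still relying on EOF, rather than -u, to force the flush.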
