So, recently I ended up having to write a whole lot of shell (bash), and it sucked. However, it seemed like a necessity for that task because bash (and most other ksh-family shells) has a killer feature - process substitution.
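For anyone who wants to poke at the mechanism directly: the shell runs the command inside <(...) and replaces the whole expression with a path that the outer command can open like a file. The following is only a minimal sketch, assuming bash (process substitution is not POSIX sh). The variable names and sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Process substitution is a bash/ksh/zsh feature, not POSIX sh.

# <(...) is replaced by a path before the outer command runs.
# The exact path is system-dependent (often something under /dev/fd).
path=$(echo <(true))
echo "substituted path: $path"

# Two substitutions at once: comm reads both "files" without any
# temp files being managed by hand. The sample lines are made up.
common=$(comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n'))
echo "$common"
```

The first echo shows that the outer command only ever sees a path. The second shows two substitutions feeding one command at the same time.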
If you're not familiar with it, here's an example:

    echo "compressed file attached" | mutt -a <(gzip -c /some/file) -s "that data you wanted" -- [email protected]

This one line emails a compressed copy of /some/file to Joe. But notice how we compress the file in-line? Bash process substitution works like this: bash sees the process-substitution syntax, the <(...) part, and knows to create a temporary fifo (or a /dev/fd entry, on systems that have them), which will be written to by gzip's stdout. Before running mutt, it substitutes the path to the fifo in place of the <(...) command. No need for you to manage a temp file on your own, and multiple files can be compressed simultaneously if you're attaching more than one!

I know, pretty slick, right? You can do it for output as well!

    cat /some/huge/file \
      | tee \
          >(gzip -c > /some/huge/file.gz) \
          >(md5 > /home/huge/file.md5) \
          >(shasum -a 256 > /home/huge/file.sha) \
          >/dev/null

The huge file is read only once, and your multi-core box is hashing and compressing simultaneously!

At some point I went looking on the CPAN for something that would let me do this sort of thing easily in Perl, but sadly, to no avail. I didn't want to bother with anything too complicated, and I wanted simple, clean syntax. OO would be OK, but not preferable unless really done right.

So, I searched high and low and found... Nothing. Of course there were modules that had a lot of potential, and certainly things that could be used as the foundation for building the functionality on my own, but nothing that allowed me to express shell-like process substitution out-of-the-box!

So I started writing it myself... I was inspired by the API and code in IPC::Pipeline, so I stole much of it for my own and built upon that.

So.. I now have a working implementation! However, I'm not sure about the interface/API I've created. I would definitely like some feedback on the examples below, and any ideas people might have on how to improve it or make it more flexible.
This is important because at this point I'm ready to add some more features, and I want to build out my test suite, but having to re-write all my tests because the API needs to change would just plain suck.

Here's the email shell example from above, but using the API I have right now:

    pipeline(
        sub { print "compressed file attached" },
        ['mutt', '-a', procsub('<', 'gzip', '-c', $some_file),
            '-s', 'that data you wanted', '--', '[email protected]'],
    )->run();

Another way to do it:

    pipe my ($r, $w);
    pipeline(
        ['mutt', '-a', procsub('<', 'gzip', '-c', $some_file),
            '-s', 'that data you wanted', '--', '[email protected]']
    )->run( source => $r );
    print $w "compressed file attached";
    close $w;

One last version:

    my $gzfile = procsub('<', 'gzip', '-c', $some_file); # does not run, yet
    my $pl = pipeline(
        sub { print "compressed file attached" },
        ['mutt', '-a', $gzfile,
            '-s', 'that data you wanted', '--', '[email protected]']);
    $pl->run();

As you can see, it's not as concise as the shell version, but you do gain one big plus - everything here is *composable*.

Here's another, somewhat contrived example:

    my $pl = pipeline(
        [qw(gzip -c),
            procsub('<',
                pipeline(
                    [qw(head -n 5), $0],
                    [qw(tr A-Z N-ZA-M)]))],
        [qw(gunzip -c)],
        [qw(tr N-ZA-M A-Z)]
    );
    $pl->run(sink => \*STDERR); # print first 5 lines of script to STDERR
    my %pidinfo = $pl->pidinfo; # pids and info about them

And here's that last example, but with a currently-imaginary overloading of the | operator...

    my $pl = pproc(qw(gzip -c),
                 procsub('<',
                     pipeline(pproc(qw(head -n 5), $0)
                            | pproc(qw(tr A-Z N-ZA-M)))))
           | pproc(qw(gunzip -c))
           | pproc(qw(tr N-ZA-M A-Z));
    $pl->run(sink => \*STDERR); # print first 5 lines of script to STDERR
    my %pidinfo = $pl->pidinfo; # pids and info about them

Please note that while I like the idea of using |, I *really* think the above example looks hideous!

So, I'm looking for comments on the API, and what others would like to see. Should it be more OO? Should it be more functional?
(it already is in a fairly functional style) Is this still too much extra syntax and overhead to be useful? Would you prefer different function names or different arguments? Specific features? (I'm planning on implementing named placeholders and an equivalent of 'tee'.)

Forget about how *useful* this might be to *you*; I'm writing this partially because *I* needed it, and partially to help me become a better programmer.

The code is already on github, but once I commit to an API it will be bound for the CPAN.

--
-- Steve Scaffidi <[email protected]>
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

