So, recently I ended up having to write a whole lot of shell (bash),
and it sucked. However, it seemed like a necessity for that task
because bash (and most other ksh-family shells) has a killer feature -
process substitution.

If you're not familiar with it, here's an example:

  echo "compressed file attached" \
  | mutt -a <(gzip -c /some/file) -s "that data you wanted" -- [email protected]

This one line emails a compressed copy of /some/file to Joe. But
notice how we compress the file in-line? Bash process substitution
works like this: bash sees the process-substitution syntax, the <(...)
part, and starts gzip with its stdout connected to a pipe. That pipe
gets a file name: a /dev/fd/N path on systems that support it, or a
temporary named FIFO otherwise. Before running mutt, bash substitutes
that path in place of the <(...) command. No need for you to manage a
temp file on your own, and multiple files can be compressed
simultaneously if you're attaching more than one!
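To make the mechanism concrete, here is a minimal Perl sketch of the /dev/fd variant. The `input_procsub` helper is hypothetical (it is not from IPC::Pipeline or any other CPAN module), and it assumes a system that provides /dev/fd, such as Linux, the BSDs or macOS. It starts the producer with a piped `open`, clears the close-on-exec flag Perl puts on new descriptors, and hands the consumer a `/dev/fd/N` name. `echo` and `cat` stand in for gzip and mutt.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# Hypothetical helper: start @cmd with its STDOUT connected to a pipe and
# return the read end plus a "/dev/fd/N" name for it, roughly what bash
# does for <(...) on systems that provide /dev/fd.
sub input_procsub {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot start @cmd: $!";

    # Perl sets close-on-exec on descriptors above $^F (normally 2), so the
    # next program we exec would not inherit this pipe. Clear the flag.
    my $flags = fcntl($fh, F_GETFD, 0) or die "F_GETFD: $!";
    fcntl($fh, F_SETFD, $flags & ~FD_CLOEXEC) or die "F_SETFD: $!";

    return ($fh, '/dev/fd/' . fileno($fh));
}

# Usage: the consumer only ever sees a file name, like mutt -a <(...).
my ($fh, $path) = input_procsub('echo', 'hello from a child');
open(my $consumer, '-|', 'cat', $path) or die "cannot start cat: $!";
my $text = do { local $/; <$consumer> };
close $consumer;
close $fh;    # keep $fh open until the consumer is done; close reaps echo
print $text;
```

Keeping `$fh` alive until the consumer finishes matters: closing it early would leave nothing behind the `/dev/fd/N` name.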

I know, pretty slick, right? You can do it for output as well!

  cat /some/huge/file \
  | tee \
    >(gzip -c > /some/huge/file.gz) \
    >(md5 > /some/huge/file.md5) \
    >(shasum -a 256 > /some/huge/file.sha) \
  >/dev/null

The huge file is read only once, and your multi-core box is hashing
and compressing simultaneously!
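The same fan-out can be done in plain Perl by reading the input once and writing each chunk to several child processes. This is a rough sketch, not an existing module: `spawn_writer` and `tee_file` are hypothetical names, each consumer is given as `[$outfile, @command]`, and the commands are assumed to read stdin and write stdout, as `gzip -c`, `md5` and `shasum` do.

```perl
use strict;
use warnings;

# Hypothetical helper: fork a child whose STDIN is a pipe from us and whose
# STDOUT goes to $outfile, then exec @cmd in it. Returns the pipe's write end.
sub spawn_writer {
    my ($outfile, @cmd) = @_;
    my $pid = open(my $to_child, '|-') // die "fork: $!";
    if ($pid == 0) {    # child: STDIN is already the read end of the pipe
        open(STDOUT, '>', $outfile) or die "open $outfile: $!";
        exec { $cmd[0] } @cmd or die "exec $cmd[0]: $!";
    }
    binmode $to_child;
    return $to_child;
}

# Hypothetical tee: read $infile exactly once and copy every chunk to each
# consumer, where each consumer is [$outfile, @cmd].
sub tee_file {
    my ($infile, @consumers) = @_;
    my @pipes = map { spawn_writer(@$_) } @consumers;
    open(my $in, '<:raw', $infile) or die "open $infile: $!";
    while (1) {
        my $n = read($in, my $buf, 64 * 1024);
        die "read $infile: $!" unless defined $n;
        last if $n == 0;    # EOF
        for my $p (@pipes) { print {$p} $buf or die "write: $!" }
    }
    close $in;
    # Closing a piped handle sends EOF to the child and waits for it to exit.
    for my $p (@pipes) { close($p) or warn "a consumer exited with status $?\n" }
}
```

A call like `tee_file('/some/huge/file', ['/some/huge/file.gz', 'gzip', '-c'], ['/some/huge/file.sha', 'shasum', '-a', '256'])` would then mirror the bash version. One caveat: if a consumer exits early, the next write raises SIGPIPE, which kills the parent unless it is handled.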

At some point I went looking on the CPAN for something that would let
me do this sort of thing easily in Perl, but sadly, to no avail. I
didn't want to bother with anything too complicated, and I wanted
simple, clean syntax. OO would be OK, but not preferable unless really
done right. So, I searched high and low and found... Nothing.

Of course there were modules that had a lot of potential, and
certainly things that could be used as the foundation for building the
functionality on my own, but nothing that allowed me to express
shell-like process substitution out-of-the-box!

So I started writing it myself... I was inspired by the API and code
in IPC::Pipeline, so I stole much of it and built upon that.

So... I now have a working implementation! However, I'm not sure about
the interface/API I've created. I would definitely like some feedback
on the examples below, and any ideas people might have on how to
improve it or make it more flexible. This is important because at this
point I'm ready to add some more features, and I want to build out my
test suite, but having to re-write all my tests because the API needs
to change would just plain suck.

Here's the email example from above, written with the API I have
right now:

  pipeline(
      sub { print "compressed file attached" },
      ['mutt', '-a', procsub('<', 'gzip', '-c', $some_file),
       '-s', 'that data you wanted', '--', '[email protected]'],
  )->run();
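For anyone wondering what a `pipeline()` that mixes code refs and command arrays has to do underneath, here is a rough fork/exec sketch. `toy_pipeline` is a hypothetical stand-in, not the actual pipeline() or IPC::Pipeline code: every stage runs in its own child, code-ref stages run Perl in the forked child, array-ref stages are exec'd, and the caller reaps the returned pids.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical toy_pipeline(): a sketch of the idea, NOT the real pipeline().
# Stages are array refs (commands to exec) or code refs (Perl run in a forked
# child). Each stage's STDOUT feeds the next stage's STDIN; the first stage
# inherits our STDIN and the last inherits our STDOUT.
sub toy_pipeline {
    my @stages = @_;
    my (@pids, $prev_read);
    for my $i (0 .. $#stages) {
        my ($r, $w);
        if ($i < $#stages) { pipe($r, $w) or die "pipe: $!" }
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            open(STDIN,  '<&', $prev_read) or die "dup STDIN: $!"  if $prev_read;
            open(STDOUT, '>&', $w)         or die "dup STDOUT: $!" if $w;
            my $stage = $stages[$i];
            if (ref $stage eq 'CODE') {
                my $ok = eval { $stage->(); 1 };
                warn $@ unless $ok;
                close STDOUT;    # flush now: _exit skips Perl's buffers
                POSIX::_exit($ok ? 0 : 1);
            }
            exec { $stage->[0] } @$stage
                or do { warn "exec $stage->[0]: $!\n"; POSIX::_exit(127) };
        }
        push @pids, $pid;
        close $prev_read if $prev_read;    # the children own these ends now
        close $w if $w;
        $prev_read = $r;
    }
    return @pids;
}

# Usage: a code-ref stage feeding an external command.
my @pids = toy_pipeline(
    sub { print "compressed file attached\n" },
    ['tr', 'a-z', 'A-Z'],
);
waitpid($_, 0) for @pids;
```

Returning the pids instead of waiting leaves it to the caller to decide when to reap the children and what to do with their exit statuses.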

Another way to do it:

  pipe(my $r, my $w) or die "pipe: $!";
  pipeline(
      ['mutt', '-a', procsub('<', 'gzip', '-c', $some_file),
       '-s', 'that data you wanted', '--', '[email protected]']
  )->run( source => $r );
  print $w "compressed file attached";
  close $w;

One last version:

  my $gzfile = procsub('<', 'gzip', '-c', $some_file); # does not run, yet
  my $pl = pipeline(
      sub { print "compressed file attached" },
      ['mutt', '-a', $gzfile, '-s', 'that data you wanted', '--',
       '[email protected]']);
  $pl->run();


As you can see, it's not as concise as the shell version, but you do
gain one big plus: everything here is *composable*.
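The `# does not run, yet` comment in the last version suggests that procsub() returns a deferred object. One way to get that behaviour, sketched here with a hypothetical `LazyProcsub` class (not the actual procsub()), is to overload stringification so the producer only starts when the object is first used as a file name. Like the earlier sketch, it assumes /dev/fd support.

```perl
package LazyProcsub;    # hypothetical class, not the author's procsub()
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# Stringifying the object (e.g. interpolating it into an argument list)
# starts the producer. bool is overloaded separately so that a plain truth
# test does not start it by accident.
use overload
    '""'     => sub { $_[0]->path },
    bool     => sub { 1 },
    fallback => 1;

sub new {
    my ($class, @cmd) = @_;
    return bless { cmd => \@cmd, fh => undef }, $class;    # nothing runs yet
}

sub started { return defined $_[0]{fh} }

# On first use, start the command and return a /dev/fd path for its output.
sub path {
    my ($self) = @_;
    unless ($self->{fh}) {
        my @cmd = @{ $self->{cmd} };
        open(my $fh, '-|', @cmd) or die "cannot start @cmd: $!";
        my $flags = fcntl($fh, F_GETFD, 0) or die "F_GETFD: $!";
        fcntl($fh, F_SETFD, $flags & ~FD_CLOEXEC) or die "F_SETFD: $!";
        $self->{fh} = $fh;
    }
    return '/dev/fd/' . fileno($self->{fh});
}

# Close our end and reap the producer; returns its wait status.
sub finish {
    my ($self) = @_;
    return 0 unless $self->{fh};
    close($self->{fh});
    $self->{fh} = undef;
    return $?;
}

package main;

my $gz = LazyProcsub->new('echo', 'deferred output');    # no child process yet
my $was_started = $gz->started;
# Interpolating "$gz" into the argument list is what starts echo.
open(my $consumer, '-|', 'cat', "$gz") or die "cannot start cat: $!";
my $text = do { local $/; <$consumer> };
close $consumer;
$gz->finish;
print "started before use: ", ($was_started ? 'yes' : 'no'), "\n", $text;
```

Overloading `bool` separately matters here: with only `""` overloaded and `fallback => 1`, Perl derives truthiness from stringification, so a plain `if ($gz)` would start the process.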

Here's another, somewhat contrived example:

  my $pl = pipeline(
      [qw(gzip -c), procsub( '<', pipeline(
          [qw(head -n 5), $0],
          [qw(tr A-Z N-ZA-M)]))],
      [qw(gunzip -c)],
      [qw(tr N-ZA-M A-Z)]
  );
  $pl->run(sink => \*STDERR); # print first 5 lines of script to STDERR
  my %pidinfo = $pl->pidinfo; # pids and info about them


And here's that last example, but with a currently-imaginary
overloading of the | operator...

  my $pl =
      pproc( qw(gzip -c), procsub( '<', pipeline(
          pproc( qw(head -n 5), $0) | pproc( qw(tr A-Z N-ZA-M))))) |
      pproc( qw(gunzip -c)) |
      pproc( qw(tr N-ZA-M A-Z));
  $pl->run(sink => \*STDERR); # print first 5 lines of script to STDERR
  my %pidinfo = $pl->pidinfo; # pids and info about them

Please note that while I like the idea of using |, I *really* think the
above example looks hideous!
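For what it's worth, the `|` syntax itself is cheap to implement with `use overload`. The sketch below uses hypothetical `PProc` and `PPipeline` classes and only builds the list of stages; it does not run anything. Each `|` returns a new pipeline object holding the stages from both sides.

```perl
use strict;
use warnings;

# Hypothetical classes sketching the imaginary pproc() | pproc() syntax.
# Nothing here starts processes: '|' only collects stages into a pipeline.
package PPipeline;
use Scalar::Util qw(blessed);
use overload '|' => 'join_stages', fallback => 1;    # method name, so subclasses inherit it

sub new    { my ($class, @stages) = @_; return bless { stages => \@stages }, $class }
sub stages { return @{ $_[0]{stages} } }

# Handles "$left | $right"; $swapped is true when only the right operand
# is one of our objects (e.g. 'foo' | pproc(...)).
sub join_stages {
    my ($left, $right, $swapped) = @_;
    ($left, $right) = ($right, $left) if $swapped;
    for my $side ($left, $right) {
        die "both sides of | must be pproc() results or pipelines\n"
            unless blessed($side) && $side->can('stages');
    }
    return PPipeline->new($left->stages, $right->stages);
}

package PProc;
our @ISA = ('PPipeline');    # inherits the '|' overloading
sub new    { my ($class, @cmd) = @_; return bless { cmd => \@cmd }, $class }
sub cmd    { return @{ $_[0]{cmd} } }
sub stages { return ($_[0]) }    # a single process is a one-stage pipeline

package main;
sub pproc { return PProc->new(@_) }

my $pl = pproc(qw(head -n 5), $0) | pproc(qw(tr A-Z N-ZA-M)) | pproc(qw(gunzip -c));
print join(' | ', map { join(' ', $_->cmd) } $pl->stages), "\n";
```

One thing worth knowing before committing to this syntax: without parentheses, a list operator like `pproc qw(a) | pproc qw(b)` swallows everything to its right, so every `pproc(...)` needs its parentheses.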

So, I'm looking for comments on the API, and what others would like to
see. Should it be more OO? Should it be more functional (it's already
written in a fairly functional style)? Is this still too much extra
syntax and overhead to be useful? Would you prefer different function
names, different arguments, specific features? (I'm planning on
implementing named placeholders and an equivalent of 'tee'.)

Forget about how *useful* this might be to *you*; I'm writing this
partially because *I* needed it, and partially to help me become a
better programmer. The code is already on github but once I commit to
an API it will be bound for the CPAN.

-- 
-- Steve Scaffidi <[email protected]>

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
