Re: [Boston.pm] Simultaneous redirect to STDOUT & File?

Ben Tilly Tue, 10 May 2005 08:55:19 -0700

On 5/9/05, Uri Guttman <[EMAIL PROTECTED]> wrote:
> >>>>> "BT" == Ben Tilly <[EMAIL PROTECTED]> writes:
> 
>   BT> On 5/9/05, Uri Guttman <[EMAIL PROTECTED]> wrote:
>   >> >>>>> "BT" == Ben Tilly <[EMAIL PROTECTED]> writes:
>   >>
>   BT> Be aware that IO::Tee has limitations.  It only works for output that
>   BT> goes through Perl's IO system.  In particular if your program makes
>   BT> a system call, the child process will NOT see the tee.
>   >>
>   >> i bet you can work around that by saving STDOUT, reopening it on IO::Tee
>   >> and having IO::Tee output to the file and the saved STDOUT. i leave
>   >> implementing this as an exercise to the reader. but using shell to do
>   >> this is probably the easiest as you can just use tee and all stdout
>   >> piped to it (from the perl program or its subprocesses) will get
>   >> teed. as larry says, the shell has to be useful for something!
> 
>   BT> You'd lose that bet.
> 
> i am not sure about that. it might need some hacking to do it.
> 
>   BT> IO::Tee is implemented through tying a filehandle inside of Perl.
>   BT> The entire mechanism only makes sense from within Perl.  A
>   BT> launched subprocess (or poorly written XS code) goes through a
>   BT> fileno that the operating system knows about.  Since the OS does
>   BT> not know about Perl's abstractions of I/O, there is no way to get
>   BT> the OS to direct output through them.
> 
> you can then do the STDOUT dup stuff yourself and then bind IO::Tee to
> that. by closing STDOUT and reopening it to a pipe you create, all the
> child processes will output to that pipe since they will see it as fd
> 1. you have to fork and have that read the other side of the pipe and
> use IO::Tee in there. like i said, not simple but doable. this is
> effectively what the shell does when you pipe anyway.


This is just a version of the alternate "fork and postprocess" that I
said would work (and you left out of your reply).  But if you're going
to do that, then IO::Tee is a red herring - it is easier to loop over
filehandles yourself.  The heavy lifting is being done by the
operating system.

See the cookbook for a sample implementation.
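
For reference, here is a minimal sketch of the "fork and postprocess"
approach. It is not the cookbook's code. The helper name tee_stdout_to and
the file name tee.log are made up for illustration, and it assumes a
Unix-like system where fork and the "|-" form of open are available.

```perl
use strict;
use warnings;

# Hypothetical helper: send everything written to fd 1 to $path as well.
sub tee_stdout_to {
    my ($path) = @_;
    open(my $log, '>', $path) or die "Cannot open $path: $!";

    # Implicit fork.  In the parent, STDOUT (fd 1) becomes the write end
    # of a pipe.  In the child, STDIN is the read end and STDOUT is still
    # the original destination.
    my $pid = open(STDOUT, '|-');
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: loop over the filehandles ourselves, no IO::Tee needed.
        while (my $line = <STDIN>) {
            print {$_} $line for \*STDOUT, $log;
        }
        exit 0;
    }

    close $log;   # the parent does not write to the log directly
    $| = 1;       # unbuffer STDOUT so output is not held back in the parent
    return $pid;
}

tee_stdout_to('tee.log');
print "from perl\n";
system('echo from a child process');   # inherits fd 1, so it is teed too
close STDOUT;                          # EOF for the child, then wait for it
```

Because the redirection happens at the file descriptor level, the
subprocess launched by system is captured as well, which is the part that
a tied filehandle cannot do.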

> another totally different approach is to use one of my perl sayings,
> print rarely, print late. too much code is written with direct calls to
> print (with or without explicit handles). when you print late, you just
> build up all your output in strings with .= and then just return it to
> the caller. only at the highest level where the actual print decisions
> are really made do you finally call print. this is also faster as print
> is very slow as it invokes all manner of stdio/perlio code each time it
> is called. appending to a buffer is very fast and clean. so if you did
> it this way, the top level would be like:

We're now getting into optimization, so this is platform
dependent.

First of all be aware that while .= is fast in Perl, in many other
high-level languages the equivalent is slow.  For instance try to
create the string "hello world\n"x1_000_000 with a simple
appending loop in Perl, JavaScript, Ruby, Java and Python.
Using the default string implementation this is very slow in
every language but Perl.  Making it fast requires jumping
through various sets of hoops.  How many and which ones
depends on the language.  Java has a StringBuffer class that
does the trick.  In JavaScript you can accumulate strings in an
array and then join it.  Unfortunately if the array gets too big
then you run into GC overhead.  So then you have to start
accumulating into an array and joining parts of the array early.
(Ugh.)

Secondly, even in Perl I'd expect calling print each time to be faster
than appending with .= repeatedly and printing once at the end.  Let's try it:

$ time perl -e 'print "hello world\n" for 1..1_000_000' > /dev/null

real    0m0.379s
user    0m0.380s
sys     0m0.000s

$ time perl -e '$s .= "hello world\n" for 1..1_000_000; print $s' > /dev/null

real    0m0.752s
user    0m0.600s
sys     0m0.150s

$ perl -v

This is perl, v5.8.4 built for i386-linux-thread-multi
[...]

Why did this happen?  Well, when you print, most of the time what it does
is shove the data on a buffer.  If said buffer passes over some threshold
(e.g. 2K) then it actually writes it out.  All of your output has to go
through this process, so adding a level of Perl buffering is pure
overhead.  Having to buffer all of it is more overhead still.  (Incidentally
in this case, syswrite is slightly faster than print.)

Or at least *should* be.  In older Perls by default you went through the
OS stdio stuff, and the hand-off from Perl to the OS could be slow.
Depending on your platform, that is.  (Linux was slow IIRC.)  So you may
have once done a benchmark and made an optimization conclusion
and then never noticed that it has now become dated.  (This has
happened to me plenty of times...)

>         use File::Slurp ;
> 
>         my $text = do_lots_of_work_and_return_all_the_text() ;
> 
>         print $text ;
>         write_file( $tee_file, $text ) if $tee_file ;
> 
> it makes for a very good api too in all the other subs. most just return
> text and don't do any output themselves. then you can use the subs in
> any way you want, for output, sending a message, log entries,
> etc. printing at the point of text generation makes this impossible.

Maintainability is more important than optimization.  I often use this
strategy for maintenance reasons.  Going full-cycle, one way to
accomplish all of this without changing code is to tie to a filehandle
that accumulates data and prints it later.
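
A rough sketch of what such a tie might look like. The class name
BufferedOut and its contents method are invented here for illustration,
only print and printf are handled, and $, and $\ are ignored. Like
IO::Tee, this only sees output that goes through Perl's I/O system.

```perl
package BufferedOut;   # hypothetical class name
use strict;
use warnings;

# The object is just a reference to the accumulated string.
sub TIEHANDLE {
    my ($class) = @_;
    my $buffer = '';
    return bless \$buffer, $class;
}

# Called for print on the tied handle: append instead of writing.
sub PRINT {
    my $self = shift;
    $$self .= join('', @_);
    return 1;
}

# Called for printf on the tied handle.
sub PRINTF {
    my $self   = shift;
    my $format = shift;
    $$self .= sprintf($format, @_);
    return 1;
}

# Everything accumulated so far.
sub contents { return ${ $_[0] } }

package main;

tie *STDOUT, 'BufferedOut';

# Existing code keeps calling print and printf unchanged.
print "hello ";
printf "%s\n", "world";

my $text = tied(*STDOUT)->contents;
untie *STDOUT;

# Only now does anything reach the real STDOUT.
print $text;
```

At the point where $text is pulled out, the top level is free to print
it, write it to a file, or both.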

> as for subprocesses, you just use backticks instead of system and
> collect the output.
> 
> the one downside is when you need output to be flushed such as when you
> are working with pipes. this can be handled too by just calling syswrite
> at the proper places and having the lower level subs return text as
> before. isolating content generation from its destination is a good
> design idea that isn't used enough.

Which is close to being a variation of the problem that I mentioned
with IO::Tee.  And the same solution works there as well.

Cheers,
Ben
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
