On 5/9/05, Uri Guttman <[EMAIL PROTECTED]> wrote:
> >>>>> "BT" == Ben Tilly <[EMAIL PROTECTED]> writes:
>
>   BT> On 5/9/05, Uri Guttman <[EMAIL PROTECTED]> wrote:
>   >> >>>>> "BT" == Ben Tilly <[EMAIL PROTECTED]> writes:
>   >>
>   BT> Be aware that IO::Tee has limitations.  It only works for output
>   BT> that goes through Perl's IO system.  In particular, if your program
>   BT> makes a system call, the child process will NOT see the tee.
>   >>
>   >> i bet you can work around that by saving STDOUT, reopening it on
>   >> IO::Tee and having IO::Tee output to the file and the saved STDOUT.
>   >> i leave implementing this as an exercise to the reader. but using
>   >> shell to do this is probably the easiest as you can just use tee and
>   >> all stdout piped to it (from the perl program or its subprocesses)
>   >> will get teed. as larry says, the shell has to be useful for
>   >> something!
>
>   BT> You'd lose that bet.
>
> i am not sure about that. it might need some hacking to do it.
>
>   BT> IO::Tee is implemented through tying a filehandle inside of Perl.
>   BT> The entire mechanism only makes sense from within Perl.  A launched
>   BT> subprocess (or poorly written XS code) goes through a fileno that
>   BT> the operating system knows about.  Since the OS does not know about
>   BT> Perl's abstractions of I/O, there is no way to get the OS to direct
>   BT> output through them.
>
> you can then do the STDOUT dup stuff yourself and then bind IO::Tee to
> that. by closing STDOUT and reopening it on a pipe you create, all the
> child processes will output to that pipe, since they will see it as fd
> 1. you have to fork and have the child read the other side of the pipe
> and use IO::Tee in there. like i said, not simple but doable. this is
> effectively what the shell does when you pipe anyway.
This is just a version of the alternate "fork and postprocess" approach
that I said would work (and that you left out of your reply).  But if
you're going to do that, then IO::Tee is a red herring - it is easier to
loop over the filehandles yourself, since the heavy lifting is being done
by the operating system.  See the Perl Cookbook for a sample
implementation.

> another totally different approach is to use one of my perl sayings,
> print rarely, print late. too much code is written with direct calls to
> print (with or without explicit handles). when you print late, you just
> build up all your output in strings with .= and then just return it to
> the caller. only at the highest level where the actual print decisions
> are really made do you finally call print. this is also faster as print
> is very slow as it invokes all manner of stdio/perlio code each time it
> is called. appending to a buffer is very fast and clean. so if you did
> it this way, the top level would be like:

We're now getting into optimization, which is platform dependent.

First of all, be aware that while .= is fast in Perl, in many other
high-level languages the equivalent is slow.  For instance, try to create
the string "hello world\n" x 1_000_000 with a simple appending loop in
Perl, JavaScript, Ruby, Java and Python.  Using the default string
implementation this is very slow in every language but Perl, and making
it fast requires jumping through various sets of hoops - how many, and
which ones, depends on the language.  Java has a StringBuffer class that
does the trick.  In JavaScript you can accumulate strings in an array and
then join it; unfortunately, if the array gets too big you run into GC
overhead, so then you have to start joining parts of the array early.
(Ugh.)

Secondly, even in Perl, I'd expect print to be faster than repeatedly
appending with .= and printing once at the end.
Let's try it - print versus repeated appending:

    $ time perl -e 'print "hello world\n" for 1..1_000_000' > /dev/null

    real    0m0.379s
    user    0m0.380s
    sys     0m0.000s

    $ time perl -e '$s .= "hello world\n" for 1..1_000_000; print $s' > /dev/null

    real    0m0.752s
    user    0m0.600s
    sys     0m0.150s

    $ perl -v
    This is perl, v5.8.4 built for i386-linux-thread-multi
    [...]

Why did this happen?  When you print, most of the time all it does is
append the data to a buffer; only if that buffer passes some threshold
(e.g. 2K) does it actually write to the pipe.  All of your output has to
go through this process anyway, so adding a level of Perl buffering on
top of it is pure overhead, and having to hold all of the output in
memory before printing is more overhead still.  (Incidentally, in this
case syswrite is slightly faster than print.)

Or at least it *should* be.  In older Perls you went through the OS stdio
layer by default, and the hand-off from Perl to the OS could be slow,
depending on your platform.  (Linux was slow, IIRC.)  So you may have
once done a benchmark, drawn an optimization conclusion, and then never
noticed that it has since become dated.  (This has happened to me plenty
of times...)

> use File::Slurp ;
>
> my $text = do_lots_of_work_and_return_all_the_text() ;
>
> print $text ;
> write_file( $tee_file, $text ) if $tee_file ;
>
> it makes for a very good api too in all the other subs. most just
> return text and don't do any output themselves. then you can use the
> subs in any way you want, for output, sending a message, log entries,
> etc. printing at the point of text generation makes this impossible.

Maintainability is more important than optimization, and I often use this
strategy for maintenance reasons.  Going full circle, one way to
accomplish all of this without changing existing code is to tie a
filehandle that accumulates data and prints it later.

> as for subprocesses, you just use backticks instead of system and
> collect the output.
>
> the one downside is when you need output to be flushed, such as when
> you are working with pipes.
> this can be handled too by just calling syswrite at the proper places
> and having the lower level subs return text as before. isolating
> content generation from its destination is a good design idea that
> isn't used enough.

Which is close to being a variation of the problem that I mentioned with
IO::Tee - and the same solution works there as well.

Cheers,
Ben

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

