Thanks for the reports. Sorry I missed it; it looks like a fun session. These comments are a day late and a dollar short, as usual, so please take them with a grain of salt.

1) That code looks an awful lot like the fgets() source. One concern: by bypassing Perl I/O, you lose character-set translation services, e.g. UTF-8 to UTF-16, or ISO-123456 to UTF-8, and so on. Can one safely drop i18n support these days? It _will_ be faster without it ...

2) Has anyone done raw I/O benchmarks against Perl I/O? In other contexts, I have compared Perl I/O against standard C I/O (both buffered and unbuffered) and found _no_ difference. I have also compared standard C I/O against memory-mapped files.

Of these schemes, memory-mapped I/O was the winner, but not by much, and only for large files. I believe memory-mapped I/O (MMIO) would beat this scheme only because it avoids a copy into the read buffer. Note that to keep the speed advantage and avoid swapping, you have to take care to keep only a window of a large file mapped at any given time. Even so, the MMIO mechanisms can be regarded as optimal for a given OS.

HTML Tidy, where MMIO is now the default, has a decent, portable implementation. I actually argued against using it, in favor of simplicity, but others who often work with large files preferred the performance improvement.

3) All that said, unless the input file is already in memory (e.g. /tmp/foo on Solaris, where /tmp is a tmpfs) or you are working with especially complex patterns, I/O is more than likely the bottleneck. Staging the input in memory like that can still be useful for tuning the non-I/O portions of the code.
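On point 1, a minimal sketch, assuming a UTF-8 input file, of what is lost and how it might be recovered. Reading through a PerlIO `:encoding(UTF-8)` layer returns decoded characters, while `sysread` on a `:raw` handle returns bytes, so a sysread-based reader has to decode on its own. The loop follows the chunked-decoding idiom from the Encode documentation: `FB_QUIET` decodes what it can and leaves an incomplete trailing multi-byte sequence in the buffer for the next read. The 4-byte block size and the sample text are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);
use File::Temp qw(tempfile);

# Write a small UTF-8 file: "caf" + e-acute (bytes 0xC3 0xA9) + newline,
# i.e. 6 bytes on disk but 5 characters.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "caf\xC3\xA9\n";
close $fh;

# 1. Perl I/O with an :encoding layer decodes for us.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $line = <$in>;
close $in;
printf "readline: %d chars\n", length $line;

# 2. sysread bypasses the layers and returns raw bytes. Reading in
#    4-byte blocks makes the 2-byte e-acute straddle a block boundary.
open my $raw, '<:raw', $path or die "open: $!";
my ($bytes, $chars) = ('', '');
while (sysread($raw, $bytes, 4, length $bytes)) {
    # FB_QUIET returns the decoded prefix and leaves the incomplete
    # trailing sequence (if any) in $bytes for the next iteration.
    $chars .= decode('UTF-8', $bytes, FB_QUIET);
}
close $raw;
printf "sysread+decode: %d chars\n", length $chars;
```

Both lines should report 5 characters. Without the `decode` step, the sysread path would hand back 6 undecoded bytes, which is exactly the i18n support a raw reader gives up.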
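On point 2, a rough benchmark harness (not results) comparing ordinary buffered `readline` with block `sysread`, using the core Benchmark module's `cmpthese`. The file size, line format, and 64 KiB block size are arbitrary assumptions. The sysread variant only counts newlines with `tr`, so it does less work than a line-splitting reader (no per-line strings); treat its number as an upper bound on what raw reads could buy. Because the freshly written file is almost certainly in the OS page cache, this mostly measures Perl-side overhead rather than disk speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway test file (arbitrary size: 100_000 short lines).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line $_ of some sample text\n" for 1 .. 100_000;
close $fh;

# Count lines with ordinary buffered Perl I/O (readline).
sub count_readline {
    open my $in, '<', $path or die "open: $!";
    my $n = 0;
    $n++ while <$in>;
    close $in;
    return $n;
}

# Count lines with raw sysread in 64 KiB blocks by counting newlines.
sub count_sysread {
    open my $in, '<:raw', $path or die "open: $!";
    my ($buf, $n) = ('', 0);
    while (1) {
        my $got = sysread($in, $buf, 65_536);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;                 # EOF
        $n += ($buf =~ tr/\n//);
    }
    close $in;
    return $n;
}

# Sanity check: both methods must agree before timing them.
die "mismatch\n" unless count_readline() == count_sysread();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    readline => \&count_readline,
    sysread  => \&count_sysread,
});
```

To compare against memory-mapped reads as well, one would add a third entry to the `cmpthese` hash; that needs a CPAN module, since core Perl has no general mmap interface.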
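On point 3, a small sketch of separating the two costs, assuming the data fits in RAM. The file is slurped once (the I/O phase), then scanned through an in-memory filehandle opened on a scalar reference, so the second timing reflects mostly the matching work. The sample data and the `/status=FAIL/` pattern are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Made-up input: 200_000 lines, every 7th one marked FAIL.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "record $_: status=", ($_ % 7 ? 'ok' : 'FAIL'), "\n" for 1 .. 200_000;
close $fh;

# Phase 1: pay the I/O cost once by slurping the whole file.
my $t0 = time;
my $data = do {
    open my $in, '<', $path or die "open: $!";
    local $/;            # undef record separator: read everything at once
    <$in>;
};
my $t1 = time;

# Phase 2: process from an in-memory filehandle, so this timing
# reflects (mostly) the pattern matching rather than disk reads.
open my $mem, '<', \$data or die "in-memory open: $!";
my $hits = 0;
while (my $line = <$mem>) {
    $hits++ if $line =~ /status=FAIL/;
}
close $mem;
my $t2 = time;

printf "read: %.3fs  match: %.3fs  hits: %d\n", $t1 - $t0, $t2 - $t1, $hits;
```

If the match phase dominates, tuning the patterns is worthwhile; if the read phase dominates, the I/O path is the place to look.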


At 10:33 PM 7/15/2008 -0400, Bill Ricker wrote:
Uri showed us his table-driven tests in
http://search.cpan.org/src/URI/Sort-Maker-0.06/t/
and his buffered line-reader in
http://search.cpan.org/src/URI/File-ReadBackwards-1.04/ReadBackwards.pm

We hacked on ReadBackwards to make a read-forwards version, that is, a
start on the non-OO inner-inner loop for Ack's new loop.

-
Bill
==================
#! perl  -w

use strict;

my $n=0;

my $is_crlf = 0;
my $lines_ref = [ ] ; # will be static

while (defined( my $line = our_readline(\*STDIN)))
{

  print ++$n,q{: },$line;
}


# read the /p/r/e/v/i/o/u/s/ record from the file
#


sub our_readline {

        my( $handle) = @_ ;

        my $text ;

# get the buffer of lines


        return unless $lines_ref ;

        while( 1 ) {

# see if there is more than 1 line in the buffer

                if ( @{$lines_ref} > 1 ) {

# we have a complete line so return it
# and convert those damned cr/lf lines to \n

                        $lines_ref->[0] =~ s/\015\012/\n/
                                        if $is_crlf; # @TBD

                        return( shift @{$lines_ref} ) ;
                }

# we don't have a complete line, so we have to read blocks until we do


# @TBD -- EOF

# we have to read more text so get the handle and the current read size

                my $read_size = 4096; # @TBD variable


# read in the next (previous) block of text

                $text = @$lines_ref ? pop @$lines_ref : "" ;
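                my $read_cnt = sysread( $handle, $text, $read_size,
                                        length($text) ) ;
                die "sysread failed: $!" unless defined $read_cnt ;

                if ($read_cnt == 0) {
# EOF: the buffer is empty (its last element was just moved into
# $text), so $text holds the final, possibly unterminated, line
                        $lines_ref = undef ;
                        $text =~ s/\015\012/\n/ if $is_crlf ;
                        return length($text) ? $text : undef ;
                }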
# split the buffer into a list of lines
# this may want to be $/
# assumes newline separators

                @{$lines_ref} =
                        $text =~ /(.*?\n|.+)/gs ;

#print "Lines \n=>", join( "<=\n=>", @{$lines_ref} ), "<=\n" ;

        }
}
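A possible follow-up to the code above, sketched under assumptions: it keeps its buffer in file-level variables ($lines_ref, $is_crlf), so it can serve only one handle at a time. A closure-based variant (`make_line_reader` is a hypothetical name, not from Uri's module) keeps the buffer per handle, returns the final line at EOF even when it lacks a trailing newline, and treats a sysread error (undef) as fatal. CRLF handling and $/ support are left out, matching the @TBD notes in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns a closure that yields one line per call
# (undef at EOF), reading $block bytes at a time with sysread.
sub make_line_reader {
    my ($handle, $block) = @_;
    $block ||= 4096;
    my @lines;          # complete lines, plus possibly one partial tail
    my $eof = 0;
    return sub {
        while (1) {
            # More than one element means the first one is complete.
            return shift @lines if @lines > 1;
            # Nothing left to read: hand back the tail, then undef.
            return shift @lines if $eof;
            # Re-join the partial tail with the next block.
            my $text = @lines ? pop @lines : '';
            my $got = sysread($handle, $text, $block, length $text);
            die "sysread failed: $!" unless defined $got;
            $eof = 1 if $got == 0;
            @lines = $text =~ /(.*?\n|.+)/gs;
        }
    };
}

# Demo on a temporary file whose last line has no newline.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "first\nsecond\nno newline at end";
close $fh;

open my $in, '<:raw', $path or die "open: $!";
my $next_line = make_line_reader($in, 3);   # tiny block to cross boundaries
my @got;
while (defined(my $line = $next_line->())) {
    push @got, $line;
}
close $in;
print scalar(@got), " lines; last = '$got[-1]'\n";
```

This should print `3 lines; last = 'no newline at end'`. The 3-byte block only exists to force lines to straddle reads; in real use a block of 4096 bytes or more would be more sensible.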

_______________________________________________
Boston-pm mailing list
http://mail.pm.org/mailman/listinfo/boston-pm

