Re: [Boston.pm] Ack-A-thon results (2)

Charlie Reitzel Wed, 16 Jul 2008 06:59:11 -0700

Thanks for the reports. Sorry I missed it; it looks like it was a fun session. The following comments are a day late and a dollar short, as usual, so please take them with a grain of salt.

1) That code looks an awful lot like fgets() source. One concern is that, by bypassing Perl I/O, you lose character set translation services, e.g. UTF-8 to UTF-16, or ISO-123456 to UTF-8 and so on. Can one safely drop i18n support these days? It _will_ be faster without it ...
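To make the concern concrete, here is a minimal sketch (the sample string and temp file are made up for illustration, not taken from the Ack code). It reads the same UTF-8 file once through a PerlIO encoding layer and once through sysread(). The sysread() path returns raw bytes, so decoding becomes the caller's job.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded sample file (hypothetical test data).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} encode( 'UTF-8', "caf\x{e9}\n" );
close $fh or die "close: $!";

# 1) Through PerlIO with an encoding layer: we get decoded characters.
open my $layered, '<:encoding(UTF-8)', $path or die "open: $!";
my $chars = <$layered>;
close $layered;

# 2) Through sysread: no layer is applied, so we get the raw bytes.
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = '';
defined sysread( $raw, $bytes, 4096 ) or die "sysread: $!";
close $raw;

# Decoding by hand recovers the same characters as the layered read.
my $decoded = decode( 'UTF-8', $bytes );

printf "layered: %d chars, raw: %d bytes, decoded: %d chars\n",
    length $chars, length $bytes, length $decoded;
```

One caveat with the by-hand route: a multi-byte character can straddle a block boundary, so decoding each 4096-byte block separately is not safe; decode complete lines instead.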

2) Has anyone done raw I/O benchmarks against Perl I/O? In other contexts, I have compared Perl I/O against standard C I/O (both buffered and non-buffered) and found _no_ difference. I have also compared standard C I/O vs. memory mapped files.

Of these schemes, memory mapped I/O was the winner, but not by very much, and then only for large files. Memory mapped I/O (MMIO) would beat this scheme, I believe, only because it avoids a copy into the read buffer. Note that, to keep the speed advantage by avoiding swapping, you have to take care to keep only a range of a large file committed at any given time. But the MMIO mechanisms can be regarded as optimal for the given OS.

HTML Tidy, where MMIO is now the default, has a decent and portable implementation. I actually argued against using it, in favor of simplicity. But others, who often work with large files, preferred the performance improvement.
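For anyone who wants to measure this in Perl, here is a rough sketch using the core Benchmark module. The three subs, the sample file and the iteration count are all assumptions made for illustration. The :mmap PerlIO layer stands in for MMIO; whether it really maps the file depends on how your perl was built. The sysread() variant only counts newlines rather than splitting lines, so the comparison is not perfectly like for like.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Count lines via ordinary buffered Perl I/O.
sub count_perlio {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Count newlines in blocks fetched with sysread, bypassing PerlIO buffering.
sub count_sysread {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my ( $count, $buf ) = ( 0, '' );
    while (1) {
        my $got = sysread( $fh, $buf, 65536 );
        die "sysread $path: $!" unless defined $got;
        last if $got == 0;
        $count += ( $buf =~ tr/\n// );
    }
    close $fh;
    return $count;
}

# Count lines through the :mmap layer (memory mapped where supported).
sub count_mmap {
    my ($path) = @_;
    open my $fh, '<:mmap', $path or die "open $path: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Build a throwaway sample file; real tests should use much larger input.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "line $_ of some sample text\n" for 1 .. 50_000;
close $out or die "close: $!";

cmpthese( 20, {
    perlio  => sub { count_perlio($path) },
    sysread => sub { count_sysread($path) },
    mmap    => sub { count_mmap($path) },
} );
```

After the first pass the file will sit in the OS cache, so this measures copying and buffering overhead far more than disk speed.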

3) All that said, unless the input file is already in memory (e.g. /tmp/foo on Solaris) or you are working with especially complex patterns, I/O is more than likely the bottleneck. Such mechanisms can be useful for tuning the non-I/O portions of the code.
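One way to see where the time goes is to time the read phase and the match phase separately. The sketch below is only an illustration: time_phases() is a hypothetical helper, and the sample data and pattern are made up.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempfile);

# Hypothetical helper: time the I/O phase and the matching phase separately.
sub time_phases {
    my ( $path, $regex ) = @_;

    my $t0 = time;
    open my $fh, '<', $path or die "open $path: $!";
    my @lines = <$fh>;                        # I/O phase: pull in every line
    close $fh;
    my $t1 = time;

    my $hits = grep { /$regex/ } @lines;      # non-I/O phase: match in memory
    my $t2 = time;

    return { read => $t1 - $t0, match => $t2 - $t1, hits => $hits };
}

# Made-up sample data: every tenth line contains the word "needle".
my ( $out, $path ) = tempfile( UNLINK => 1 );
for my $n ( 1 .. 10_000 ) {
    print {$out} $n % 10 == 0 ? "line $n has a needle\n" : "line $n is hay\n";
}
close $out or die "close: $!";

my $result = time_phases( $path, qr/needle/ );
printf "read %.4fs, match %.4fs, %d hits\n",
    $result->{read}, $result->{match}, $result->{hits};
```

A freshly written file is almost certainly still in the OS cache, which is exactly the "already in memory" case; on a cold cache the read figure should dominate.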



At 10:33 PM 7/15/2008 -0400, Bill Ricker wrote:

Uri showed us his table driven tests in
http://search.cpan.org/src/URI/Sort-Maker-0.06/t/
and his buffered line-reader in
http://search.cpan.org/src/URI/File-ReadBackwards-1.04/ReadBackwards.pm

We hacked on ReadBackwards to make a read-forwards version; that is a start
on the non-OO inner piece for Ack's new loop.

-
Bill
==================
#! perl  -w

use strict;

my $n=0;

my $is_crlf = 0;
my $lines_ref = [ ] ; # will be static

while (defined( my $line = our_readline(\*STDIN)))
{

  print ++$n,q{: },$line;
}


# read the /p/r/e/v/i/o/u/s/ record from the file
#


sub our_readline {

        my( $handle) = @_ ;

        my $text ;

# get the buffer of lines


        return unless $lines_ref ;

        while( 1 ) {

# see if there is more than 1 line in the buffer

                if ( @{$lines_ref} > 1 ) {

# we have a complete line so return it
# and convert those damned cr/lf lines to \n

# (fix the line we are about to return, i.e. the first one)
                        $lines_ref->[0] =~ s/\015\012\z/\n/
                                        if $is_crlf; # @TBD

                        return( shift @{$lines_ref} ) ;
                }

# we don't have a complete line, so we have to read blocks until we do


# @TBD -- EOF

# we have to read more text so get the handle and the current read size

                my $read_size = 4096; # @TBD variable


# read in the next (previous) block of text

                $text = @$lines_ref ? pop @$lines_ref : "" ;

                my $read_cnt = sysread( $handle, $text, $read_size, length($text) ) ;

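# at EOF (or on a read error, where sysread returns undef) the pending
# partial line lives in $text, not in the buffer, so return that
                if ( !$read_cnt ) {
                        $lines_ref = undef ;
                        return length($text) ? $text : undef ;
                }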
# split the buffer into a list of lines
# this may want to be $/
# assumes newline separators

                @{$lines_ref} =
                        $text =~ /(.*?\n|.+)/gs ;

#print "Lines \n=>", join( "<=\n=>", @{$lines_ref} ), "<=\n" ;

        }
}
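
As a sketch of where this could go next, here is a closure-based variant that keeps the buffer private instead of in file-level variables. It also handles EOF explicitly, so a final line without a trailing newline is still returned. The name make_line_reader() and the tiny block size in the demo are assumptions for illustration, not part of the code above. It still assumes "\n" separators.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns a closure that yields one line per call,
# or undef once the file is exhausted.
sub make_line_reader {
    my ( $handle, $read_size ) = @_;
    $read_size ||= 4096;
    my @lines;           # complete lines, plus possibly one partial at the end
    my $at_eof = 0;

    return sub {
        while (1) {
            # More than one entry means the first one is certainly complete.
            return shift @lines if @lines > 1;

            # At EOF whatever is left (possibly nothing) is the final line.
            return shift @lines if $at_eof;

            # Append the next block to the pending partial line.
            my $text = @lines ? pop @lines : '';
            my $got = sysread( $handle, $text, $read_size, length $text );
            die "sysread failed: $!" unless defined $got;
            $at_eof = 1 if $got == 0;

            # Split into "\n"-terminated lines; a trailing partial stays last.
            @lines = $text =~ /(.*?\n|.+)/gs;
        }
    };
}

# Demo on made-up data, with a tiny block size to exercise the joins.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "alpha\n", "\n", "gamma with no newline";
close $out or die "close: $!";

open my $in, '<:raw', $path or die "open: $!";
my $next = make_line_reader( $in, 4 );
my @got;
while ( defined( my $line = $next->() ) ) {
    push @got, $line;
}
close $in;

print scalar(@got), " lines read\n";
```

Note that sysread() works on the file descriptor, so this needs a real file, pipe or socket; an in-memory filehandle opened on a scalar will not do.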

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm


