On 2022-01-15 00:04, Jon Smart wrote:

Hello Paul

Do you mean that with undef $/ and <$fh> we can read the whole file into memory at once?

$/ is the input record separator, newline by default.
If undefined that means that the whole file is treated as one single record.
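
A minimal sketch of the difference, assuming a small scratch file created only for this demonstration (the file and its contents are not part of the original script):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file with three short lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "alpha\nbeta\ngamma\n";
close $tmp or die $!;

# Default $/ ("\n"): a read in list context returns one record per line.
open my $fh, '<', $path or die $!;
my @lines = <$fh>;
close $fh;

# $/ undefined: a single read returns the whole file as one record.
my $content = do {
    local $/;                           # undef only inside this block
    open my $in, '<', $path or die $!;
    <$in>;
};

printf "%d records by line, %d bytes in one record\n",
    scalar @lines, length $content;
```

The same handle syntax is used in both reads; only the value of $/ decides where a record ends.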


Yes, that would be faster because we don't need to read the file line by line, which increases the disk I/O.

More questions:
1. what's the "truss" command?

truss is a Unix command that traces the system calls a process makes (Linux has the similar strace).  Search for "truss unix".


2. what's the syntax "<:mmap"?

mmap enables a memory-mapped file. It treats the file as if it were a chunk of memory instead of a file.

Depending on the size of the file and the amount of memory available it may not make a difference. Benchmark to confirm.
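
One way to run such a benchmark is with the core Benchmark module. This is only a sketch: the generated test file, its size, and the helper name slurp are assumptions for illustration, and it presumes your perl build provides the :mmap layer. Results will vary with file size, memory, and the OS file cache.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical test data: 100_000 short words, one per line.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "word$_\n" for 1 .. 100_000;
close $tmp or die $!;

# Read the whole file with the given open mode and return its content.
sub slurp {
    my ($mode) = @_;
    local $/;                           # slurp mode, limited to this sub
    open my $fh, $mode, $path or die "open $mode: $!";
    my $data = <$fh>;
    return $data;
}

# Sanity check: both modes must yield identical content.
die "layers disagree" unless slurp('<') eq slurp('<:mmap');

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { slurp('<') },
    mmap  => sub { slurp('<:mmap') },
});
```

Substitute your real words.txt for the generated file to get numbers that mean something for your data.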


On 15.01.2022 15:45, Paul Procacci wrote:

-------------------------------------
use strict;

$/ = undef;

That is usually written as:

local $/;

because $/ is a global variable and you want to limit the scope of any change.
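
A small sketch of what that scoping buys you, using an in-memory filehandle so no real file is needed (the helper name slurp_handle is made up for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on an already-open handle.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                   # $/ is undef only until this sub returns
    return scalar <$fh>;
}

# Open a handle on a string instead of a file.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die $!;

my $first = <$fh>;              # default $/: reads just "one\n"
my $rest  = slurp_handle($fh);  # slurp mode: reads "two\nthree\n"

# After the sub returns, $/ is back to its previous value.
print "separator restored\n" if defined $/ && $/ eq "\n";
```

With a plain `$/ = undef;` at the top of the script, every later read in the program, including reads in any modules you call, would also be in slurp mode.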


my %stopwords = do {
        open my $fh, '<:mmap', 'stopwords.txt' or die $!;
        map { $_ => 1; } split /\n/, <$fh>;
};

my %count = do {
        my %res;
        open my $fh, '<:mmap', 'words.txt' or die $!;
        map { $res{$_}++ unless $stopwords{$_}; } split /\n/, <$fh>;
        %res;
};

my $i=0;
for (sort {$count{$b} <=> $count{$a}} keys %count) {
    if ($i < 20) {
        print "$_ -> $count{$_}\n"
    } else {
        last;
    }
    $i++;
}



John

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

