> -----Original Message-----
> From: Debbie Christensen [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 24, 2001 3:03 PM
> To: [EMAIL PROTECTED]
> Subject: reading a text file
>
>
> I am brand new to perl; I am only on chapt 4 of the learning
> perl book. My boss has already given me a project to do that
> I am really struggling with. I know you are all really busy,
> but I would really appreciate any help you can give.
>
> I have a text file that looks something like this
>
> OH: 702
> PA: 702
> ND: 702
> NJ :703
> NY: 703
> Ca: 703
>
> ...
>
> I am able to open the file and read it with no problem.
> Where I get lost is My boss wants the data to come out like
> 702 OH, PA, ND
> 703 NJ, NY, CA
Hey, this is a really common type of thing and perl excels at this. The
trick is to use a hash to group the states together by the keys. A question
is whether a given state, number pair can occur more than once on the input
and whether you would want to show it only once on the output. Also, it
appears that you want the states to be listed on the output in the order
they appeared in the input (as opposed to being sorted alphabetically, for
example).
Here's a solution that will produce the output you showed (except that "Ca"
is printed as it appears in the input, not uppercased):
#!/usr/bin/perl -w
use strict;
my %d; # hash to accumulate data in
while(<>) # read line from file(s) specified on cmd line, or STDIN
{
chomp; # strip line terminator from $_
my ($state, $key) = split /\s*:\s*/; # extract state and number
push @{$d{$key}}, $state; # add to list of states for this num
}
# print the output data
print "$_ ", join(', ', @{$d{$_}}), "\n" for sort { $a <=> $b } keys %d;
Some of those lines need explanation:
my ($state, $key) = split /\s*:\s*/;
This is an assignment statement setting a list of variables from a list of
values on the right-hand side. The split function takes an input string
(here it is implied as the $_ variable) and splits it into a list of
multiple values based on a delimiter pattern (regular expression). I am
using a delimiter consisting of zero or more whitespace chars, followed by a
colon, followed by zero or more whitespace chars. So, if the line we just
read in was:
OH: 702
Then $_ would contain "OH: 702\n" before the chomp; and "OH: 702" after
the chomp. The split function would see the ": " sequence as matching the
delimiter and split the string into two values, "OH" and "702". The first
value would be assigned to $state and the second to $key.
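To see the chomp and split steps in isolation, here is a minimal sketch. It
uses two sample lines from the question (including the oddly spaced
"NJ :703"); the @lines and @pairs arrays are just illustrative names, not part
of the solution above.

```perl
use strict;
use warnings;

# Two sample lines from the question; note the different spacing around ':'.
my @lines = ("OH: 702\n", "NJ :703\n");
my @pairs;    # illustrative: collects [state, number] pairs

for (@lines) {
    chomp;                                  # "OH: 702\n" becomes "OH: 702"
    my ($state, $key) = split /\s*:\s*/;    # split $_ on ':' plus surrounding whitespace
    push @pairs, [$state, $key];
    print "state=$state key=$key\n";
}
```

Both lines split cleanly because the pattern allows whitespace on either side
of the colon.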
push @{$d{$key}}, $state;
This is a call to the push() function, which takes two arguments: an array,
and a single (scalar) value. The value is added to the end of the array.
What is the array? The array is an anonymous (unnamed) array which is
referenced by the value of $d{$key}, which is a hash entry of the hash %d.
So this statement means: "Add the value of $state to the end of the array
pointed to by the hash entry $d{$key}"
Note that if the hash entry $d{$key} doesn't exist, it is automatically
created and initialized to point to an empty array when this statement is
executed, which makes our life simpler. So after executing this statement
with the above values, the hash %d would contain one entry with a key value
of "702". This entry would point to an anonymous array containing one entry,
the value "OH".
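If you want to convince yourself of the automatic creation, a small sketch
like this one should do it. The hard-coded keys and states are just sample
values from the question.

```perl
use strict;
use warnings;

my %d;    # starts out empty

print exists($d{702}) ? "702 exists\n" : "no entry for 702 yet\n";

push @{ $d{702} }, 'OH';    # entry for 702 is created here, pointing to a new array
push @{ $d{702} }, 'PA';    # same array, now two elements
push @{ $d{703} }, 'NJ';    # a second entry with its own array

print scalar(keys %d), " keys in the hash\n";
print "702 has: @{ $d{702} }\n";
print "703 has: @{ $d{703} }\n";
```

Note that there is no explicit "create an empty array" step anywhere; the
first push on each key takes care of it.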
print "$_ ", join(', ', @{$d{$_}}), "\n" for sort { $a <=> $b } keys %d;
This one statement prints the entire output table. What you have is a
statement of the form:
print .... for .....
The "for" is a statement modifier that basically causes the "print"
statement to execute multiple times. Following the "for" keyword is a list
of values. The print statement will be executed once for each value in the
list, and during each execution, the $_ variable will be set to the
particular value in the list being processed.
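A tiny sketch of the modifier on its own, using a literal list in place of
the hash keys (the @out array is just an illustrative name):

```perl
use strict;
use warnings;

my @out;

# The statement before "for" runs once per list value, with $_ set to that value.
push @out, "value is $_" for (702, 703);

print "$_\n" for @out;
```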
What is the list of values after the "for"? It is the list produced by the
expression
sort { $a <=> $b } keys %d;
This is taking the result of the expression "keys %d", which is the list of
key values from the %d hash (e.g. the numbers 703, 702, etc., in no
particular order) and feeding them to the sort function. The sort function
is using the expression $a <=> $b to compare the keys. This will have the
effect of sorting the keys as numbers, instead of as strings, which is the
default. (This will make "1001" come after "702" instead of before.)
So the "for" modifier will execute the print statement, setting $_ to the
numbers in your list successively. For the input data you gave, the print
statement will execute twice, first for 702, then for 703.
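To see the difference between the default sort and the numeric sort, here is
a short sketch. The value 1001 is made up for illustration; it is not in your
sample data.

```perl
use strict;
use warnings;

my @keys = (702, 1001, 703);    # 1001 is a made-up value for illustration

my @as_strings = sort @keys;                  # default: compares as strings
my @as_numbers = sort { $a <=> $b } @keys;    # compares as numbers

print "string order:  @as_strings\n";    # 1001 702 703
print "numeric order: @as_numbers\n";    # 702 703 1001
```

In the string comparison "1001" sorts first because the character "1" comes
before "7".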
So the print statement prints each line. What does it print?
print "$_ ", join(', ', @{$d{$_}}), "\n"
There are three values printed: "$_ ", which is the number (e.g. 702)
followed by a space. Next comes a join() function call, then comes a "\n"
(newline) to mark the end of the line. The join() function is the opposite
of split(): it takes a list of values (in this case the list of states in
the array pointed to by the hash entry corresponding to the current number
being processed, $_) and strings them together with a ', ' sequence in
between each. This gives you the "OH, PA, ND" sequence needed.
Whew!
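One loose end from the top of this message: if a state/number pair can show
up more than once and you only want it listed once, a second hash that
remembers what has been seen is one common way to handle it. The sketch below
also uppercases the state so that "Ca" comes out as "CA". The names
group_states and %seen are made up for this example, and it takes its input
as a list of lines rather than reading a file, so treat it as a starting
point rather than a drop-in replacement.

```perl
use strict;
use warnings;

# group_states is a hypothetical helper: takes input lines, returns output lines.
sub group_states {
    my @lines = @_;
    my (%d, %seen);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /:/;                  # skip blank or malformed lines
        my ($state, $key) = split /\s*:\s*/, $line;
        $state = uc $state;                        # "Ca" becomes "CA"
        next if $seen{"$key $state"}++;            # skip a pair already recorded
        push @{ $d{$key} }, $state;                # keeps first-seen input order
    }
    my @out;
    push @out, "$_ " . join(', ', @{ $d{$_} }) for sort { $a <=> $b } keys %d;
    return @out;
}

# Sample data from the question, plus one repeated line to show the dedup.
print "$_\n" for group_states(
    "OH: 702", "PA: 702", "ND: 702",
    "NJ :703", "NY: 703", "Ca: 703",
    "OH: 702",
);
```

With that input it prints "702 OH, PA, ND" and "703 NJ, NY, CA"; the repeated
"OH: 702" line is ignored.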