On 3/16/06, Jeff Pang <[EMAIL PROTECTED]> wrote:
>
> >
> >I'm having problems getting information out of a file and having it write 
> >each of the dates to an array. The problem is, I don't want duplicates.
> >
>
> After reading your program carefully, I think you just want to increment a 
> count for each unique $date.
> In Perl programming, a hash is very useful for this purpose, because a 
> hash's keys are always unique.
> I would modify your code as follows, and hope it helps.
>
> use strict;
> use warnings;
>
> my %uniq_data;
>
> while ( <> )  {
>         chomp;
>         my ($date, $time, $ip, $ssl, $cipher, $get, $pkg, $http, $pid, 
> $name1, $name2, $name3 ) = split;
>         $date =~ s/\[//;
>         $date =~ s/\// /g;
>         $date =~ s/\:(\d+):(\d+):(\d+)//;
>
>         $uniq_data{$date}++;
> }
>
>
> Then you could loop over the %uniq_data hash to get each $date's count.
>
> --
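
For what it's worth, the "loop over the hash" step Jeff mentions might look
something like the sketch below. The helper name report_lines and the sample
counts are made up for illustration; in the real program %uniq_data would be
filled by the while loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of counts into "key: count" lines.
sub report_lines {
    my ($counts) = @_;
    # Hashes have no inherent order, so sort the keys for stable output.
    return map { "$_: $counts->{$_}" } sort keys %$counts;
}

# Made-up sample data standing in for %uniq_data after the while loop.
my %uniq_data = ( '07 Feb 2005' => 3, '08 Feb 2005' => 1 );

print "$_\n" for report_lines( \%uniq_data );
```

Note that a plain sort on strings like '07 Feb 2005' is alphabetical, not
chronological, so you'd need a different key format or a custom sort if the
order of the dates matters.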

As Jeff says, you want a hash here. But let's look at your regex for a second:

>         $date =~ s/\[//;     # get rid of the opening '['
>         $date =~ s/\// /g;     # replace '/' with ' '
>         $date =~ s/\:(\d+):(\d+):(\d+)//;     # get rid of everything after the first ':'

That should leave you with a string like '07 Feb 2005'. There are a
couple of things to note here, especially in your final substitution.
First, don't use capturing parentheses unless you intend to do
something with the captured values (e.g. $1, $2, $3, etc.). Capturing
adds overhead to a regex, which can be noticeably slower on large data
sets. If you're just using parens for grouping, use non-capturing
parens:

    $date =~ s/\:(?:\d+):(?:\d+):(?:\d+)//;

Here, though, you don't need to group at all. Perl treats the
metacharacter escape '\d' as a single character, so the following is
fine (the same goes for other class metas and any escaped character;
'\w', '\$', '\/', '\n', etc. are all single characters, as far as Perl
is concerned):

    $date =~ s/\:\d+:\d+:\d+//;

Next, ':' is not a metacharacter, so you don't need to escape it:

    $date =~ s/:\d+:\d+:\d+//;

Finally, here you want to get rid of everything after the first colon.
Just do that:

    $date =~ s/:.*//;     # (a purist might want s/:.*$//)
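
If you want to convince yourself that these rewrites all behave the same
on your data, a quick check along the following lines should do. The sample
string is made up; it assumes $date has already had the '[' removed and the
slashes replaced, and the variant names are just labels for this sketch.

```perl
use strict;
use warnings;

# Made-up sample: what $date might look like after the first two substitutions.
my $sample = '07 Feb 2005:12:34:56';

# Each substitution from above, wrapped in a sub so it works on its own copy.
my %variants = (
    capturing     => sub { my $d = shift; $d =~ s/\:(\d+):(\d+):(\d+)//;       $d },
    non_capturing => sub { my $d = shift; $d =~ s/\:(?:\d+):(?:\d+):(?:\d+)//; $d },
    no_groups     => sub { my $d = shift; $d =~ s/:\d+:\d+:\d+//;              $d },
    greedy_tail   => sub { my $d = shift; $d =~ s/:.*//;                       $d },
);

# Print the result of every variant side by side.
for my $name ( sort keys %variants ) {
    printf "%-13s => '%s'\n", $name, $variants{$name}->($sample);
}
```

The last variant is the only one that would behave differently on other
input: it drops everything from the first colon onward, whatever follows,
while the others only remove one ':digits:digits:digits' run.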

You might also want to think about looking for what you do want out of
your regex, instead of spending so much effort getting rid of what
you don't want. Something along the lines of

    $date =~ s#(\d{2})/(\w{3})/(\d{4}).*$#$1 $2 $3#;
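
Taking that idea one step further, you could skip the substitution and
capture the pieces directly with a match in list context. This is only a
sketch: extract_date is a hypothetical helper, and the sample fields assume
the date arrives looking like '[07/Feb/2005:12:34:56'.

```perl
use strict;
use warnings;

# Hypothetical helper: pull "DD Mon YYYY" out of a raw log field.
# Returns undef (in scalar context) if the field doesn't look like a date.
sub extract_date {
    my ($field) = @_;
    my ( $day, $mon, $year ) = $field =~ m#(\d{2})/(\w{3})/(\d{4})#
        or return;
    return "$day $mon $year";
}

# Count each unique date, as in Jeff's hash approach.
my %uniq_data;
for my $field ( '[07/Feb/2005:12:34:56', '[07/Feb/2005:13:00:01', 'garbage' ) {
    my $date = extract_date($field);
    next unless defined $date;    # skip fields that don't match
    $uniq_data{$date}++;
}

print "$_: $uniq_data{$_}\n" for sort keys %uniq_data;
```

Here the captures are actually used, so the capturing parens earn their
keep. Because only the captured pieces are kept, a leading '[' never makes
it into the hash key, and fields that don't match are skipped instead of
being counted under a mangled key.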

HTH,

-- jay
--------------------------------------------------
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.dpguru.com  http://www.engatiki.org

values of β will give rise to dom!
