Re: using regular expressions to find sequences of items in data

John W. Krahn Sun, 06 Jan 2002 04:21:06 -0800

Jon Hans wrote:
> 
> #!/usr/bin/perl
> #######################################################
> 
> I am trying to find all of the reoccurring sequences
> excluding the sub sequences.
> 
> Maybe I am missing the obvious, but having a little
> perl exposure and not being an expert perl programmer
> I have hacked together some code that does some of
> what I would like to do, but I know that there must be
> a much better way of doing this. I just don't have any
> ideas right now, having only had a couple hours sleep
> in the last couple of days. :+( am I looking at this
> all wrong? There should be some regular expression(s)
> that would make this more maintainable and elegant.
> :-)
> 
> I have used an array of items called @datalist and a
> hash called %frequency that has a count of how often
> each item occurs in the data list. I used tr to clean
> the data of special characters if any and split on
> white space into the @datalist array.
> 
> I would appreciate some help with this. Thanks


I can't tell exactly what you are trying to do.  Do you
have any examples of the original data and what you want
the modified data to look like?
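
Going only by your description, I am assuming the setup looks
something like the sketch below. The sample text and the exact
set of characters kept by tr are my guesses, not taken from
your post.

```perl
use strict;
use warnings;

# Hypothetical input text, not taken from the original post.
my $text = "the cat, the dog; the cat!";

# Delete everything except letters, digits, and whitespace
# (/c complements the character list, /d deletes the matches).
( my $clean = $text ) =~ tr/a-zA-Z0-9 \t\n//cd;

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @datalist = split ' ', $clean;

# Count how often each item occurs.
my %frequency;
++$frequency{$_} for @datalist;

print "$_ => $frequency{$_}\n" for sort keys %frequency;
```

If your real data differs from this, a short sample of it would
still help.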


> #######################################################
> 
> # find frequency of all sequences of the given size
> my $count = $first = $currentseq = 0;

my only declares the variable it is applied to, so each
variable in the chain needs its own:

my $count = my $first = my $currentseq = 0;
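
An alternative spelling is a single list declaration. This is a
minimal sketch, assuming "use strict" is in effect and assuming
that $currentseq should start out empty rather than as "0":

```perl
use strict;
use warnings;

# Declare all three lexicals at once and initialize each one.
# Assumption: $currentseq starts as an empty string so that
# later appends do not begin with a stray "0".
my ( $count, $first, $currentseq ) = ( 0, 0, '' );

print "count=$count first=$first currentseq='$currentseq'\n";
```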

> # size of sequence to look for
> my $sizeof = 10;
> 
> while ($first + $sizeof < $#datalist) {
> 
> #ugly
>    if ( defined $frequency{$datalist[$first]} &&
> defined $frequency{$datalist[$first+1]} &&
> $frequency{$datalist[$first+2]} &&
> $frequency{$datalist[$first+3]} &&
> $frequency{$datalist[$first+4]} &&
> $frequency{$datalist[$first+5]} &&
> $frequency{$datalist[$first+6]} &&
> $frequency{$datalist[$first+7]} &&
> $frequency{$datalist[$first+8]} &&
> $frequency{$datalist[$first+9]} ) {

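    if ( (grep defined $frequency{ $_ }, @datalist[ $first .. $first + $sizeof - 1 ]) == $sizeof ) {

Using $sizeof instead of the literals 9 and 10 keeps the test in
step with the window size.  Note that this applies defined() to
every item; your version applies it only to the first two and
tests the rest for truth.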
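
The test relies on grep returning a count in scalar context.
Here is a small sketch of that idea on its own; the sample data
and the helper name window_known are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my @datalist  = qw( a b c d e );
my %frequency = ( a => 2, b => 1, c => 0, e => 3 );   # 'd' has no entry
my $sizeof    = 3;

# Return true when every item in the window starting at $first
# has an entry in %frequency. grep in scalar context returns the
# number of elements for which the test was true.
sub window_known {
    my ($first) = @_;
    my $last  = $first + $sizeof - 1;
    my $known = grep { defined $frequency{$_} } @datalist[ $first .. $last ];
    return $known == $sizeof;
}

print window_known(0) ? "yes\n" : "no\n";   # window: a b c
print window_known(2) ? "yes\n" : "no\n";   # window: c d e
```

The first window passes even though 'c' has a count of 0,
because defined() only asks whether a value exists.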


> # put a sequence together with a space separating items
>       $currentseq .= $datalist[ $first  ] ;

You initialized $currentseq with "0" earlier.  Did you
really want it to start with "0"?


>       for (my $count = 1; $count < $sizeof; ++$count) {
>          $currentseq .= " " . $datalist[ $first + $count ] ;
>       }

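    $currentseq = join ' ', @datalist[ $first .. $first + $sizeof - 1 ];

This assigns instead of appending, so $currentseq no longer
carries over the previous window (or the initial "0").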
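
Put together, the counting loop could be sketched as below. The
sample data is made up. I have also assumed you want every
window visited; as far as I can tell, the condition
$first + $sizeof < $#datalist stops two windows short of the
end of the list.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my @datalist = qw( x y z x y z x );
my $sizeof   = 3;
my %current;

# Slide a window of $sizeof items over the list. The last valid
# start index is @datalist - $sizeof, so every window is visited.
for my $first ( 0 .. @datalist - $sizeof ) {
    my $currentseq = join ' ', @datalist[ $first .. $first + $sizeof - 1 ];
    ++$current{$currentseq};
}

print "$_ => $current{$_}\n" for sort keys %current;
```

Declaring $currentseq inside the loop gives each window a fresh
string, so nothing needs to be reset by hand.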


> # increment count of sequence for the current one
>       ++$current{ $currentseq };
>    }
> # next position in the data list
>    ++$first;
> }
> 
> foreach ( keys ( %current ) ) {

for my $currentsequence ( keys %current ) {


> # if no multiples remove sequence
>    if ( $current{ $_ } < 2 ) {
>         delete $current{ $_ } ;
>    }

     delete $current{ $currentsequence } if $current{ $currentsequence } < 2;
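
If the pruning does not have to happen inside the loop, it can
also be done in one pass with grep and a hash slice. A minimal
sketch with made-up counts; @singles is a name I introduced:

```perl
use strict;
use warnings;

# Hypothetical counts, not taken from the original post.
my %current = ( 'a b' => 3, 'b c' => 1, 'c d' => 2, 'd e' => 1 );

# Collect the keys seen fewer than twice, then delete them all
# at once with a hash slice.
my @singles = grep { $current{$_} < 2 } keys %current;
delete @current{@singles};

print join( ', ', sort keys %current ), "\n";
```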

> 
>    my $currentsequence = $_ ;
>    my $numberof = $current{ $_ } ;
> 
>    foreach ( keys ( %lastseq ) ) {
> # if the number of times the smaller sequence occurs is
> # the same, then the shorter sequence is not needed
>       if ( grep($_,$currentsequence) && $lastseq{ $_ } == $numberof ) {
             ^^^^^^^^^^^^^^^^^^^^^^^^^
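grep() operates on a LIST, not a scalar.  With a single scalar
it only tests whether $currentsequence is true, which is
equivalent to:

        if ( $currentsequence && $lastseq{ $_ } == $current{ $currentsequence } ) {

If you meant to test whether $_ occurs inside $currentsequence,
use index( $currentsequence, $_ ) >= 0 instead.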

>          delete $lastseq{ $_ } ;
>       }
>    }
> }
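
My guess at the intent of this last loop is: drop a shorter
sequence when it is contained in a longer one and occurs the
same number of times. Under that assumption it could be
sketched with index() as below. The counts and the names $long
and $short are made up.

```perl
use strict;
use warnings;

# Hypothetical counts, not taken from the original post.
my %lastseq = ( 'a b' => 2, 'b c' => 3, 'x y' => 2 );   # shorter sequences
my %current = ( 'a b c' => 2 );                          # longer sequences

for my $long ( keys %current ) {
    for my $short ( keys %lastseq ) {
        # Pad both strings with spaces so that a match can only
        # start and end on item boundaries.
        my $contained = index( " $long ", " $short " ) >= 0;
        delete $lastseq{$short}
            if $contained && $lastseq{$short} == $current{$long};
    }
}

print join( ', ', sort keys %lastseq ), "\n";
```

Here 'a b' is removed, 'b c' stays because its count differs,
and 'x y' stays because it is not part of the longer sequence.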



John
-- 
use Perl;
program
fulfillment

