Jon Hans wrote:
>
> #!/usr/bin/perl
> #######################################################
>
> I am trying to find all of the reoccurring sequences
> excluding the sub sequences.
>
> Maybe I am missing the obvious, but having a little
> perl exposure and not being an expert perl programmer
> I have hacked together some code that does some of
> what I would like to do, but I know that there must be
> a much better way of doing this. I just don't have any
> ideas right now, having only had a couple hours sleep
> in the last couple of days. :+( am I looking at this
> all wrong? There should be some regular expression(s)
> that would make this more maintainable and elegant.
> :-)
>
> I have used an array of items called @datalist and a
> hash called %frequency that has a count of how often
> each item occurs in the data list. I used tr to clean
> the data of special characters if any and split on
> white space into the @datalist array.
>
> I would appreciate some help with this. Thanks

I can't tell exactly what you are trying to do. Do you have any
examples of the original data and what you want the modified data
to look like?

> #######################################################
>
> # find frequency of all sequences of the given size
> my $count = $first = $currentseq = 0;

my $count = my $first = my $currentseq = 0;

> # size of sequence to look for
> my $sizeof = 10;
>
> while ($first + $sizeof < $#datalist) {
>
>     #ugly
>     if ( defined $frequency{$datalist[$first]} &&
>          defined $frequency{$datalist[$first+1]} &&
>          $frequency{$datalist[$first+2]} &&
>          $frequency{$datalist[$first+3]} &&
>          $frequency{$datalist[$first+4]} &&
>          $frequency{$datalist[$first+5]} &&
>          $frequency{$datalist[$first+6]} &&
>          $frequency{$datalist[$first+7]} &&
>          $frequency{$datalist[$first+8]} &&
>          $frequency{$datalist[$first+9]} ) {

    if ( (grep defined $frequency{ $_ }, @datalist[ $first .. $first + 9 ]) == 10 ) {

>         # put a sequence together with a space separating items
>         $currentseq .= $datalist[ $first ] ;

You initialized $currentseq with "0" earlier. Did you really want it
to start with "0"?

>         for (my $count = 1; $count < $sizeof; ++$count)
>         {
>             $currentseq .= " " . $datalist[ $first + $count ] ;
>         }

        $currentseq = join ' ', @datalist[ $first .. $first + 9 ];

>         # increment count of sequence for the current one
>         ++$current{ $currentseq };
>     }
>     # next position in the data list
>     ++$first;
> }
>
> foreach ( keys ( %current ) ) {

for my $currentsequence ( keys %current ) {

>     # if no multiples remove sequence
>     if ( $current{ $_ } < 2 ) {
>         delete $current{ $_ } ;
>     }

    delete $current{ $currentsequence } if $current{ $currentsequence } < 2;

>
>     my $currentsequence = $_ ;
>     my $numberof = $current{ $_ } ;
>
>     foreach ( keys ( %lastseq ) ) {
>         # if the number of times the smaller sequence occurs is
>         # the same, then the shorter sequence is not needed
>         if ( grep($_,$currentsequence) && $lastseq{ $_ } == $numberof ) {
               ^^^^^^^^^^^^^^^^^^^^^^^^^
grep() operates on a LIST, not a scalar. Given a single scalar it only
tests whether $currentsequence is true, so the condition reduces to:

        if ( $currentsequence && $lastseq{ $_ } == $current{ $currentsequence } ) {

>             delete $lastseq{ $_ } ;
>         }
>     }
> }

John
-- 
use Perl;
program
fulfillment
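
Pulling the suggestions above together, here is a minimal sketch of one way the whole job could look. It rests on a guess about the goal, since the original post gives no sample data: count every run of N consecutive items, keep the runs seen at least twice, then drop a shorter run when it sits inside a longer run with the same count. The sub names repeated_sequences and drop_subsequences, the sample @datalist, and the use of index() for the containment test are all assumptions for illustration, not code from the original post.

```perl
use strict;
use warnings;

# Count every run of $size consecutive items; keep runs seen at least twice.
sub repeated_sequences {
    my ($data, $size) = @_;
    my %count;
    # The range is empty when there are fewer than $size items.
    for my $first (0 .. @$data - $size) {
        my $seq = join ' ', @{$data}[ $first .. $first + $size - 1 ];
        ++$count{$seq};
    }
    # Hash-slice delete of every run that occurs only once.
    delete @count{ grep { $count{$_} < 2 } keys %count };
    return \%count;
}

# Remove shorter runs that occur inside a longer run with the same count.
# Padding both strings with spaces keeps matches on whole items only.
sub drop_subsequences {
    my ($shorter, $longer) = @_;
    for my $long (keys %$longer) {
        for my $short (keys %$shorter) {
            delete $shorter->{$short}
                if index(" $long ", " $short ") >= 0
                && $shorter->{$short} == $longer->{$long};
        }
    }
    return $shorter;
}

# Hypothetical sample data, not from the original post.
my @datalist = qw(a b c x a b c y a b);
my $pairs    = repeated_sequences(\@datalist, 2);
my $triples  = repeated_sequences(\@datalist, 3);
drop_subsequences($pairs, $triples);

print "$_ => $triples->{$_}\n" for sort keys %$triples;   # a b c => 2
print "$_ => $pairs->{$_}\n"   for sort keys %$pairs;     # a b => 3
```

With this sample, "b c" is dropped because it occurs twice and only inside "a b c", while "a b" stays because it occurs a third time on its own. Passing the window size as a parameter avoids the hard-coded 9 and 10, and the loop bound covers the final window of the list.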