Hi Vito,

OK, so let's give this a try...

First let's deal with comparing the arrays, then we can get to the size issues,
memory usage, etc.

You have the data in an array, which is good but not all that nice: it makes for
a lot of work if you want to compare two arrays. You end up comparing every
element in array 1 with every element in array 2, which with 100 elements in each
array already means 100 x 100 = 10,000 comparisons, and that only gets worse as
your data sets grow. The naive approach looks roughly like the sketch below.
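
(Just a sketch of that naive nested-loop approach, using the same small @G1/@L1
data from your mail so you can see the shape of it; not something I would actually
run on 100k rows.)

use strict;
use warnings;

my @G1 = (["alfa", "10"], ["beta", "11"]);
my @L1 = (["alfa", "10"], ["gamma", "12"]);

my @overlap;
foreach my $g ( @G1 ) {        # every element of @G1 ...
 foreach my $l ( @L1 ) {       # ... against every element of @L1
  if ( $g->[0] eq $l->[0] && $g->[1] eq $l->[1] ) {
   push @overlap, $g;          # same name and same number
  }
 }
}
# With 100k rows per query that is 100,000 x 100,000 = 10,000,000,000 comparisons.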

So let's use a hash instead: a hash gives you key lookups, and used in an
intelligent way it will save you a lot of comparing. Here is what I would do:

use strict;
use warnings;

my @G1 = (["alfa" ,  "10"], ["beta" ,  "11"]);
my @L1 = (["alfa" ,   "10"], ["gamma" ,  "12"]);
my @unique;
my @overlap;

my %hash;
foreach my $result ( @G1 ) {
 my $key = join '*|*', @{$result}; # Makes ["alfa", "10"] into "alfa*|*10"
                     # @{$result} tells Perl that $result holds an array reference,
                     # otherwise this whole trick would not work ;)
 $hash{ $key }++;
}

# %hash now contains the key/value pairs "alfa*|*10" => 1 and "beta*|*11" => 1.
# All that is left is adding the other array the same way and we will find the
# overlap automatically.

# Since we will never use @G1 again we can drop it
undef @G1; # Empties @G1 and lets Perl know we will never use this data again,
           # so it is free to clean up the memory


foreach my $result ( @L1 ) {
 my $key = join "*|*", @{$result}; # Makes ["alfa", "10"] into "alfa*|*10"
                     # @{$result} tells Perl that $result holds an array reference,
                     # otherwise this whole trick would not work ;)
 $hash{ $key }++;
}
# %hash will now contain "alfa*|*10" => 2, "beta*|*11" => 1 and "gamma*|*12" => 1
# Finding out which ones are unique is easy: we simply keep the keys whose value
# is 1; anything with a value greater than 1 is overlap

# Since we will never use @L1 again we can drop it
undef @L1; # Empties @L1 and lets Perl know we will never use this data again,
           # so it is free to clean up the memory

foreach my $key ( keys %hash ) {
 if ( $hash{ $key } == 1 ) { # This key only appeared once
  my @array = split /\*\|\*/, $key; # Pull the key apart into the two values
                                    # (or 200k values if you have that many columns, of course)
  # Now we add this to the @unique array
  push @unique, [@array];
 } else { # This is where we deal with the overlapping results
  my @array = split /\*\|\*/, $key; # Pull the key apart into the two values
                                    # (or 200k values if you have that many columns, of course)
  # Now we add this to the @overlap array
  push @overlap, [@array];
 }
}

use Data::Dumper;
print "Overlap\n";
print Dumper @overlap;
print "Unique\n";
print Dumper @unique;

That's easy enough, right? Of course the Perl purists will say use map, grep,
etc., but that would hide a bit too much of the logic of what is going on from a
beginner. Of course you should know that this can all be done faster and more
efficiently, but for a beginner it is more important that you understand the
logic of it all than that you know the fancy tricks without understanding how
they work. Still, a quick map/grep sketch follows so you can see it.
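
(This does the same thing as the two loops above, building the counts and then
splitting the keys back apart; nothing you need to use, just for reference.)

my %count;
$count{ join '*|*', @{$_} }++ for @G1, @L1;  # build the counts in one go

# keys seen once are unique, keys seen more than once are the overlap
my @unique  = map { [ split /\*\|\*/, $_ ] } grep { $count{$_} == 1 } keys %count;
my @overlap = map { [ split /\*\|\*/, $_ ] } grep { $count{$_} >  1 } keys %count;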

Anyway, I think this is what you are looking for... A DB link is always a hard
thing to get approved, I know that one. Besides, depending on the location of the
databases, you might not gain all that much by having the database deal with this
for you, due to network latency and such things. If you did want to do it at the
DB level, look at the Oracle MINUS operator for the easiest way to implement it,
though there are others of course ;-) A rough sketch of that is below.
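
(Again just a sketch, assuming both result sets live in the same database so no
db link is needed; the connection string, table and column names are made up, so
adjust them to your own schema.)

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:yourdb', 'user', 'password',
                        { RaiseError => 1 } );

# MINUS keeps the rows of the first query that do not appear in the second
my $sql = q{
 SELECT name, value FROM table_g1
 MINUS
 SELECT name, value FROM table_l1
};

my $rows = $dbh->selectall_arrayref( $sql ); # arrayref of [name, value] rows
print "@{$_}\n" for @{$rows};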

I hope this helps you a bit, regards,

Rob

On Fri, Mar 4, 2011 at 12:30 PM, mammoccio <vito.pasc...@gmail.com> wrote:

> First of all, thanks for your support! Really!
>
> > The most efficient way would be to arrange to have the Oracle database
> > engine do most of the comparisons. I am not enough of a database expert
> > to recommend ways to do this.
> I agree with you, but it's something that is not possible without a
> dblink, that's why I'm here to write some code.... :)
> > The fastest way to do this in Perl would be to save the results of one
> > query in memory in a data structure of some type, either an array or a
> > hash. Then, as the results from the second query are fetched, compare
> > against the copy in memory and save what differs (you have not explained
> > how to decide when the two results differ).
> That's exactly what I was trying to do, and you are right, probably I was
> not so clear about "how to decide when the two results differ".
> Let's try to shed some light on it:
>
> The results of the first query will be an array something like this:
>
> @G1 = (["alfa", "10"],
>        ["beta", "11"]);
>
> Similar the second query will give this kind of array:
>
> @L1 = (["alfa", "10"],
>        ["gamma", "12"]);
>
> And finally the third query:
>
> @G2 = ("gamma");
>
>
> G1 and G2 are two query on the same db and same schema too.
> L1 is on another db server and of course on a different schema.
>
> So what I need is to discard the results that appear in all three
> queries and take only what differs.
> Because of the nature of the data I need to check G1 against L1 first and
> then check what differs against G2.
> For example, the result I need is the pair "beta 11".
>
> >
> > How huge? Some representative numbers would help.
> >
> The results of the G1 query are on the order of 100k+ records, and the
> machine that will run the Perl code is a Linux Dell box with 3GB of memory
> and an AMD Opteron 1.7GHz CPU.
>
> > I do not understand what you are saying. Are you saying that queries G1
> > and L1 are huge and G2 is not?
> No, I was just trying to say that I had problems checking the difference
> against the G2 query because it has just one column of results (a normal
> array), while the results coming from the other two queries have two
> columns (a multidimensional array).
>
> >> We were all newbies at one time.
>
> Hope to grow up then :)
>
> Again, thanks all for your help!
>
>
> --
> Vito Pascali
> ICT Security Manager
> IT Senior System Administrator
> vito.pasc...@gmail.com
>
>
