Re: doing an inner join via cross-product

2020-07-20 Thread Bruce Gray



> On Jul 19, 2020, at 3:02 PM, Joseph Brenner  wrote:
> 
> I was thinking about the cross-product operator the other day,
> and I was wondering if there might be a convenient way of
> filtering the resulting cartesian product to do something like a
> database inner join:
> 
>my @level  = ( godzilla => 9 ,gremlin => 3, hanuman => 5 );
>my @origin = ( godzilla => 'jp',  tingler => 'us',  hanuman => 'il' );
> 
>my @results = ( @level X @origin ).grep({ $_[0].keys eq $_[1].keys });
>say @results;  # ((godzilla => 6 godzilla => jp) (hanuman => 5
> hanuman => il))

—snip--

This is a neat idea, but it will scale very poorly.
If you had 3 arrays instead of 2, and just 20 pairs in each array instead of 3,
Raku would generate 8000 pairs, which would then be filtered down to 20 at most.

This should behave better, and be almost as concise:
my %monster = (|@level, |@origin)
.classify( {.key}, :as{.value} )
.grep({ .value.elems == 2 }); # The `grep` filters to become like an inner 
join.

say .key.fmt('%7s => '), .value.raku for %monster.sort;

printf "%12s: level: %-2d origin: %3s\n", .key, |.value for %monster.sort;


> Is there some neater way of doing this that I'm missing?

Your line:
my %joined2 =| @results.map({ $_[0].keys => .map({ .values }).flat }); 
can be re-written as:
my %joined2 = @results.map: { .[0].key => .list».value };



BTW, I initially missed that “inner join” was an important factor, and locked 
in on the use of "parallel arrays" (which I spoke of as a Code Smell at TPC 
this year)
So, I wrote this:


sub parallel_arrays_to_HoH ( *@pairs ) {
my %r;
for @pairs -> ( :key($attribute_name), :value(@main_key_attr_value_pairs) ) 
{
%r{.key}{$attribute_name} = .valuefor @main_key_attr_value_pairs;
}

for %r.values <-> %subhash {
%subhash{$_} //= Any for @pairs».key;
}

return %r;
}

, which produces outer-joined HashOfHashes. It is not as succinct, but is 
encapsulated and reusable.
The HoH is easily filtered to produce a inner join.

Usage example:

my @level  = godzilla => 9 ,gremlin => 3, hanuman => 5;
my @origin = godzilla => 'jp',  tingler => 'us',  hanuman => 'il';

my %outer_joined_monster = parallel_arrays_to_HoH( (:@level), (:@origin) ); # 
Those parens are required.

say .key.fmt('%7s => '), .value.raku for %outer_joined_monster.sort;

my %inner_joined_monsters = %outer_joined_monster.grep:
*.value.values».defined.all.so;

printf "%12s: level: %-2d origin: %3s\n", .key, .value for 
%inner_joined_monsters.sort;

Produces:
godzilla => ${:level(9), :origin("jp")}
gremlin => ${:level(3), :origin(Any)}
hanuman => ${:level(5), :origin("il")}
tingler => ${:level(Any), :origin("us")}
godzilla: level: 9  origin:  jp
 hanuman: level: 5  origin:  il

— 
Hope this helps,
Bruce Gray (Util of PerlMonks)


Re: doing an inner join via cross-product

2020-07-20 Thread Aureliano Guedes
With a dataframe data structure and some adjusts and some functions
modifications it could be doing like:

data1 .# perhaps the space after . is intentional to bring the idea of
pipes replace f(x) with x . f() 
 left_join(data2, by={ x = columname_on_data1, y = columname_on_data2}

by calling the column on the join will be performed as a variable.

I found a module that implements the dataframe data structure in Perl5 but
I didn't find it to Raku.
But, I'm thinking in learn Raku better by implementing a DataFrame data
structure (as R and Pandas-Python) and some relational function or grammar
(as R tidyverse/dplyr ) to manipulate
data.

What are the variations of this grep to left, right, and outer joins?

On Sun, Jul 19, 2020 at 5:42 PM Darren Duncan 
wrote:

> This reminds me of my 2009 Set::Relation Perl module, which works to help
> you do
> SQL features like this in your application, but will soon be superseded by
> another module that also has a Raku version. -- Darren Duncan
>
> On 2020-07-19 1:02 p.m., Joseph Brenner wrote:
> > I was thinking about the cross-product operator the other day,
> > and I was wondering if there might be a convenient way of
> > filtering the resulting cartesian product to do something like a
> > database inner join:
> >
> >  my @level  = ( godzilla => 9 ,gremlin => 3, hanuman => 5 );
> >  my @origin = ( godzilla => 'jp',  tingler => 'us',  hanuman => 'il'
> );
> >
> >  my @results = ( @level X @origin ).grep({ $_[0].keys eq $_[1].keys
> });
> >  say @results;  # ((godzilla => 6 godzilla => jp) (hanuman => 5
> > hanuman => il))
> >
> > That's easy enough, though the resulting data structure isn't very neat.
> > I started looking for ways to rearrange it:
> >
> >  my %joined;
> >  for @results -> $row {
> >  say "row: ", $row;  # e.g. row: (godzilla => 9 godzilla
> => jp)
> >  say $row.map({ .keys });# e.g. ((godzilla) (godzilla))
> >  say $row.map({ .values });  # e.g. ((9) (jp))
> >
> >  my $monster =| $row[0].keys;  # e.g. godzilla
> >  my @attributes =| $row.map({ .values });  # e.g. [9 jp]
> >  %joined{ $monster } = @attributes;
> >  }
> >  say %joined;  # {godzilla => [9 jp], hanuman => [5 il]}
> >
> > I can do it more compactly, but it risks getting unreadable:
> >
> >  my %joined2 =| @results.map({ $_[0].keys => .map({ .values }).flat
> });
> >
> > In any case, the %joined structure feels more perlish, for
> > example it's easier to use it to generate reports:
> >
> >  for %joined.keys -> $key {
> >  printf "%12s: level: %-2d origin: %3s\n", $key, %joined{ $key
> }.flat;
> >  }
> >  # hanuman: level: 5  origin:  il
> >  #godzilla: level: 9  origin:  jp
> >
> > Is there some neater way of doing this that I'm missing?
> >
>


-- 
Aureliano Guedes
skype: aureliano.guedes
contato:  (11) 94292-6110
whatsapp +5511942926110


Re: doing an inner join via cross-product

2020-07-19 Thread Darren Duncan
This reminds me of my 2009 Set::Relation Perl module, which works to help you do 
SQL features like this in your application, but will soon be superseded by 
another module that also has a Raku version. -- Darren Duncan


On 2020-07-19 1:02 p.m., Joseph Brenner wrote:

I was thinking about the cross-product operator the other day,
and I was wondering if there might be a convenient way of
filtering the resulting cartesian product to do something like a
database inner join:

 my @level  = ( godzilla => 9 ,gremlin => 3, hanuman => 5 );
 my @origin = ( godzilla => 'jp',  tingler => 'us',  hanuman => 'il' );

 my @results = ( @level X @origin ).grep({ $_[0].keys eq $_[1].keys });
 say @results;  # ((godzilla => 6 godzilla => jp) (hanuman => 5
hanuman => il))

That's easy enough, though the resulting data structure isn't very neat.
I started looking for ways to rearrange it:

 my %joined;
 for @results -> $row {
 say "row: ", $row;  # e.g. row: (godzilla => 9 godzilla => jp)
 say $row.map({ .keys });# e.g. ((godzilla) (godzilla))
 say $row.map({ .values });  # e.g. ((9) (jp))

 my $monster =| $row[0].keys;  # e.g. godzilla
 my @attributes =| $row.map({ .values });  # e.g. [9 jp]
 %joined{ $monster } = @attributes;
 }
 say %joined;  # {godzilla => [9 jp], hanuman => [5 il]}

I can do it more compactly, but it risks getting unreadable:

 my %joined2 =| @results.map({ $_[0].keys => .map({ .values }).flat  });

In any case, the %joined structure feels more perlish, for
example it's easier to use it to generate reports:

 for %joined.keys -> $key {
 printf "%12s: level: %-2d origin: %3s\n", $key, %joined{ $key }.flat;
 }
 # hanuman: level: 5  origin:  il
 #godzilla: level: 9  origin:  jp

Is there some neater way of doing this that I'm missing?



doing an inner join via cross-product

2020-07-19 Thread Joseph Brenner
I was thinking about the cross-product operator the other day,
and I was wondering if there might be a convenient way of
filtering the resulting cartesian product to do something like a
database inner join:

my @level  = ( godzilla => 9 ,gremlin => 3, hanuman => 5 );
my @origin = ( godzilla => 'jp',  tingler => 'us',  hanuman => 'il' );

my @results = ( @level X @origin ).grep({ $_[0].keys eq $_[1].keys });
say @results;  # ((godzilla => 6 godzilla => jp) (hanuman => 5
hanuman => il))

That's easy enough, though the resulting data structure isn't very neat.
I started looking for ways to rearrange it:

my %joined;
for @results -> $row {
say "row: ", $row;  # e.g. row: (godzilla => 9 godzilla => jp)
say $row.map({ .keys });# e.g. ((godzilla) (godzilla))
say $row.map({ .values });  # e.g. ((9) (jp))

my $monster =| $row[0].keys;  # e.g. godzilla
my @attributes =| $row.map({ .values });  # e.g. [9 jp]
%joined{ $monster } = @attributes;
}
say %joined;  # {godzilla => [9 jp], hanuman => [5 il]}

I can do it more compactly, but it risks getting unreadable:

my %joined2 =| @results.map({ $_[0].keys => .map({ .values }).flat  });

In any case, the %joined structure feels more perlish, for
example it's easier to use it to generate reports:

for %joined.keys -> $key {
printf "%12s: level: %-2d origin: %3s\n", $key, %joined{ $key }.flat;
}
# hanuman: level: 5  origin:  il
#godzilla: level: 9  origin:  jp

Is there some neater way of doing this that I'm missing?