On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder <a...@dancingjars.com> wrote:

> I want to write a query like:
>
> select clients.client.client_id, columnar.sales.total_sales, web.page_hits
> from clients, columnar, web
> where clients.client_id = columnar.client_id
> and  clients.client_id = web.client_id
>
> in a system where 'clients' is actually one or more relational databases,
> 'columnar' is one or columnar databases, and 'web' is the Apache logs on
> one or more web servers.  The dbi driver would be configured to connect to
> the correct databases and filter web hits based on 'client_id'.
>
> Has somebody written that already?
>
> Thanks,
> Andrew
>


it seems like the right thing to do here would be to do three queries,
against the three data sources, and store all the results in a hash of
arrays, then dump the results. Any solution that automates it will wind up
doing at least that anyway, and might not be optimized for the join.

Unless there really are so many client IDs that you need to process the
results as a stream or run out of memory, which is unlikely.


    while (my ($c_id, $ar) = each %resultz){
          $ar->[0] or next;   # filter out client_id not appearing in
clients database
          print join( "\t", $c_id, 0+$ar->[1], 0+$ar->[2]),"\n";
    }

Two parallel hashes containing the web and columnar results, accessed once
for each result from querying the clients table, would also work.

Reply via email to