cross database queries?

2013-06-26 Thread Andrew Snyder

I want to write a query like:

select clients.client.client_id, columnar.sales.total_sales, 
web.page_hits from clients, columnar, web

where clients.client_id = columnar.client_id
and  clients.client_id = web.client_id

in a system where 'clients' is actually one or more relational 
databases, 'columnar' is one or columnar databases, and 'web' is the 
Apache logs on one or more web servers.  The dbi driver would be 
configured to connect to the correct databases and filter web hits based 
on 'client_id'.


Has somebody written that already?

Thanks,
Andrew



Re: cross database queries?

2013-06-26 Thread Jens Rehsack

On 26.06.13 14:26, Andrew Snyder wrote:

I want to write a query like:

select clients.client.client_id, columnar.sales.total_sales,
web.page_hits from clients, columnar, web
where clients.client_id = columnar.client_id
and  clients.client_id = web.client_id

in a system where 'clients' is actually one or more relational
databases, 'columnar' is one or columnar databases, and 'web' is the
Apache logs on one or more web servers.  The dbi driver would be
configured to connect to the correct databases and filter web hits based
on 'client_id'.

Has somebody written that already?


As far as I understood: yes.

The professional solution is available from 
http://www.easysoft.com/index.html


When you're willing to write some pieces of code, SQL::Statement
and DBI::DBD::SqlEngine will help. But in that case - all data
is fetched into a Perl based SQL engine and I can confirm that
it's highly unoptimized regarding mass data ...

Cheers
--
Jens Rehsack


Re: cross database queries?

2013-06-26 Thread David Nicol
On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder a...@dancingjars.com wrote:

 I want to write a query like:

 select clients.client.client_id, columnar.sales.total_sales, web.page_hits
 from clients, columnar, web
 where clients.client_id = columnar.client_id
 and  clients.client_id = web.client_id

 in a system where 'clients' is actually one or more relational databases,
 'columnar' is one or columnar databases, and 'web' is the Apache logs on
 one or more web servers.  The dbi driver would be configured to connect to
 the correct databases and filter web hits based on 'client_id'.

 Has somebody written that already?

 Thanks,
 Andrew



it seems like the right thing to do here would be to do three queries,
against the three data sources, and store all the results in a hash of
arrays, then dump the results. Any solution that automates it will wind up
doing at least that anyway, and might not be optimized for the join.

Unless there really are so many client IDs that you need to process the
results as a stream or run out of memory, which is unlikely.


while (my ($c_id, $ar) = each %resultz){
  $ar-[0] or next;   # filter out client_id not appearing in
clients database
  print join( \t, $c_id, 0+$ar-[1], 0+$ar-[2]),\n;
}

Two parallel hashes containing the web and columnar results, accessed once
for each result from querying the clients table, would also work.


Re: cross database queries?

2013-06-26 Thread Jens Rehsack

On 26.06.13 15:25, David Nicol wrote:

On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder a...@dancingjars.com
mailto:a...@dancingjars.com wrote:

I want to write a query like:

select clients.client.client_id, columnar.sales.total_sales,
web.page_hits from clients, columnar, web
where clients.client_id = columnar.client_id
and  clients.client_id = web.client_id

in a system where 'clients' is actually one or more relational
databases, 'columnar' is one or columnar databases, and 'web' is the
Apache logs on one or more web servers.  The dbi driver would be
configured to connect to the correct databases and filter web hits
based on 'client_id'.

Has somebody written that already?

Thanks,
Andrew



it seems like the right thing to do here would be to do three queries,
against the three data sources, and store all the results in a hash of
arrays, then dump the results. Any solution that automates it will wind
up doing at least that anyway, and might not be optimized for the join.


S::S is not bad in joining tables - it's bad in optimizing queries.
I cannot fetch the smalles table first or first query those tables which
have constants in where clauses. So bad queries might cause gigabytes
on memory are wasted for resulting 50 lines.

That is what I meant by not optimized :)


Unless there really are so many client IDs that you need to process the
results as a stream or run out of memory, which is unlikely.


 while (my ($c_id, $ar) = each %resultz){
   $ar-[0] or next;   # filter out client_id not appearing in
clients database
   print join( \t, $c_id, 0+$ar-[1], 0+$ar-[2]),\n;
 }


Well - even if not optimized, the implementation of SQL::Statement is
even better. And the datasources for S::S are easy to write - it
finally requires an open_table and fetch_row method.


Two parallel hashes containing the web and columnar results, accessed
once for each result from querying the clients table, would also work.


Yes, that's something S::S can't do out of the box. But it could do, I
have a version of DBD::Sys where during open_table() the query is send
against the data-sources and first fetch_row() synchronized the results
of the appropriate queries.

Cheers
--
Jens Rehsack


RE: cross database queries?

2013-06-26 Thread John Scoles

 
 Date: Wed, 26 Jun 2013 08:26:36 -0400
 From: a...@dancingjars.com
 To: dbi-users@perl.org
 Subject: cross database queries?
 
 I want to write a query like:
 
 select clients.client.client_id, columnar.sales.total_sales, 
 web.page_hits from clients, columnar, web
 where clients.client_id = columnar.client_id
 and  clients.client_id = web.client_id
 
 in a system where 'clients' is actually one or more relational 
 databases, 'columnar' is one or columnar databases, and 'web' is the 
 Apache logs on one or more web servers.  The dbi driver would be 
 configured to connect to the correct databases and filter web hits based 
 on 'client_id'.
 
 Has somebody written that already?
  Yes but it depends largely on what  DB you are hitting against.  Really has 
 nothing to do with DBI. The above query would work perfectly fine in oracle 
 as long as the connection user has permission on all the schema.  Now if you 
 are asking if the above will work across one or more DBDs and different 
 Databases (oracle, files MySQL) then the answer is no. There are drivers out 
 there for example that let Oracle hit LDAP tables and do joins but that is 
 part of the Oracle DB not DBI and DBD.  DBI does not do the join it is the DB 
 that does that. DBI only issues the command Cheers John
 Thanks,
 Andrew
 
  

Re: cross database queries?

2013-06-26 Thread Andrew Snyder
Thank you all for your replies.  I am indeed talking about working with 
multiple back ends (oracle, mysql, or any valid DBI module) in the same 
query.


I'll post any follow-up questions to the dev list.

Andrew

On 06/26/2013 08:26 AM, Andrew Snyder wrote:

I want to write a query like:

select clients.client.client_id, columnar.sales.total_sales, 
web.page_hits from clients, columnar, web

where clients.client_id = columnar.client_id
and  clients.client_id = web.client_id

in a system where 'clients' is actually one or more relational 
databases, 'columnar' is one or columnar databases, and 'web' is the 
Apache logs on one or more web servers.  The dbi driver would be 
configured to connect to the correct databases and filter web hits 
based on 'client_id'.


Has somebody written that already?

Thanks,
Andrew





Re: cross database queries?

2013-06-26 Thread John R Pierce

On 6/26/2013 12:29 PM, Andrew Snyder wrote:
Thank you all for your replies.  I am indeed talking about working 
with multiple back ends (oracle, mysql, or any valid DBI module) in 
the same query. 



thats never going to happen.   DBI is not a sql processor, its just an 
interface, the SQL is passed directly to the database server you're 
connected to


some databases have support for 'foreign data wrappers', where you can 
explicitly setup a connection to another database then use tables on 
that foreign database, so if you connected to a DB like this and got the 
FDW stuff all working, you could maybe achieve what you want, but its 
almost never optimal as the planner for the 'primary' database has no 
way to optimize JOIN operations etc involving other databases.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: cross database queries?

2013-06-26 Thread Darren Duncan

On 2013.06.26 12:47 PM, John R Pierce wrote:

On 6/26/2013 12:29 PM, Andrew Snyder wrote:

Thank you all for your replies.  I am indeed talking about working with
multiple back ends (oracle, mysql, or any valid DBI module) in the same query.


thats never going to happen.   DBI is not a sql processor, its just an
interface, the SQL is passed directly to the database server you're connected to


Never say never!  I'm working on a Perl-native engine right now, and it will not 
only be functionally complete internally but also has the option to federate to 
multiple back-ends, so then you can do what you want.



some databases have support for 'foreign data wrappers', where you can
explicitly setup a connection to another database then use tables on that
foreign database, so if you connected to a DB like this and got the FDW stuff
all working, you could maybe achieve what you want, but its almost never optimal
as the planner for the 'primary' database has no way to optimize JOIN operations
etc involving other databases.


In the short term that might be a good bet though, try using an existing SQL 
database's FDW capabilities.


-- Darren Duncan



Re: most common kinds of cross-database queries

2007-08-02 Thread Daniel Kasak
On Wed, 2007-08-01 at 18:29 -0700, Darren Duncan wrote:

 What do you think would be the most common scenarios of a cross-database 
 query?
 
 Or why would data be in multiple databases, and what kinds of ways 
 are most likely for it to be brought together in a common query?

In our case, we are migrating from a legacy SQL Server database to
MySQL. We're doing it in bits and pieces. MS Access is making this quite
easy for us ( because of it's cross-database query support ).

I think our situation wouldn't be too uncommon, particularly amongst
people who use open-source software ( ie I assume there's a general
trend to migrate from the big commercial DB servers to open-source
solutions ).

--
Daniel Kasak
IT Developer
NUS Consulting Group
Level 5, 77 Pacific Highway
North Sydney, NSW, Australia 2060
T: (+61) 2 9922-7676 / F: (+61) 2 9922 7989
email: [EMAIL PROTECTED]
website: http://www.nusconsulting.com.au



Re: most common kinds of cross-database queries

2007-08-02 Thread Peter J. Holzer
On 2007-08-01 18:29:23 -0700, Darren Duncan wrote:
 As a tangent to that other cross-database thread, I'm hoping for some input 
 that can help me determine priorities in my own implementation under Muldis 
 DB.
 
 What do you think would be the most common scenarios of a cross-database 
 query?
 
 Or why would data be in multiple databases, and what kinds of ways are most 
 likely for it to be brought together in a common query?
 
 For example, is it more common for the multiple databases with the same 
 schema 
 but different data, such as to implement partitioning or clusters, or are 
 they 
 all full replication for redundancy and performance, or do they tend to be 
 all 
 different from each other and just associate at their edges?

I think a very common problem is that you have multiple databases in an
organisation, which were created by different people at different times
which the RDBMS they were familiar with. And then you need to combine
some data from accounting (commercial software using Oracle), assets
management (developed in-house using MS-Access), the intranet CMS (open
source, uses MySQL) and a list the secretary maintains in an Excel
spreadsheet.  Different schemas, different data, different RDBMS, but
you want to do a join over all of them.

hp

-- 
   _  | Peter J. Holzer| If I wanted to be academically correct,
|_|_) | Sysadmin WSR   | I'd be programming in Java.
| |   | [EMAIL PROTECTED]  | I don't, and I'm not.
__/   | http://www.hjp.at/ |   -- Jesse Erlbaum on dbi-users


pgpEWI1JIBdBg.pgp
Description: PGP signature


most common kinds of cross-database queries

2007-08-01 Thread Darren Duncan

At 9:03 AM +1000 8/2/07, Daniel Kasak wrote:

Nice suggestion ( we do this from Excel sometimes ), but I'm actually
planning on *competing* with Access, and not using it to do the heavy
lifting for me. ie this is for a generic query engine, and not for a
specific task I have in mind.


As a tangent to that other cross-database thread, I'm hoping for some 
input that can help me determine priorities in my own implementation 
under Muldis DB.


What do you think would be the most common scenarios of a cross-database query?

Or why would data be in multiple databases, and what kinds of ways 
are most likely for it to be brought together in a common query?


For example, is it more common for the multiple databases with the 
same schema but different data, such as to implement partitioning or 
clusters, or are they all full replication for redundancy and 
performance, or do they tend to be all different from each other and 
just associate at their edges?


For example, what relational operations tend to be the most common 
between databases, eg: unions, joins to display data from both, using 
one to filter data from the other.


In common scenarios, is usually the same DBMS product normally used 
for all the databases, or are they more likely to be different 
products driving each one?


For those replying, please just answer the question in a generic 
sense for now, without regard for particulars of my project.


Thank you. -- Darren Duncan


Re: most common kinds of cross-database queries

2007-08-01 Thread Amos Shapira
On 02/08/07, Darren Duncan [EMAIL PROTECTED] wrote:

 What do you think would be the most common scenarios of a cross-database
 query?


Speaking only theoretically for now but one use case I can vaguely imagine
would be something we are working on right now:

We have a large SQL Server database with many individual records and we are
now building a PostgresQL 8.1 database on another machine to contain
statistics about these records (e.g. for each hour in the life of the
database, how many records of each type were created or updated).  One is
derived from the other and contains aggregates of records but maybe when
we'll get the web interface to dig the stats database we might want to be
able to say list all records in the main db which were counted by a
particular record in the stats DB.

HTH,

--Amos