cross database queries?
I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? Thanks, Andrew
Re: cross database queries?
On 26.06.13 14:26, Andrew Snyder wrote: I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? As far as I understood: yes. The professional solution is available from http://www.easysoft.com/index.html When you're willing to write some pieces of code, SQL::Statement and DBI::DBD::SqlEngine will help. But in that case - all data is fetched into a Perl based SQL engine and I can confirm that it's highly unoptimized regarding mass data ... Cheers -- Jens Rehsack
Re: cross database queries?
On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder a...@dancingjars.com wrote: I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? Thanks, Andrew it seems like the right thing to do here would be to do three queries, against the three data sources, and store all the results in a hash of arrays, then dump the results. Any solution that automates it will wind up doing at least that anyway, and might not be optimized for the join. Unless there really are so many client IDs that you need to process the results as a stream or run out of memory, which is unlikely. while (my ($c_id, $ar) = each %resultz){ $ar-[0] or next; # filter out client_id not appearing in clients database print join( \t, $c_id, 0+$ar-[1], 0+$ar-[2]),\n; } Two parallel hashes containing the web and columnar results, accessed once for each result from querying the clients table, would also work.
Re: cross database queries?
On 26.06.13 15:25, David Nicol wrote: On Wed, Jun 26, 2013 at 7:26 AM, Andrew Snyder a...@dancingjars.com mailto:a...@dancingjars.com wrote: I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? Thanks, Andrew it seems like the right thing to do here would be to do three queries, against the three data sources, and store all the results in a hash of arrays, then dump the results. Any solution that automates it will wind up doing at least that anyway, and might not be optimized for the join. S::S is not bad in joining tables - it's bad in optimizing queries. I cannot fetch the smalles table first or first query those tables which have constants in where clauses. So bad queries might cause gigabytes on memory are wasted for resulting 50 lines. That is what I meant by not optimized :) Unless there really are so many client IDs that you need to process the results as a stream or run out of memory, which is unlikely. while (my ($c_id, $ar) = each %resultz){ $ar-[0] or next; # filter out client_id not appearing in clients database print join( \t, $c_id, 0+$ar-[1], 0+$ar-[2]),\n; } Well - even if not optimized, the implementation of SQL::Statement is even better. And the datasources for S::S are easy to write - it finally requires an open_table and fetch_row method. Two parallel hashes containing the web and columnar results, accessed once for each result from querying the clients table, would also work. Yes, that's something S::S can't do out of the box. But it could do, I have a version of DBD::Sys where during open_table() the query is send against the data-sources and first fetch_row() synchronized the results of the appropriate queries. Cheers -- Jens Rehsack
RE: cross database queries?
Date: Wed, 26 Jun 2013 08:26:36 -0400 From: a...@dancingjars.com To: dbi-users@perl.org Subject: cross database queries? I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? Yes but it depends largely on what DB you are hitting against. Really has nothing to do with DBI. The above query would work perfectly fine in oracle as long as the connection user has permission on all the schema. Now if you are asking if the above will work across one or more DBDs and different Databases (oracle, files MySQL) then the answer is no. There are drivers out there for example that let Oracle hit LDAP tables and do joins but that is part of the Oracle DB not DBI and DBD. DBI does not do the join it is the DB that does that. DBI only issues the command Cheers John Thanks, Andrew
Re: cross database queries?
Thank you all for your replies. I am indeed talking about working with multiple back ends (oracle, mysql, or any valid DBI module) in the same query. I'll post any follow-up questions to the dev list. Andrew On 06/26/2013 08:26 AM, Andrew Snyder wrote: I want to write a query like: select clients.client.client_id, columnar.sales.total_sales, web.page_hits from clients, columnar, web where clients.client_id = columnar.client_id and clients.client_id = web.client_id in a system where 'clients' is actually one or more relational databases, 'columnar' is one or columnar databases, and 'web' is the Apache logs on one or more web servers. The dbi driver would be configured to connect to the correct databases and filter web hits based on 'client_id'. Has somebody written that already? Thanks, Andrew
Re: cross database queries?
On 6/26/2013 12:29 PM, Andrew Snyder wrote: Thank you all for your replies. I am indeed talking about working with multiple back ends (oracle, mysql, or any valid DBI module) in the same query. thats never going to happen. DBI is not a sql processor, its just an interface, the SQL is passed directly to the database server you're connected to some databases have support for 'foreign data wrappers', where you can explicitly setup a connection to another database then use tables on that foreign database, so if you connected to a DB like this and got the FDW stuff all working, you could maybe achieve what you want, but its almost never optimal as the planner for the 'primary' database has no way to optimize JOIN operations etc involving other databases. -- john r pierce 37N 122W somewhere on the middle of the left coast
Re: cross database queries?
On 2013.06.26 12:47 PM, John R Pierce wrote: On 6/26/2013 12:29 PM, Andrew Snyder wrote: Thank you all for your replies. I am indeed talking about working with multiple back ends (oracle, mysql, or any valid DBI module) in the same query. thats never going to happen. DBI is not a sql processor, its just an interface, the SQL is passed directly to the database server you're connected to Never say never! I'm working on a Perl-native engine right now, and it will not only be functionally complete internally but also has the option to federate to multiple back-ends, so then you can do what you want. some databases have support for 'foreign data wrappers', where you can explicitly setup a connection to another database then use tables on that foreign database, so if you connected to a DB like this and got the FDW stuff all working, you could maybe achieve what you want, but its almost never optimal as the planner for the 'primary' database has no way to optimize JOIN operations etc involving other databases. In the short term that might be a good bet though, try using an existing SQL database's FDW capabilities. -- Darren Duncan
Re: most common kinds of cross-database queries
On Wed, 2007-08-01 at 18:29 -0700, Darren Duncan wrote: What do you think would be the most common scenarios of a cross-database query? Or why would data be in multiple databases, and what kinds of ways are most likely for it to be brought together in a common query? In our case, we are migrating from a legacy SQL Server database to MySQL. We're doing it in bits and pieces. MS Access is making this quite easy for us ( because of it's cross-database query support ). I think our situation wouldn't be too uncommon, particularly amongst people who use open-source software ( ie I assume there's a general trend to migrate from the big commercial DB servers to open-source solutions ). -- Daniel Kasak IT Developer NUS Consulting Group Level 5, 77 Pacific Highway North Sydney, NSW, Australia 2060 T: (+61) 2 9922-7676 / F: (+61) 2 9922 7989 email: [EMAIL PROTECTED] website: http://www.nusconsulting.com.au
Re: most common kinds of cross-database queries
On 2007-08-01 18:29:23 -0700, Darren Duncan wrote: As a tangent to that other cross-database thread, I'm hoping for some input that can help me determine priorities in my own implementation under Muldis DB. What do you think would be the most common scenarios of a cross-database query? Or why would data be in multiple databases, and what kinds of ways are most likely for it to be brought together in a common query? For example, is it more common for the multiple databases with the same schema but different data, such as to implement partitioning or clusters, or are they all full replication for redundancy and performance, or do they tend to be all different from each other and just associate at their edges? I think a very common problem is that you have multiple databases in an organisation, which were created by different people at different times which the RDBMS they were familiar with. And then you need to combine some data from accounting (commercial software using Oracle), assets management (developed in-house using MS-Access), the intranet CMS (open source, uses MySQL) and a list the secretary maintains in an Excel spreadsheet. Different schemas, different data, different RDBMS, but you want to do a join over all of them. hp -- _ | Peter J. Holzer| If I wanted to be academically correct, |_|_) | Sysadmin WSR | I'd be programming in Java. | | | [EMAIL PROTECTED] | I don't, and I'm not. __/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users pgpEWI1JIBdBg.pgp Description: PGP signature
most common kinds of cross-database queries
At 9:03 AM +1000 8/2/07, Daniel Kasak wrote: Nice suggestion ( we do this from Excel sometimes ), but I'm actually planning on *competing* with Access, and not using it to do the heavy lifting for me. ie this is for a generic query engine, and not for a specific task I have in mind. As a tangent to that other cross-database thread, I'm hoping for some input that can help me determine priorities in my own implementation under Muldis DB. What do you think would be the most common scenarios of a cross-database query? Or why would data be in multiple databases, and what kinds of ways are most likely for it to be brought together in a common query? For example, is it more common for the multiple databases with the same schema but different data, such as to implement partitioning or clusters, or are they all full replication for redundancy and performance, or do they tend to be all different from each other and just associate at their edges? For example, what relational operations tend to be the most common between databases, eg: unions, joins to display data from both, using one to filter data from the other. In common scenarios, is usually the same DBMS product normally used for all the databases, or are they more likely to be different products driving each one? For those replying, please just answer the question in a generic sense for now, without regard for particulars of my project. Thank you. -- Darren Duncan
Re: most common kinds of cross-database queries
On 02/08/07, Darren Duncan [EMAIL PROTECTED] wrote: What do you think would be the most common scenarios of a cross-database query? Speaking only theoretically for now but one use case I can vaguely imagine would be something we are working on right now: We have a large SQL Server database with many individual records and we are now building a PostgresQL 8.1 database on another machine to contain statistics about these records (e.g. for each hour in the life of the database, how many records of each type were created or updated). One is derived from the other and contains aggregates of records but maybe when we'll get the web interface to dig the stats database we might want to be able to say list all records in the main db which were counted by a particular record in the stats DB. HTH, --Amos