Re: Database/DBD Bridging?
On Tue, 13 December 2011 at 15:35:14 (+0100), rehs...@googlemail.com (Jens Rehsack) wrote:

> 2011/10/4 Reinier Post <r...@win.tue.nl>:
>> [...] For instance, I have a simple script that needs to count the number
>> of distinct values in columns. This seems like a pretty elementary thing
>> to do, but after a lot of tweaking and tinkering I've found that
>>
>> - SELECT COUNT(DISTINCT ...) is invalid in Jet (Access MDB)
>> - SELECT COUNT(*) FROM (SELECT DISTINCT ...) is invalid elsewhere
>> - SELECT DISTINCT ... is accepted but just does SELECT in DBD::CSV
>
> How old are your DBD::CSV / SQL::Statement versions? Of course,
> SELECT DISTINCT works well in DBD::CSV.

Indeed, SELECT DISTINCT appears to work just fine now, and I can't recall the conditions under which it appeared to fail, so feel free to attribute it to faulty testing on my part.

> /Jens

-- 
Reinier
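For anyone who wants to re-test this, a minimal DBD::CSV check of SELECT DISTINCT might look like the sketch below. The data directory, file name, and column name are placeholders; it assumes a people.csv with a duplicated city column.

    #!/usr/bin/perl
    # Minimal check that SELECT DISTINCT deduplicates under DBD::CSV.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
        RaiseError => 1,
        f_dir      => '/tmp/csvtest',  # hypothetical data directory
        f_ext      => '.csv',          # map table 'people' to people.csv
    });

    my $cities = $dbh->selectcol_arrayref(
        'SELECT DISTINCT city FROM people');
    printf "%d distinct cities\n", scalar @$cities;

If the count equals the raw row count despite duplicate values, you have reproduced the misbehaviour Reinier describes.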
Re: Database/DBD Bridging?
2011/10/4 Reinier Post <r...@win.tue.nl>:
> On Fri, Sep 23, 2011 at 11:19:32PM -0400, Brendan Byrd wrote:
>> [...], the Perl community in general needs this. Developers need to just
>> grab some modules via CPAN, describe the relational models (if they don't
>> already exist), and have the flow of data *just work!* [...]
>
> I would love to have some idea of how to write multi-DBMS SQL applications
> with DBI (or with anything else). The main problem I see is the differences
> in SQL between DB backends. They are large, and SQL::Statement just adds
> another dialect.
>
> For a start it would be nice to have a matrix up somewhere listing DBMSes
> against SQL constructs, so I can look up, for instance, against which types
> of DBD backends which SQL syntax will actually work (i.e. not only be
> accepted but also take effect). You can't do much that will work
> everywhere. For instance, I have a simple script that needs to count the
> number of distinct values in columns. This seems like a pretty elementary
> thing to do, but after a lot of tweaking and tinkering I've found that
>
> - SELECT COUNT(DISTINCT ...) is invalid in Jet (Access MDB)
> - SELECT COUNT(*) FROM (SELECT DISTINCT ...) is invalid elsewhere
> - SELECT DISTINCT ... is accepted but just does SELECT in DBD::CSV

How old are your DBD::CSV / SQL::Statement versions? Of course, SELECT DISTINCT works well in DBD::CSV.

/Jens

> The safe thing is to just do a SELECT and deduplicate in Perl, but it would
> be nice to do more within the backends.
>
> Once this is in place it would be nice to start to 'plug the holes' by
> defining a common SQL and translations into backend-specific SQL. This can
> be used to define a DBD::Any that can be used to talk to many different
> backends. (To some extent this can be approximated by a SQL processor that
> supports external table data, such as Access or Pg.)
>
> Of course there are many other differences to resolve, e.g. in treatment of
> character sets. These issues can be dealt with in a similar way.
>
> The nice thing about such a matrix is that it will already be useful on its
> own, even when it's unfinished.
>
> -- 
> Reinier Post
> TU Eindhoven
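To make the quoted "matrix" idea concrete, a driver-keyed lookup could pick a working COUNT-DISTINCT formulation per backend at run time and fall back to client-side deduplication otherwise. The mini-matrix below is illustrative guesswork rather than verified test results, and count_distinct/count_distinct_in_perl are helper names invented for this sketch.

    use strict;
    use warnings;

    # Which COUNT-DISTINCT formulation is believed to work where.
    my %count_distinct_sql = (
        Pg     => 'SELECT COUNT(*) FROM (SELECT DISTINCT %s FROM %s) AS t',
        mysql  => 'SELECT COUNT(DISTINCT %s) FROM %s',
        Oracle => 'SELECT COUNT(DISTINCT %s) FROM %s',
        # No entry for Jet/Access: per the thread, neither form works there.
    );

    sub count_distinct {
        my ($dbh, $table, $col) = @_;
        my $tmpl = $count_distinct_sql{ $dbh->{Driver}{Name} }
            or return count_distinct_in_perl($dbh, $table, $col);
        my ($n) = $dbh->selectrow_array(sprintf $tmpl, $col, $table);
        return $n;
    }

    # The portable fallback: plain SELECT, deduplicate in Perl.
    sub count_distinct_in_perl {
        my ($dbh, $table, $col) = @_;
        my %seen;
        my $sth = $dbh->prepare("SELECT $col FROM $table");
        $sth->execute;
        while (my ($v) = $sth->fetchrow_array) {
            $seen{ defined $v ? $v : "\0NULL" } = 1;
        }
        return scalar keys %seen;
    }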
Re: Database/DBD Bridging?
On Fri, Sep 23, 2011 at 11:19:32PM -0400, Brendan Byrd wrote:
> [...], the Perl community in general needs this. Developers need to just
> grab some modules via CPAN, describe the relational models (if they don't
> already exist), and have the flow of data *just work!* [...]

I would love to have some idea of how to write multi-DBMS SQL applications with DBI (or with anything else). The main problem I see is the differences in SQL between DB backends. They are large, and SQL::Statement just adds another dialect.

For a start it would be nice to have a matrix up somewhere listing DBMSes against SQL constructs, so I can look up, for instance, against which types of DBD backends which SQL syntax will actually work (i.e. not only be accepted but also take effect). You can't do much that will work everywhere. For instance, I have a simple script that needs to count the number of distinct values in columns. This seems like a pretty elementary thing to do, but after a lot of tweaking and tinkering I've found that

- SELECT COUNT(DISTINCT ...) is invalid in Jet (Access MDB)
- SELECT COUNT(*) FROM (SELECT DISTINCT ...) is invalid elsewhere
- SELECT DISTINCT ... is accepted but just does SELECT in DBD::CSV

The safe thing is to just do a SELECT and deduplicate in Perl, but it would be nice to do more within the backends.

Once this is in place it would be nice to start to 'plug the holes' by defining a common SQL and translations into backend-specific SQL. This can be used to define a DBD::Any that can be used to talk to many different backends. (To some extent this can be approximated by a SQL processor that supports external table data, such as Access or Pg.)

Of course there are many other differences to resolve, e.g. in treatment of character sets. These issues can be dealt with in a similar way.

The nice thing about such a matrix is that it will already be useful on its own, even when it's unfinished.

-- 
Reinier Post
TU Eindhoven
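The "SELECT and deduplicate in Perl" workaround can count the distinct values of every column in one pass over the table. A rough sketch, with the connection string and table name as placeholders (any DBD works the same way):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '', { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT * FROM people');
    $sth->execute;

    my @cols = @{ $sth->{NAME_lc} };   # column names, lowercased
    my %seen;                          # column => { value => 1 }
    while (my @row = $sth->fetchrow_array) {
        for my $i (0 .. $#cols) {
            my $v = defined $row[$i] ? $row[$i] : "\0NULL";
            $seen{ $cols[$i] }{$v} = 1;
        }
    }
    printf "%-15s %d distinct values\n", $_, scalar keys %{ $seen{$_} }
        for @cols;

It is slow for big tables, which is exactly why one wants the backend to do it, but it behaves identically on every backend.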
Re: Database/DBD Bridging?
On Tue, Oct 04, 2011 at 02:20:10PM +0200, Reinier Post wrote:
> On Fri, Sep 23, 2011 at 11:19:32PM -0400, Brendan Byrd wrote:
>
> For a start it would be nice to have a matrix up somewhere listing DBMSes
> against SQL constructs, so I can look up, for instance, against which types
> of DBD backends which SQL syntax will actually work

A quick google turned up http://www.grantondata.com/community/dbrosettastone.html
There may well be others/better/newer.

> For instance, I have a simple script that needs to count the number of
> distinct values in columns. This seems like a pretty elementary thing to
> do, but after a lot of tweaking and tinkering I've found that
>
> - SELECT COUNT(DISTINCT ...) is invalid in Jet (Access MDB)
> - SELECT COUNT(*) FROM (SELECT DISTINCT ...) is invalid elsewhere
> - SELECT DISTINCT ... is accepted but just does SELECT in DBD::CSV

A good ORM, like DBIx::Class, will handle most differences in SQL dialects for you, and give you a way to include fragments of literal SQL if you need to.

Tim.
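For concreteness, here is roughly what the DBIx::Class route to the distinct-count problem above might look like. MyApp::Schema and its Book source are hypothetical, and the second query shows the literal-SQL escape hatch Tim mentions:

    use strict;
    use warnings;
    use MyApp::Schema;   # hypothetical DBIC schema class

    my $schema = MyApp::Schema->connect('dbi:Pg:dbname=test');

    # Let DBIC's SQL producer emit dialect-appropriate SQL:
    my $n = $schema->resultset('Book')->search(undef, {
        columns  => ['author'],
        distinct => 1,       # GROUP BY the selected columns
    })->count;

    # Or drop to a literal SQL fragment where the abstraction runs out:
    my $rs = $schema->resultset('Book')->search(undef, {
        select => [ \'COUNT(DISTINCT author)' ],
        as     => [ 'author_count' ],
    });
    my $n2 = $rs->first->get_column('author_count');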
Re: Database/DBD Bridging?
On Sat, Sep 24, 2011 at 12:30 AM, Darren Duncan <dar...@darrenduncan.net> wrote:
> Brendan Byrd wrote:
>> Yeah, that sounds right. So would this eventually become its own DBD
>> module?
>
> Yes and no. It would not natively be a DBD module, but a separate module
> can exist that is a DBD module which wraps it. Kind of like how you have
> both SQLite and DBD::SQLite, say.

So, it's more or less its own meta-database that talks to other database systems. Wouldn't this (or at least parts of it) essentially be a next-generation SQL::Statement engine? After all, they have a pretty good framework, but it doesn't support everything. Modules like DBD::CSV or DBD::File would benefit from having fully SQL:2003-compatible SQL.

I'm still having trouble understanding the process, though. I mean, I looked through the three module categories. There actually isn't much code there. But, you have a *LOT* of documentation. I mean, you have two separate module sections just for documentation, totaling 32,876 words! The Muldis::D::Basics POD is 3,742 words alone. Hell, *The Old Man and the Sea* is 26,560 words. You've literally already written a book of documentation. I dare say that if you had spent that time on the actual code, it would be done by now.

How would this work? Let's say I have the two databases in the primary email I sent. How would that work in Perl? Do I set up Muldis first with the two databases, and then log in via DBI to the DBD, which DBIC would interface against? Does the DBD basically talk to Muldis to do the dirty work, or is there some work that the DBD would need to take care of (like actually talking to the databases)?

>> Does it use DBI methods to figure out the specs of the system? For
>> example, you were saying less capable things like DBD::CSV. Is that
>> determined by querying get_info for the ODBC/ANSI capability data?
>
> It would use whatever means make sense, which might be starting with the
> DBI methods for some basic functionality and then doing SELECT from the
> INFORMATION_SCHEMA to provide enhanced functionality.

Yeah, I guess DBI doesn't have the entire schema in its methods, but neither do a lot of DBMSes.

-- 
Brendan Byrd/SineSwiper <sineswi...@gmail.com>
Computer tech, PERL wizard, and all-around Internet guru
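On the get_info question: probing a handle's self-reported capabilities looks like the sketch below, using the %GetInfoType map that DBI ships. Many drivers (DBD::CSV included) return undef for most keys, which is itself useful capability data.

    use strict;
    use warnings;
    use DBI;
    use DBI::Const::GetInfoType;   # exports %GetInfoType

    my $dbh = DBI->connect('dbi:CSV:', undef, undef, { RaiseError => 1 });

    for my $key (qw(SQL_DBMS_NAME SQL_DBMS_VER SQL_IDENTIFIER_QUOTE_CHAR)) {
        my $val = $dbh->get_info( $GetInfoType{$key} );
        printf "%-28s %s\n", $key, defined $val ? $val : '(not reported)';
    }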
Re: Database/DBD Bridging?
The problem with PostgreSQL's SQL/MED is that it's not Perl, and it won't work for some of the more abstract objects available as DBDs. I would like to tie this DBD::FederatedDB into DBIC, so that it can search and insert everything on-the-fly. Shoving everything into RAM isn't right, either, since DBD::AnyData can already do that. The whole point of having the databases process the rows one at a time is so that it can handle 10-million-row tables without a full, wasteful dump.

It looks like Set::Relation can work out great for sucking in table_info/row_info data, and can be used as the temp cache as fractured rows come in.

I would be highly interested in developing this with you. I'm spread pretty thin with several other Perl modules, so I otherwise wouldn't tackle it right now. But if you already have something started, we can try to finish it, and that's much better than starting from scratch alone. Do you have a repository for this new module yet? What are you calling it? I take it the module is building off of SQL::Statement?

On Fri, Sep 23, 2011 at 12:27 AM, Darren Duncan <dar...@darrenduncan.net> wrote:
> Brendan,
>
> Taking into account David's response ... I have several answers for you:
>
> 1. The functionality you are talking about is often referred to as a
> federated database, where you have one database engine that the client
> talks to, which turns around and farms some of the work off to various
> other database engines, coordinating between them and doing some extra work
> itself. What you initially proposed is essentially using Perl as the
> client-most database engine, and what David proposed is using PostgreSQL as
> the client-most engine instead.
>
> 2. If PostgreSQL 9.1's SQL/MED abilities will do what you need, then use
> it.
>
> 3. If immense scalability isn't needed and you want to do this in Perl,
> then I recommend you look at my Set::Relation CPAN module, which will
> handle a lot of the difficult work for you. Specifically, you would still
> manage for yourself the parceling out of queries to the various back-end
> DBMSs like Oracle or Excel or whatever, but Set::Relation will then take
> care of all the drudgery of taking the various rowsets returned from those
> and combining them into the various end-result queries you actually wanted
> to do. Each Set::Relation object contains a rowset, and you can use its
> several dozen methods to do relational joins or aggregates or antijoins or
> various other things, all the functionality of SQL, but in Perl. Its main
> limiting factor is that it is entirely RAM-based, though this also makes it
> simpler. So you can do this right now.
>
> 4. I am presently implementing a relational DBMS in Perl which provides all
> the functionality you described and more, including query language support
> that lets you write code like you demonstrated. Strictly speaking, the
> initial version is fully self-contained for simplicity, but a
> subsequent/spinoff version would add the ability to farm out to other
> database engines as per SQL/MED, and this *is* designed to scale. It even
> uses the paradigm you mention, where each underlying engine is essentially
> a namespace in which tables live, and you can join between them as if they
> were all local; or, to be more accurate, each database *connection* has its
> own namespace and the underlying engine is just a quality of that
> connection, like with how DBI lets you have multiple connections with the
> same driver. So if this sounds like something you want to help create,
> please talk with me.
>
> -- Darren Duncan
>
> Brendan Byrd wrote:
>> Okay, this is a big blue sky idea, but like all things open-source, it
>> comes out of a need. I'm trying to merge together Excel (or CSV), Oracle,
>> Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC
>> seems to work great for a large database with a bunch of tables, but what
>> about a bunch of databases? I've searched and searched, and nobody seems
>> to have designed a DBD for multiple DBDs. There's DBD::Multi and
>> Multiplex, but those are merely for replication. This would require
>> reparsing of SQL statements.
>>
>> So, let's call this module idea DBD::IntegrateDB or MultiDB. It would be a
>> module built from SQL::Statement (using the typical Embed instructions),
>> so it would use that module's SQL engine for parsing and processing SQL.
>> We'll use a simple example of two databases: one Oracle, and one MySQL.
>> This module loads both of them in with a DBI->connect string. Then the dev
>> runs a prepare on the following SQL:
>>
>>     SELECT book, title, b.person, age, dob
>>     FROM ora.books b
>>          INNER JOIN mysql.people p ON ( b.person_id = p.person_id )
>>
>> So, ora.books is on the Oracle DB, and mysql.people is on the MySQL DB.
>> The parser for this MultiDB would:
>>
>> 1. Use SQL::Parser to break down the SQL statement.
>> 2. Figure out who owns what, in terms of tables and columns. (Complain
>>    about ambiguous columns if it has to.)
>> 3. Use table_info calls to the separate DBI interfaces [...]
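The prepare/execute/fetch_row merge described in the quoted proposal could be prototyped as a symmetric hash join over two ordinary statement handles. This is a deliberately simplified sketch (both caches live in RAM, the handles and key column names come from the caller, and colliding column names just overwrite):

    use strict;
    use warnings;

    sub federated_inner_join {
        my ($sth_a, $sth_b, $key_a, $key_b) = @_;
        $_->execute for $sth_a, $sth_b;
        my (%seen_a, %seen_b, @joined);

        # Cache a row under its key value, then emit joined rows against
        # everything already cached from the other side. Because a row is
        # cached before the other side's same-iteration row is probed,
        # each matching pair is emitted exactly once.
        my $take = sub {
            my ($row, $key, $mine, $theirs) = @_;
            return unless $row;
            my $k = $row->{$key};
            push @{ $mine->{$k} }, $row;
            push @joined, map { { %$_, %$row } } @{ $theirs->{$k} || [] };
        };

        while (1) {
            my $row_a = $sth_a->fetchrow_hashref;
            my $row_b = $sth_b->fetchrow_hashref;
            last unless $row_a || $row_b;
            $take->($row_a, $key_a, \%seen_a, \%seen_b);
            $take->($row_b, $key_b, \%seen_b, \%seen_a);
        }
        return \@joined;
    }

    # Usage, with hypothetical handles for the two-database example:
    # my $joined = federated_inner_join(
    #     $dbh_ora->prepare('SELECT person_id, book, title FROM books'),
    #     $dbh_my->prepare('SELECT person_id, person, age, dob FROM people'),
    #     'person_id', 'person_id',
    # );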
Re: Database/DBD Bridging?
On Fri, Sep 23, 2011 at 5:01 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
>> The problem with PostgreSQL's SQL/MED is that it's not Perl, and it won't
>> work for some of the more abstract objects available as DBDs.
>
> You may want to look into PL/Perl then, using Perl inside Postgres, to
> bring together some of these things, if it will work for you.

Yeah, but that's mostly a hack. It's not really portable, and frankly, I personally don't need this module. Granted, my conversations started out that way, but I acknowledge that the module doesn't exist and I can do things just fine merging them together via Perl as I have been. However, the Perl community in general needs this. Developers need to just grab some modules via CPAN, describe the relational models (if they don't already exist), and have the flow of data *just work!*

We need a new way of thinking about how this data interacts. It is ALL related, or else you wouldn't be using it. Why do I have to parse through some JSON object and dump it into variables just to put them into some SQL database if a) the JSON object is already a bunch of variables, and b) they are both universally well-known standards? Me gluing the two together for my one program for this one instance is basically just hard coding because no other method exists. JSON, Oracle, CSV, Google, SNMP, Excel, Access, MySQL, XML, HTML, CGI (yes, I went there)... Make it all available on one interface!

DBIC needs this. The examples they provide aren't real world: hey, just grab this SQLite database, create your own tables any way you like, and map those into DBIC. Because Yo Dawg, I heard you like relationships, so we put relationships in your relationships, so that you can waste time while you waste time. (Yes, I went there, too...) After all, if you are designing the database, you can make it as simple or as complex as you want, and if it's complex, just make views or materialized views to simplify things. Granted, I'm belittling DBIC's ability to relate things in a single database (and 4NF in general), but *real problems* with data come from multiple sources. Sources that you can't admin. Sources with some really funky languages.

>> I would like to tie this DBD::FederatedDB into DBIC, so that it can search
>> and insert everything on-the-fly. Shoving everything into RAM isn't right,
>> either, since DBD::AnyData can already do that. The whole point of having
>> the databases process the rows one at a time is so that it can handle
>> 10-million-row tables without a full, wasteful dump.
>
> Another thing to ask is whether what you're doing here is a batch process
> where some performance matters are less of an issue, or whether it is more
> on demand or more performance sensitive.

Both? I can have both, can't I? But seriously, the primary goal is the ability to support mostly everything. (Really, the goal is to get something into DBD that can go into DBIC as a single data source.) It would start out as just something that would join two tables together and expand from there. However, speed would be a good secondary goal. I think that sort of thing is achievable not from XS code, but from intelligent decisions about how to pull the data: weighing the cardinality of the keys in the table with how it links together. If one table is 20 rows and another is 500K rows, then just pull the entire 20-row table into memory and dump it into a temporary table on the other system, so that you can do an actual JOIN over there. Those kinds of decisions, which are much of the logic of an RDBMS itself.

Also, the module has a natural speed advantage because of parallelization. Most of these different systems are going to be on different servers, so at least the data flow is going to be faster than one server pulling both tables. (In general. Obviously, RDBMSs have their own internal operations to make things as fast as possible.)

>> It looks like Set::Relation can work out great for sucking in
>> table_info/row_info data, and can be used as the temp cache as fractured
>> rows come in.
>
> Perhaps, although Set::Relation is more about making database operations
> like join etc available in Perl, so you'll want to be using such various
> tools to take advantage of it. But then no one besides myself has used it
> yet that I know of, and others often think of tool uses beyond the creator.

Well, that's pretty much what I would be using it for: making database operations available by defining the relational model in Set::Relation, and having it handle exactly how the rows are going to interact.

> If you mean the more robust/scalable solution, then that has 2 main parts,
> which is a standard query language specification, Muldis D, plus multiple
> implementations. It corresponds to but is distinct from the ecosystem of
> there being an ISO SQL standard and its implementations in various DBMSs.
> The query language, Muldis D, is not SQL but it is relevant here because it
> is designed to correspond to SQL [...]
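The "pull the entire 20-row table and dump it into a temporary table on the other system" tactic might be sketched as below. The handle and table names are hypothetical, every column is crudely declared TEXT (a real implementation would map types via column_info), and the CREATE TEMPORARY TABLE syntax shown is the Pg/MySQL flavor; Oracle would want GLOBAL TEMPORARY.

    use strict;
    use warnings;

    sub push_small_table {
        my ($dbh_small, $dbh_big, $table, @cols) = @_;
        my $collist = join ', ', @cols;

        # The small side fits in memory by assumption.
        my $rows = $dbh_small->selectall_arrayref(
            "SELECT $collist FROM $table");

        # Recreate it as a session-local table on the big side ...
        $dbh_big->do("CREATE TEMPORARY TABLE tmp_$table ("
            . join(', ', map { "$_ TEXT" } @cols) . ')');

        # ... and bulk-insert the rows.
        my $ins = $dbh_big->prepare("INSERT INTO tmp_$table ($collist) "
            . 'VALUES (' . join(', ', ('?') x @cols) . ')');
        $ins->execute(@$_) for @$rows;

        return "tmp_$table";   # now joinable natively on $dbh_big
    }

    # e.g. push_small_table($dbh_my, $dbh_pg, 'people',
    #                       qw(person_id person age dob));
    # ... then run the whole JOIN inside the big database.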
Re: Database/DBD Bridging?
I only got a copy of this message directly and not also via the list as expected, since you addressed it to the list, but anyway ...

Brendan Byrd wrote on 2011 Sep 22 at 6:25am PST/UTC-8:
> The problem with PostgreSQL's SQL/MED is that it's not Perl, and it won't
> work for some of the more abstract objects available as DBDs.

You may want to look into PL/Perl then, using Perl inside Postgres, to bring together some of these things, if it will work for you.

> I would like to tie this DBD::FederatedDB into DBIC, so that it can search
> and insert everything on-the-fly. Shoving everything into RAM isn't right,
> either, since DBD::AnyData can already do that. The whole point of having
> the databases process the rows one at a time is so that it can handle
> 10-million-row tables without a full, wasteful dump.

Another thing to ask is whether what you're doing here is a batch process where some performance matters are less of an issue, or whether it is more on demand or more performance sensitive.

> It looks like Set::Relation can work out great for sucking in
> table_info/row_info data, and can be used as the temp cache as fractured
> rows come in.

Perhaps, although Set::Relation is more about making database operations like join etc available in Perl, so you'll want to be using such various tools to take advantage of it. But then no one besides myself has used it yet that I know of, and others often think of tool uses beyond the creator.

> I would be highly interested in developing this with you. I'm spread pretty
> thin with several other Perl modules, so I otherwise wouldn't tackle it
> right now. But if you already have something started, we can try to finish
> it, and that's much better than starting from scratch alone. Do you have a
> repository for this new module yet? What are you calling it? I take it the
> module is building off of SQL::Statement?

[snip]

If you mean the more robust/scalable solution, then that has 2 main parts: a standard query language specification, Muldis D, plus multiple implementations. It corresponds to, but is distinct from, the ecosystem of there being an ISO SQL standard and its implementations in various DBMSs.

The query language, Muldis D, is not SQL, but it is relevant here because it is designed to correspond to SQL and to be an intermediary form for generating/parsing SQL or translating between SQL dialects, or between SQL and other languages like Perl. (This means all SQL, including stored procedures.) This essentially is exactly what you want to do: have a common query syntax where, behind the scenes, some of it is turned into SQL that is pushed to back-end DBMSs, and some of it is turned into Perl to do local processing. The great thing is that as a user you don't have to know where it executes, just that the implementation will pick the best way to handle particular code. I think of an analogy like LLVM that can compile selectively to a CPU or a GPU. Automatically, more capable DBMSs like Postgres get more work pushed to them to do natively, and less capable things like DBD::CSV or whatever have less pushed to them and more done in Perl.

The language spec is in GitHub at https://github.com/muldis/Muldis-D and it is also published on CPAN in the pure-POD distribution Muldis-D, but the CPAN copy has fallen behind at the moment. The implementations I haven't started yet, or I did but canceled those efforts so as to do it differently, so you can't run anything yet. But I know in my head exactly how I intend to do it.

I intend to make a few more large updates to the Muldis D spec before starting in earnest on the implementation, so as to make that simpler and easier to do (it is substantially complete other than some large refinements); some clues to this direction are in the file TODO_DRAFT in GitHub. As for a timetable: if I could focus on this project I could have something usable in a few months; however, I also have a separate paying job that I'm currently focusing on, which doesn't leave much time for the new project, though I hope to get more time to work on it maybe in mid-late October.

If you are still interested in working on this, or you just want to follow it, please join the (low traffic) discussion list muldis-db-us...@mm.darrenduncan.net .

FYI, this project is quite serious, not pie in the sky, and it has interest from some significant people in the industry, such as C.J. Date (well known for An Introduction to Database Systems, which sold over 800K copies); one of his latest co-authored books, from 2010, explicitly covers part of my project with a chapter.

-- Darren Duncan
Re: Database/DBD Bridging?
Brendan Byrd wrote:
> On Fri, Sep 23, 2011 at 5:01 PM, Darren Duncan <dar...@darrenduncan.net>
> wrote:
>> This essentially is exactly what you want to do, have a common query
>> syntax where behind the scenes some is turned into SQL that is pushed to
>> back-end DBMSs, and some of which is turned into Perl to do local
>> processing. The great thing is as a user you don't have to know where it
>> executes, but just that the implementation will pick the best way to
>> handle particular code. I think of an analogy like LLVM that can compile
>> selectively to a CPU or a GPU. Automatically, more capable DBMSs like
>> Postgres get more work pushed to them to do natively, and less capable
>> things like DBD::CSV or whatever have less pushed to them and more done in
>> Perl.
>
> Yeah, that sounds right. So would this eventually become its own DBD
> module?

Yes and no. It would not natively be a DBD module, but a separate module can exist that is a DBD module which wraps it. Kind of like how you have both SQLite and DBD::SQLite, say.

> Does it use DBI methods to figure out the specs of the system? For example,
> you were saying less capable things like DBD::CSV. Is that determined by
> querying get_info for the ODBC/ANSI capability data?

It would use whatever means make sense, which might be starting with the DBI methods for some basic functionality and then doing SELECT from the INFORMATION_SCHEMA to provide enhanced functionality.

> Of course. Something like this is huge, but it's also hugely important to
> make sure it gets into the hands of the Perl community.

Absolutely.

-- Darren Duncan
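To make that two-stage approach concrete: DBI's catalog methods cover the portable cases, and INFORMATION_SCHEMA queries cover backends that have one (Pg, MySQL, and SQL Server do; SQLite and DBD::CSV don't). A sketch, with the connection details assumed:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '', { RaiseError => 1 });

    # Basic functionality: DBI's own portable catalog methods.
    my @tables = $dbh->tables(undef, 'public', '%', 'TABLE');

    # Enhanced functionality: ask INFORMATION_SCHEMA directly.
    my $cols = $dbh->selectall_arrayref(q{
        SELECT table_name, column_name, data_type
          FROM information_schema.columns
         WHERE table_schema = ?
         ORDER BY table_name, ordinal_position
    }, undef, 'public');
    printf "%s.%s (%s)\n", @$_ for @$cols;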
Database/DBD Bridging?
Okay, this is a big blue sky idea, but like all things open-source, it comes out of a need. I'm trying to merge together Excel (or CSV), Oracle, Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC seems to work great for a large database with a bunch of tables, but what about a bunch of databases? I've searched and searched, and nobody seems to have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex, but those are merely for replication. This would require reparsing of SQL statements.

So, let's call this module idea DBD::IntegrateDB or MultiDB. It would be a module built from SQL::Statement (using the typical Embed instructions), so it would use that module's SQL engine for parsing and processing SQL. We'll use a simple example of two databases: one Oracle, and one MySQL. This module loads both of them in with a DBI->connect string. Then the dev runs a prepare on the following SQL:

    SELECT book, title, b.person, age, dob
    FROM ora.books b
         INNER JOIN mysql.people p ON ( b.person_id = p.person_id )

So, ora.books is on the Oracle DB, and mysql.people is on the MySQL DB. The parser for this MultiDB would:

1. Use SQL::Parser to break down the SQL statement.
2. Figure out who owns what, in terms of tables and columns. (Complain about
   ambiguous columns if it has to.)
3. Use table_info calls to the separate DBI interfaces, including number of
   rows, cardinality (if available), etc.
4. Store the joining information.
5. Prepare two *separate* SQL statements, one for each DB. They would no
   longer be JOIN queries, but standard queries for the tables (including
   person_id, which wasn't included in the original SELECT statement).

Then when the statement is executed:

1. The two SQL statements are executed, one per DB.
2. The fetch_row sub would process each row one at a time for each DB.
3. If two IDs match, send a row back. Otherwise, cache the data and wait for
   something to match.
4. Repeat until the rows are exhausted on one or both sides. (One side for
   INNER, both sides for OUTER.)

Does anything like that exist? I'm not saying it's an easy operation, but if something like that could just start off with simple JOINs at first, it would be a miracle module. Imagine linking with more abstract DBI modules: Oracle to CSV to MySQL to Teradata to Sys to Sponge. Tell me you're not excited at the prospect of eventually creating free open-source competitive frameworks to multi-million-dollar Business Intelligence software. Heck, DBIC is getting there, but the data mining and integration piece is missing.

-- 
Brendan Byrd/SineSwiper <sineswi...@gmail.com>
Computer tech, PERL wizard, and all-around Internet guru
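Step 1 is already mostly possible with today's SQL::Parser/SQL::Statement, as the sketch below shows. Qualified names like ora.books would have to be rewritten before parsing, so the example uses plain table names with a comma join; treat it as exploratory rather than production code.

    use strict;
    use warnings;
    use SQL::Parser;
    use SQL::Statement;

    my $parser = SQL::Parser->new('ANSI', { RaiseError => 1 });
    my $stmt   = SQL::Statement->new(
        'SELECT book, title, person, age, dob FROM books, people '
      . 'WHERE books.person_id = people.person_id',
        $parser,
    );

    # Enough structure to start deciding "who owns what":
    print 'command: ', $stmt->command, "\n";
    print 'tables:  ', join(', ', map { $_->name } $stmt->tables),  "\n";
    print 'columns: ', join(', ', map { $_->name } $stmt->columns), "\n";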
Re: Database/DBD Bridging?
On Sep 21, 2011, at 7:53 PM, Brendan Byrd wrote:
> Okay, this is a big blue sky idea, but like all things open-source, it
> comes out of a need. I'm trying to merge together Excel (or CSV), Oracle,
> Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC
> seems to work great for a large database with a bunch of tables, but what
> about a bunch of databases? I've searched and searched, and nobody seems to
> have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex,
> but those are merely for replication. This would require reparsing of SQL
> statements.

I think it'd be simpler to use something like SQL/MED. In PostgreSQL 9.1, for example, you can install foreign data wrappers so that you can create tables inside PostgreSQL that actually point to CSV, MySQL, Oracle, Redis, or any number of other sources. It's read-only right now, though that's often all people need. Some example FDWs:

http://www.postgresql.org/docs/current/static/file-fdw.html
http://pgxn.org/tag/fdw

ISTR that SQL Server has had similar capabilities for a long time, and I know Access has. Hell, ODBC might get you a long way toward it.

In short, I think that a multi-backend DBD is probably the wrong level at which to do this, waay too much work.

Best,

David
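For reference, wiring up file_fdw from Perl, so that a CSV file shows up as an ordinary table, might look like this sketch against PostgreSQL 9.1+. The file path and column layout are made up; the FDW statements follow the file-fdw documentation linked above.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '', { RaiseError => 1 });

    $dbh->do('CREATE EXTENSION IF NOT EXISTS file_fdw');
    $dbh->do('CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw');
    $dbh->do(q{
        CREATE FOREIGN TABLE people (person_id int, person text, dob date)
        SERVER csv_srv
        OPTIONS (filename '/srv/data/people.csv', format 'csv', header 'true')
    });

    # The CSV is now just another (read-only) table:
    my $people = $dbh->selectall_arrayref(
        'SELECT person, dob FROM people WHERE person_id < 100');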
Re: Database/DBD Bridging?
Brendan,

Taking into account David's response ... I have several answers for you:

1. The functionality you are talking about is often referred to as a federated database, where you have one database engine that the client talks to, which turns around and farms some of the work off to various other database engines, coordinating between them and doing some extra work itself. What you initially proposed is essentially using Perl as the client-most database engine, and what David proposed is using PostgreSQL as the client-most engine instead.

2. If PostgreSQL 9.1's SQL/MED abilities will do what you need, then use it.

3. If immense scalability isn't needed and you want to do this in Perl, then I recommend you look at my Set::Relation CPAN module, which will handle a lot of the difficult work for you. Specifically, you would still manage for yourself the parceling out of queries to the various back-end DBMSs like Oracle or Excel or whatever, but Set::Relation will then take care of all the drudgery of taking the various rowsets returned from those and combining them into the various end-result queries you actually wanted to do. Each Set::Relation object contains a rowset, and you can use its several dozen methods to do relational joins or aggregates or antijoins or various other things, all the functionality of SQL, but in Perl. Its main limiting factor is that it is entirely RAM-based, though this also makes it simpler. So you can do this right now.

4. I am presently implementing a relational DBMS in Perl which provides all the functionality you described and more, including query language support that lets you write code like you demonstrated. Strictly speaking, the initial version is fully self-contained for simplicity, but a subsequent/spinoff version would add the ability to farm out to other database engines as per SQL/MED, and this *is* designed to scale. It even uses the paradigm you mention, where each underlying engine is essentially a namespace in which tables live, and you can join between them as if they were all local; or, to be more accurate, each database *connection* has its own namespace and the underlying engine is just a quality of that connection, like with how DBI lets you have multiple connections with the same driver. So if this sounds like something you want to help create, please talk with me.

-- Darren Duncan

Brendan Byrd wrote:
> Okay, this is a big blue sky idea, but like all things open-source, it
> comes out of a need. I'm trying to merge together Excel (or CSV), Oracle,
> Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC
> seems to work great for a large database with a bunch of tables, but what
> about a bunch of databases? I've searched and searched, and nobody seems to
> have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex,
> but those are merely for replication. This would require reparsing of SQL
> statements.
>
> So, let's call this module idea DBD::IntegrateDB or MultiDB. It would be a
> module built from SQL::Statement (using the typical Embed instructions), so
> it would use that module's SQL engine for parsing and processing SQL. We'll
> use a simple example of two databases: one Oracle, and one MySQL. This
> module loads both of them in with a DBI->connect string. Then the dev runs
> a prepare on the following SQL:
>
>     SELECT book, title, b.person, age, dob
>     FROM ora.books b
>          INNER JOIN mysql.people p ON ( b.person_id = p.person_id )
>
> So, ora.books is on the Oracle DB, and mysql.people is on the MySQL DB. The
> parser for this MultiDB would:
>
> 1. Use SQL::Parser to break down the SQL statement.
> 2. Figure out who owns what, in terms of tables and columns. (Complain
>    about ambiguous columns if it has to.)
> 3. Use table_info calls to the separate DBI interfaces, including number of
>    rows, cardinality (if available), etc.
> 4. Store the joining information.
> 5. Prepare two *separate* SQL statements, one for each DB. They would no
>    longer be JOIN queries, but standard queries for the tables (including
>    person_id, which wasn't included in the original SELECT statement).
>
> Then when the statement is executed:
>
> 1. The two SQL statements are executed, one per DB.
> 2. The fetch_row sub would process each row one at a time for each DB.
> 3. If two IDs match, send a row back. Otherwise, cache the data and wait
>    for something to match.
> 4. Repeat until the rows are exhausted on one or both sides. (One side for
>    INNER, both sides for OUTER.)
>
> Does anything like that exist? I'm not saying it's an easy operation, but
> if something like that could just start off with simple JOINs at first, it
> would be a miracle module. Imagine linking with more abstract DBI modules:
> Oracle to CSV to MySQL to Teradata to Sys to Sponge. Tell me you're not
> excited at the prospect of eventually creating free open-source competitive
> frameworks to multi-million-dollar Business Intelligence software. Heck,
> DBIC is getting there, but the data mining and integration piece is
> missing.