Re: Add Unicode Support to the DBI
On 21/09/11 21:52, Greg Sabino Mullane wrote:

... And maybe that's the default. But I should be able to tell it to be pedantic when the data is known to be bad (see, for example, data from an SQL_ASCII-encoded PostgreSQL database). ...

DBD::Pg's approach is currently broken. Greg is working on fixing it, but for compatibility reasons the fix is non-trivial (and the API might be, too). In a perfect world DBD::Pg would just always do the right thing, as the database tells it what encodings to use when you connect (and *all* data is encoded as such, not just certain data types). But the world is not perfect; there's a lot of legacy stuff. Greg, care to add any other details?

My thinking on this has changed a bit. See the DBD::Pg in git head for a sample, but basically, DBD::Pg is going to:

* Flip the flag on if the client_encoding is UTF-8 (and server_encoding is not SQL_ASCII)
* Flip it off if not

The single switch will be pg_unicode_flag, which will basically override the automatic choice above, just in case you really want your SQL_ASCII byte soup marked as utf8 for some reason, or (more likely) you want your data unmarked as utf8 despite being so. This does rely on PostgreSQL doing the right thing when it comes to encoding/decoding/storing all the encodings, but I'm pretty sure it's doing well in that regard. ...

Since nobody has actually defined a specific interface yet, let me throw out a straw man. It may look familiar :)

===

* $h->{unicode_flag}

If this is set on, data returned from the database is assumed to be UTF-8, and the utf8 flag will be set. DBDs will decode the data as needed.

There is more than one way to encode Unicode - not everyone uses UTF-8, although some encodings don't support all of Unicode. Unicode is not encoded as UTF-8 in ODBC using the wide APIs: using the wide ODBC APIs returns data in UCS-2 encoding, and DBD::ODBC decodes it.
Using the ANSI APIs, data is returned as octets and is whatever it is - it may be ASCII, it may be UTF-8 encoded (only in 2 cases I know of, and I believe they are flawed anyway), or it may be something else, in which case the application needs to know what it is. In the case of octets which are UTF-8 encoded, DBD::ODBC has no idea that is the case unless you tell it, and it will then set the UTF-8 flag (but see later).

If this is set off, the utf8 flag will never be set, and no decoding will be done on data coming back from the database.

If this is not set (undefined), the underlying DBD is responsible for doing the correct thing. In other words, the behaviour is undefined.

===

I don't think this will fit into DBD::Pg's current implementation perfectly, as we wouldn't want people to simply leave $h->{unicode_flag} on, as that would force SQL_ASCII text to have utf8 flipped on. Perhaps we simply never, ever allow that.

I'm not that familiar with Postgres (I've used it a few times and not to any great degree) and I used MySQL for a while years ago. I occasionally use SQLite. I do use DBD::Oracle and DBD::ODBC all the time. I'm still struggling to see the problem that needs fixing. Is it just that some people would like a DBI flag which tells the DBD:

1) decode any data coming back from the database strictly, such that if it is invalid you die
2) decode any data coming back from the database loosely (e.g., utf-8 vs UTF-8)
3) don't decode the data from the database at all
4) don't decode the data; the DBD knows it is, say, UTF-8 encoded and simply sets the UTF-8 flag (which from what I read is horribly flawed but seems to work for me)

and the reverse. DBD::Oracle does 1 some of the time and it does 4 the rest of the time, e.g. error messages are fully decoded from UTF-8 IF Oracle is sending UTF-8, and it does 4 on most of the column data IF Oracle is sending UTF-8.
DBD::ODBC does nothing via the ANSI APIs unless the odbc_utf8 flag is turned on, in which case it does 4 (and it only does this because there is apparently a version of the Postgres ODBC driver out there somewhere that returns UTF-8 encoded data; I've never seen it, I just accepted the patch). DBD::ODBC does 1 if using the wide APIs, and it has little choice, since no one would want to accept UCS-2 and have to decode it all the time.

My point being: doesn't the DBD know how the data is encoded when it gets it from the database? And it would hopefully know what the database needs when sending data. Perhaps in some conditions the DBD does not know this and needs to be told (I could imagine SQLite reading/writing straight to files, for instance, might want to know to open the file with a UTF-8 layer). So is the problem that sometimes a DBD does not know how to encode data being sent to the database, or how/whether to decode data coming back from the database? And if that is the case, do we need some settings in DBI to tell a DBD?

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com
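[A minimal sketch of the difference between options 1 and 2 above (strict vs loose decoding) using the Encode module; the sample byte string is made up for illustration:]

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

my $bytes = "caf\xc3\xa9 \xff";   # valid UTF-8 for "café", then one invalid byte

# Option 1: strict decode - invalid input is fatal
my $strict = eval { decode('UTF-8', $bytes, FB_CROAK) };
print defined $strict ? "strict decode ok\n" : "strict decode died: $@";

# Option 2: loose decode - the default CHECK turns invalid bytes
# into U+FFFD replacement characters instead of dying
my $loose = decode('UTF-8', $bytes);
printf "loose decode gives %d characters\n", length $loose;
```

Note Encode also distinguishes the encoding *names*: 'UTF-8' is the strict standard definition, while 'utf8' is Perl's looser internal flavour, which is a separate axis from the FB_CROAK/FB_DEFAULT error-handling choice.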
Re: Add Unicode Support to the DBI
David, I forgot to answer your post first and ended up putting most of my comments in a reply to Greg's posting - sorry, it was a long night last night. Some further comments below:

On 21/09/11 19:44, David E. Wheeler wrote:

On Sep 10, 2011, at 3:08 AM, Martin J. Evans wrote:

I'm not sure any change is required to DBI to support Unicode. As far as I'm aware, Unicode already works with DBI if the DBDs do the right thing.

Right, but the problem is that, IME, none of them do the right thing. As I said, I've submitted encoding-related bug reports for every DBD I've used in production code. And they all have different interfaces for tweaking things.

So it is these different interfaces for tweaking things you'd like to standardise, I presume.

If you stick to the rule that all data Perl receives must be decoded and all data Perl exports must be encoded, it works (ignoring any issues in Perl itself).

Er, was there supposed to be a , then … statement there?

I bow to Tom's experience, but I'm still not sure how that applies to DBI: so long as the interface between the database and Perl always encodes and decodes, the issues Tom describes are all Perl ones - no?

The trouble is that:

1. They don't always encode or decode
2. When they do, they tend to get subtle bits wrong
3. And they all have different interfaces and philosophies for doing so

Surely Oracle should return the data encoded as you asked for it, and if it did not, Oracle is broken. I'd still like to see this case and then we can see if Oracle is broken and if there is a fix for it.

Oh, I don't doubt that Oracle is broken.

In some places DBD::Oracle does sv_utf8_decode(scalar) or SvUTF8_on(scalar) (depending on your Perl) and in some places it just does SvUTF8_on(scalar). I believe the latter is much quicker, as the data is not checked.
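[The sv_utf8_decode vs SvUTF8_on distinction can be shown from pure Perl; utf8::decode() is the checked equivalent and Encode::_utf8_on() (an internal, underscore-prefixed function, used here only for illustration) is the unchecked flag-flip:]

```perl
use strict;
use warnings;
use Encode ();

my $bad = "\xff\xfe";            # not valid UTF-8

# sv_utf8_decode equivalent: validates the octets, only flips
# the utf8 flag if they really are well-formed UTF-8
my $checked = $bad;
my $ok = utf8::decode($checked); # returns false here; $checked stays plain bytes

# SvUTF8_on equivalent: blindly flags the octets as UTF-8 - fast,
# no validation, so the scalar now lies about its own encoding
my $blind = $bad;
Encode::_utf8_on($blind);

print $ok ? "valid\n" : "invalid, flag not set\n";
```

This is why the flag-only path is faster (no scan of the data) and also why it is dangerous when the data is not actually UTF-8.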
Many people (myself included) are particularly interested in DBD::Oracle being fast, and if all the occurrences were changed to decode, I'd patch that out in my copy, as I know the data I receive is UTF-8 encoded.

IME it needs an "assume Oracle is broken" knob. That is, I should have the option to enforce encoding and decoding, rather than just flipping SvUTF8. And I think that such an interface should be standardized in the DBI, along with detailed information for driver authors on how to get it right.

ok, I get that. See above. I'd like the chance to go with speed and take the consequences, rather than go slower but knowing incorrect UTF-8 is spotted.

And maybe that's the default. But I should be able to tell it to be pedantic when the data is known to be bad (see, for example, data from an SQL_ASCII-encoded PostgreSQL database).

I thought UTF-8 when used in Perl used the strict definition and utf-8 used Perl's looser definition - see http://search.cpan.org/~dankogai/Encode-2.44/Encode.pm#UTF-8_vs._utf8_vs._UTF8

That's right. So if I want to ensure that I'm getting strict encoding in my database, it needs to encode and decode, not simply flip SvUTF8.

Don't DBDs do this now? I know the encoding of the data I receive in DBD::ODBC and decode it when I get it and encode it when I send it, and I believe that is what DBD::Oracle does as well. There is one exception in ODBC for drivers which don't truly abide by the ODBC spec and send 8-bit data back UTF-8 encoded (see later).

There is no single API for configuring this in the DBI, and I argue there should be.

I've spent a lot of effort getting Unicode working in DBD::ODBC (for UNIX, and with patches from Alexander Foken for Windows), which is implemented in an awkward fashion in ODBC. I'd like to hear from DBD authors what support they already have and how it is implemented, so we can see what ground is already covered and where the problems were.

DBD::Pg's approach is currently broken.
Greg is working on fixing it, but for compatibility reasons the fix is non-trivial (and the API might be, too). In a perfect world DBD::Pg would just always do the right thing, as the database tells it what encodings to use when you connect (and *all* data is encoded as such, not just certain data types). But the world is not perfect; there's a lot of legacy stuff. Greg, care to add any other details?

as I remain unconvinced a problem exists other than incorrectly coded DBDs. I'm happy to collate that information. As a start, I'll describe DBD::ODBC:

1. ODBC has 2 sets of APIs, SQLxxxA (each chr is 8 bits) and SQLxxxW (each chr is 16 bits and UCS-2). This is how Microsoft did it, and yes, I know that does not support all of Unicode, but code pages get involved too.

2. You select which API you are using with a macro when you compile your application, so you cannot change your mind. You can in theory call SQLxxxA or SQLxxxW functions directly, but if you use SQLxxx you get the A or W depending on what the macro is set to. Problem: DBD::ODBC has to be built one way or the other.

3. When using the SQLxxxA functions you can still bind columns/parameters as wide
Database/DBD Bridging?
Okay, this is a big blue-sky idea, but like all things open-source, it comes out of a need. I'm trying to merge together Excel (or CSV), Oracle, Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC seems to work great for a large database with a bunch of tables, but what about a bunch of databases?

I've searched and searched, and nobody seems to have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex, but that's merely for replication. This would require reparsing of SQL statements.

So, let's call this module idea DBD::IntegrateDB or MultiDB. It would be a module built from SQL::Statement (using the typical Embed instructions), so it would use that module's SQL engine for parsing and processing SQL.

We'll use a simple example of two databases: one Oracle, and one MySQL. This module loads both of them in with a DBI->connect string. Then the dev runs a prepare on the following SQL:

SELECT book, title, b.person, age, dob
FROM ora.books b
INNER JOIN mysql.people p ON ( b.person_id = p.person_id )

So, ora.books is on the Oracle DB, and mysql.people is on the MySQL DB. The parser for this MultiDB would:

1. Use SQL::Parser to break down the SQL statement.
2. Figure out who owns what, in terms of tables and columns. (Complain about ambiguous columns if it has to.)
3. Use table_info calls to the separate DBI interfaces, including number of rows, cardinality (if available), etc.
4. Store the joining information.
5. Prepare two *separate* SQL statements, one for each DB. They would no longer be JOIN queries, but standard queries for the tables (including person_id, which wasn't included in the original SELECT statement).

Then when the statement is executed:

1. The two SQL statements are executed, one for each DB.
2. The fetch_row sub would process each row one at a time for each DB.
3. If two IDs match, send a row back. Otherwise, cache the data and wait for something to match.
4. Repeat until the rows are exhausted on one or both sides.
(One side for INNER, both sides for OUTER.)

Does anything like that exist? I'm not saying it's an easy operation, but if something like that could just start off with simple JOINs at first, it would be a miracle module. Imagine linking with more abstract DBI modules: Oracle to CSV to MySQL to Teradata to Sys to Sponge. Tell me you're not excited at the prospect of eventually creating free open-source competitive frameworks to multi-million dollar Business Intelligence software. Heck, DBIC is getting there, but the data mining and integration piece is missing.

--
Brendan Byrd/SineSwiper sineswi...@gmail.com
Computer tech, PERL wizard, and all-around Internet guru
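[The execute/fetch steps above amount to a join performed client-side in Perl. A rough sketch of the idea, done eagerly with a hash join rather than the row-at-a-time merge described; all DSNs, credentials, and table/column names are placeholders taken from the example SQL:]

```perl
use strict;
use warnings;
use DBI;

my ($user, $pass) = ('scott', 'tiger');   # placeholder credentials

# Two independent connections, one per backend (DSNs are placeholders)
my $ora = DBI->connect('dbi:Oracle:orcl', $user, $pass, { RaiseError => 1 });
my $my  = DBI->connect('dbi:mysql:app',   $user, $pass, { RaiseError => 1 });

# Step 5: two separate JOIN-free statements, each carrying the join key
my $books  = $ora->selectall_arrayref(
    'SELECT book, title, person_id FROM books',        { Slice => {} });
my $people = $my->selectall_arrayref(
    'SELECT person_id, person, age, dob FROM people',  { Slice => {} });

# Index one side on the join key, then emit matched rows (INNER JOIN)
my %people_by_id = map { $_->{person_id} => $_ } @$people;
for my $b (@$books) {
    my $p = $people_by_id{ $b->{person_id} } or next;  # no match: skip (INNER)
    print join(', ', $b->{book}, $b->{title}, $p->{person},
                     $p->{age}, $p->{dob}), "\n";
}
```

The streaming merge in the proposal is smarter than this (it avoids holding one whole side in RAM), but the data flow - decompose, execute per backend, match on the key - is the same.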
Re: Database/DBD Bridging?
On Sep 21, 2011, at 7:53 PM, Brendan Byrd wrote:

Okay, this is a big blue-sky idea, but like all things open-source, it comes out of a need. I'm trying to merge together Excel (or CSV), Oracle, Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC seems to work great for a large database with a bunch of tables, but what about a bunch of databases? I've searched and searched, and nobody seems to have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex, but that's merely for replication. This would require reparsing of SQL statements.

I think it'd be simpler to use something like SQL/MED. In PostgreSQL 9.1, for example, you can install foreign data wrappers so that you can create tables inside PostgreSQL that actually point to CSV, MySQL, Oracle, Redis, or any number of other sources. It's read-only right now, though that's often all people need. Some example FDWs:

http://www.postgresql.org/docs/current/static/file-fdw.html
http://pgxn.org/tag/fdw

ISTR that SQL Server has had similar capabilities for a long time, and I know Access has. Hell, ODBC might get you a long way toward it.

In short, I think that a multi-backend DBD is probably the wrong level at which to do this - way too much work.

Best,

David
Re: Add Unicode Support to the DBI
On 22/09/2011 17:36, David E. Wheeler wrote:

On Sep 22, 2011, at 2:26 AM, Martin J. Evans wrote:

There is more than one way to encode Unicode - not everyone uses UTF-8, although some encodings don't support all of Unicode.

Yeah, maybe it should be utf8_flag instead.

see below.

Unicode is not encoded as UTF-8 in ODBC using the wide APIs. Using the wide ODBC APIs returns data in UCS-2 encoding, and DBD::ODBC decodes it. Using the ANSI APIs, data is returned as octets and is whatever it is - it may be ASCII, it may be UTF-8 encoded (only in 2 cases I know of, and I believe they are flawed anyway), or it may be something else, in which case the application needs to know what it is. In the case of octets which are UTF-8 encoded, DBD::ODBC has no idea that is the case unless you tell it, and it will then set the UTF-8 flag (but see later).

Right. There needs to be a way to tell the DBI what encoding the server sends and expects to be sent. If it's not UTF-8, then the utf8_flag option is kind of useless.

I think this was my point above, i.e., why utf8? Databases accept and supply a number of encodings, so why have a flag called utf8? Are we going to have ucs2, utf16, utf32 flags as well? Surely it makes more sense to have a flag where you can set the encoding in the same form Encode uses.

I'm not that familiar with Postgres (I've used it a few times and not to any great degree) and I used MySQL for a while years ago. I occasionally use SQLite. I do use DBD::Oracle and DBD::ODBC all the time. I'm still struggling to see the problem that needs fixing.
Is it just that some people would like a DBI flag which tells the DBD:

1) decode any data coming back from the database strictly, such that if it is invalid you die
2) decode any data coming back from the database loosely (e.g., utf-8 vs UTF-8)
3) don't decode the data from the database at all
4) don't decode the data; the DBD knows it is, say, UTF-8 encoded and simply sets the UTF-8 flag (which from what I read is horribly flawed but seems to work for me)

and the reverse.

Yes, with one API for all drivers, if possible, and guidelines for how it should work (when to encode and decode, what to encode and decode, when to just flip the utf8 flag on and off, etc.).

ok

DBD::Oracle does 1 some of the time and it does 4 the rest of the time, e.g. error messages are fully decoded from UTF-8 IF Oracle is sending UTF-8, and it does 4 on most of the column data IF Oracle is sending UTF-8.

Yeah, but to enable it *you set a bloody environment variable*. WHAT?

Unless I'm mistaken as to what you refer to, I believe that is a feature of the Oracle client libraries and not one of DBD::Oracle, so there is little we can do about that.

My point being: doesn't the DBD know how the data is encoded when it gets it from the database? And it would hopefully know what the database needs when sending data. Perhaps in some conditions the DBD does not know this and needs to be told (I could imagine SQLite reading/writing straight to files, for instance, might want to know to open the file with a UTF-8 layer).

Or to turn it off, so you can just pass the encoded UTF-8 through to the file without the decode/encode round-trip.

So is the problem that sometimes a DBD does not know how to encode data being sent to the database, or how/whether to decode data coming back from the database? And if that is the case, do we need some settings in DBI to tell a DBD?
That's an issue, yes, but the main issue is that all the drivers do it differently, sometimes with different semantics, and lack all the functionality one might want (e.g., your examples 1-4).

Best,

David

So, to try and move forward, we'd be talking about a flag or flags which say:

1 encode the data sent to the database like this (which could be nothing)
2 decode the data retrieved from the database like this (which could be nothing, but if not nothing it could be using strict or loose for the UTF-8 and utf-8 case)
3 don't decode but use SvUTF8_on (a specific case, since Perl uses that internally and a number of databases return UTF-8) - one that seems to work but I worry about
4 do what the DBD thinks is best - whatever the behaviour is now?

And what about when it conflicts with your locale/LANG? And what about PERL_UNICODE flags - do they come into this? And what about when the DBD knows you are wrong, because the database says it is returning data in encoding X but you ask for Y? And for DBD::ODBC built for the Unicode API, am I expected to try and decode UCS-2 as X just because the flag tells me to and I know it will not work? Seems like it only applies to the ANSI API in DBD::ODBC, where the data could be UTF-8 encoded in a few (possibly broken - see http://www.martin-evans.me.uk/node/20#unicode) cases.

I still think it would help to name some specific cases per DBD of flags in use and why they exist: DBD::ODBC has an odbc_utf8_on flag to say that data returned by the database when using the ANSI APIs is
Re: Add Unicode Support to the DBI
On Sep 22, 2011, at 11:14 AM, Martin J. Evans wrote:

Right. There needs to be a way to tell the DBI what encoding the server sends and expects to be sent. If it's not UTF-8, then the utf8_flag option is kind of useless.

I think this was my point above, i.e., why utf8? Databases accept and supply a number of encodings, so why have a flag called utf8? Are we going to have ucs2, utf16, utf32 flags as well? Surely it makes more sense to have a flag where you can set the encoding in the same form Encode uses.

Yes, I agreed with you. :-)

Unless I'm mistaken as to what you refer to, I believe that is a feature of the Oracle client libraries and not one of DBD::Oracle, so there is little we can do about that.

Sure you can. I set something via the DBI interface and the DBD sets the environment variable for the Oracle client libraries.

So, to try and move forward, we'd be talking about a flag or flags which say:

1 encode the data sent to the database like this (which could be nothing)
2 decode the data retrieved from the database like this (which could be nothing, but if not nothing it could be using strict or loose for the UTF-8 and utf-8 case)
3 don't decode but use SvUTF8_on (a specific case, since Perl uses that internally and a number of databases return UTF-8) - one that seems to work but I worry about
4 do what the DBD thinks is best - whatever the behaviour is now?

Yes.

And what about when it conflicts with your locale/LANG?

So what?

And what about PERL_UNICODE flags - do they come into this?

What are those?

And what about when the DBD knows you are wrong, because the database says it is returning data in encoding X but you ask for Y?

Throw an exception or a warning.

And for DBD::ODBC built for the Unicode API, am I expected to try and decode UCS-2 as X just because the flag tells me to and I know it will not work?
Seems like it only applies to the ANSI API in DBD::ODBC, where the data could be UTF-8 encoded in a few (possibly broken - see http://www.martin-evans.me.uk/node/20#unicode) cases.

If the user does something that makes no sense, tell them it makes no sense. Die if necessary.

I still think it would help to name some specific cases per DBD of flags in use and why they exist: DBD::ODBC has an odbc_utf8_on flag to say that data returned by the database when using the ANSI APIs is UTF-8 encoded, and currently it calls SvUTF8_on on that data (I've never used it or verified it works myself, but the person supplying the patch said it had a purpose with a particular Postgres-based database).

That's what the new DBD::Pg flag that Greg's working on does, too.

Beyond that, DBD::ODBC has no other flags, as it knows that in the Unicode/wide APIs the data is UCS-2 encoded, and it checks it is valid when decoding it. Similarly, when sending data to the database in the wide APIs, it takes the Perl scalar and encodes it in UCS-2.

Yeah, ideally, by default, if the DBD knows the encoding used by the database, it should just DTRT. There are backward-compatibility issues with that for DBD::Pg, though. So there probably should be a knob to say "don't do any encoding or decoding at all", because a lot of older apps likely expect that.

DBD::Oracle to my knowledge has no special flags; it just attempts to do the right thing, but it favours speed, so most data that is supposed to be UTF-8 encoded has SvUTF8_on set, but in one case (error messages) it properly and strictly decodes the message, so long as your Perl is recent enough, else it uses SvUTF8_on.

So, what are the other flags in use and what purpose do they fulfill?

I think we could really just start with one flag, encoding. By default the DBD should just try to do the right thing. If encoding is set to :raw then it should do no encoding or decoding. If it's set to :utf8 it should just turn the flag on or off.
If it's set to an actual encoding, it should encode and decode. I think that would be a good start.

Best,

David
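[A sketch of the semantics of that proposed `encoding` attribute, as a plain function a driver might call on each fetched value. Nothing here is real DBI API - the function name and dispatch are hypothetical, only the three-way :raw / :utf8 / real-encoding behaviour comes from the proposal:]

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: apply the proposed `encoding` handle attribute
# to one value fetched from the database.
sub fixup_fetched_value {
    my ($encoding, $bytes) = @_;

    if (!defined $encoding) {        # unset: driver does what it thinks best
        return $bytes;
    }
    elsif ($encoding eq ':raw') {    # :raw - pass the octets through untouched
        return $bytes;
    }
    elsif ($encoding eq ':utf8') {   # :utf8 - flag only, no strict validation
        utf8::decode($bytes);        # nearest pure-Perl stand-in for SvUTF8_on
        return $bytes;
    }
    else {                           # a real Encode name: strict decode, die on bad data
        return decode($encoding, $bytes, FB_CROAK);
    }
}

my $val = fixup_fetched_value('UTF-8', "caf\xc3\xa9");
print length($val), "\n";            # 4 characters, not 5 octets
```

One open question this makes concrete: the else branch is where Martin's "decode UCS-2 as X when I know it will not work" objection bites, so a driver would presumably be allowed to refuse encodings it knows are wrong.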
Re: Add Unicode Support to the DBI
On Sep 22, 2011, at 11:57 AM, Martin J. Evans wrote:

ok, except what the Oracle client libraries accept does not match the encoding names Encode accepts, so someone would have to come up with some sort of mapping between the two.

Yes. That's one of the consequences of providing a single interface to multiple databases.

And what about when it conflicts with your locale/LANG?

So what?

I'm not so sure this is a "So what", as Perl itself uses locale settings in some cases - just thought it needed mentioning for consideration.

I'm not really concerned about locales at this point. I tend to leave collation, for example, up to the database. Right now I'm strictly concerned about encoding.

And what about PERL_UNICODE flags - do they come into this?

What are those?

See http://perldoc.perl.org/perlrun.html - in particular, UTF-8 can be made the default PerlIO layer for input streams, of which reading data from a database could be considered one?

That'd be cool, but it's not currently implemented that way, obviously. DBI and PerlIO are completely independent AFAIK, and the DBI doesn't look like a file handle.

ok, I'm thinking through the ramifications of this.

To add to the list, I see DBD::SQLite has sqlite_unicode: "strings coming from the database and passed to the collation function will be properly tagged with the utf8 flag; but this only works if the sqlite_unicode attribute is set before the first call to a perl collation sequence", and "The current FTS3 implementation in SQLite is far from complete with respect to utf8 handling: in particular, variable-length characters are not treated correctly by the builtin functions offsets() and snippet()."

DBD::CSV has f_encoding = utf8. DBD::mysql has mysql_enable_utf8, which apparently "determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off."

I could not find any special flags for DBD::DB2.
DBD::Sybase has syb_enable_utf8: "If this attribute is set then DBD::Sybase will convert UNIVARCHAR, UNICHAR, and UNITEXT data to Perl's internal utf-8 encoding when they are retrieved. Updating a unicode column will cause Sybase to convert any incoming data from utf-8 to its internal utf-16 encoding."

Yeah, so I think that can be generalized.

Best,

David
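[Put side by side, the survey above shows the fragmentation being complained about - the same intent, "give me decoded character data", spelled differently for every driver. DSNs and credentials below are placeholders; the flag names are the ones quoted in the thread:]

```perl
use strict;
use warnings;
use DBI;

my ($u, $p) = ('user', 'secret');   # placeholder credentials

my %connect_args = (
    SQLite => [ 'dbi:SQLite:dbname=app.db', '', '', { sqlite_unicode    => 1 } ],
    CSV    => [ 'dbi:CSV:',                 '', '', { f_encoding        => 'utf8' } ],
    mysql  => [ 'dbi:mysql:database=app',   $u, $p, { mysql_enable_utf8 => 1 } ],
    Sybase => [ 'dbi:Sybase:server=SYB',    $u, $p, { syb_enable_utf8   => 1 } ],
    ODBC   => [ 'dbi:ODBC:DSN=app',         $u, $p, { odbc_utf8_on      => 1 } ],
);

# Five drivers, five spellings of roughly one idea - which is the case
# for a single DBI-level `encoding` attribute:
my $dbh = DBI->connect( @{ $connect_args{SQLite} } );
```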
Re: Database/DBD Bridging?
Brendan,

Taking into account David's response ... I have several answers for you:

1. The functionality you are talking about is often referred to as a federated database, where you have one database engine that the client talks to, which turns around and farms some of the work off to various other database engines, coordinating between them and doing some extra work itself. What you initially proposed is essentially using Perl as the client-most database engine, and what David proposed is using PostgreSQL as the client-most engine instead.

2. If PostgreSQL 9.1's SQL/MED abilities will do what you need, then use it.

3. If immense scalability isn't needed and you want to do this in Perl, then I recommend you look at my Set::Relation CPAN module, which will handle a lot of the difficult work for you. Specifically, you would still manage for yourself the parceling out of queries to the various back-end DBMSs like Oracle or Excel or whatever, but Set::Relation will then take care of all the drudgery of taking the various rowsets returned from those and combining them into the various end-result queries you actually wanted to do. Each Set::Relation object contains a rowset, and you can use its several dozen methods to do relational joins or aggregates or antijoins or various other things - all the functionality of SQL, but in Perl. Its main limiting factor is that it is entirely RAM-based, though this also makes it simpler. So you can do this right now.

4. I am presently implementing a relational DBMS in Perl which provides all the functionality you described and more, including query language support that lets you write code like you demonstrated. Strictly speaking, the initial version is fully self-contained for simplicity, but a subsequent/spinoff version would add the ability to farm out to other database engines as per SQL/MED, and this *is* designed to scale.
It even uses the paradigm you mention, where each underlying engine is essentially a namespace in which tables live, and you can join between them as if they were all local; or, to be more accurate, each database *connection* has its own namespace, and the underlying engine is just a quality of that connection, like how DBI lets you have multiple connections with the same driver. So if this sounds like something you want to help create, please talk with me.

-- Darren Duncan

Brendan Byrd wrote:

Okay, this is a big blue-sky idea, but like all things open-source, it comes out of a need. I'm trying to merge together Excel (or CSV), Oracle, Fusion Tables, JSON, and SNMP for various data points and outputs. DBIC seems to work great for a large database with a bunch of tables, but what about a bunch of databases? I've searched and searched, and nobody seems to have designed a DBD for multiple DBDs. There's DBD::Multi and Multiplex, but that's merely for replication. This would require reparsing of SQL statements.

So, let's call this module idea DBD::IntegrateDB or MultiDB. It would be a module built from SQL::Statement (using the typical Embed instructions), so it would use that module's SQL engine for parsing and processing SQL. We'll use a simple example of two databases: one Oracle, and one MySQL. This module loads both of them in with a DBI->connect string. Then the dev runs a prepare on the following SQL:

SELECT book, title, b.person, age, dob
FROM ora.books b
INNER JOIN mysql.people p ON ( b.person_id = p.person_id )

So, ora.books is on the Oracle DB, and mysql.people is on the MySQL DB. The parser for this MultiDB would:

1. Use SQL::Parser to break down the SQL statement.
2. Figure out who owns what, in terms of tables and columns. (Complain about ambiguous columns if it has to.)
3. Use table_info calls to the separate DBI interfaces, including number of rows, cardinality (if available), etc.
4. Store the joining information.
5. Prepare two *separate* SQL statements, one for each DB. They would no longer be JOIN queries, but standard queries for the tables (including person_id, which wasn't included in the original SELECT statement).

Then when the statement is executed:

1. The two SQL statements are executed, one for each DB.
2. The fetch_row sub would process each row one at a time for each DB.
3. If two IDs match, send a row back. Otherwise, cache the data and wait for something to match.
4. Repeat until the rows are exhausted on one or both sides. (One side for INNER, both sides for OUTER.)

Does anything like that exist? I'm not saying it's an easy operation, but if something like that could just start off with simple JOINs at first, it would be a miracle module. Imagine linking with more abstract DBI modules: Oracle to CSV to MySQL to Teradata to Sys to Sponge. Tell me you're not excited at the prospect of eventually creating free open-source competitive frameworks to multi-million dollar Business Intelligence software. Heck,