On Fri, Sep 23, 2011 at 5:01 PM, Darren Duncan <dar...@darrenduncan.net> wrote:

>> The problem with PostgreSQL's SQL/MED is that it's not Perl, and it won't
>> work for some of the more abstract objects available as DBD.
>
> You may want to look into PL/Perl then, using Perl inside Postgres, to
> bring together some of these things, if it will work for you.


Yeah, but that's mostly a hack.  It's not really portable and frankly, I
personally don't need this module.  Granted, my conversations started out
that way, but I acknowledge that the module doesn't exist and I can do
things just fine merging them together via Perl as I have been.

However, the Perl community in general needs this.  Developers need to just
grab some modules via CPAN, describe the relational models (if they don't
already exist), and have the flow of data *just work!*  We need a new way of
thinking about how this data interacts.  It is ALL related, else you wouldn't
be using it.  Why do I have to parse through some JSON object and dump its
fields into variables just to put them into some SQL database, if a) the JSON
object is already a bunch of variables, and b) both are universally well-known
standards?  Me gluing the two together for my one program for this one
instance is basically just hard coding, because no other method exists.
JSON, Oracle, CSV, Google, SNMP, Excel, Access, MySQL, XML, HTML, CGI (yes,
I went there)... Make it all available on one interface!
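Something like this is what I'm picturing.  To be clear, DBD::FederatedDB
doesn't exist yet, so the DSN, the register_source driver-private method, and
the table names below are all made up for illustration:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical: one handle that federates several backends.
    my $dbh = DBI->connect('dbi:FederatedDB:', '', '', { RaiseError => 1 });

    # Hypothetical driver-private calls; each source becomes a "table".
    $dbh->func('dbi:mysql:database=crm;host=db1', 'customers', 'register_source');
    $dbh->func('dbi:CSV:f_dir=/var/exports',      'invoices',  'register_source');

    # The federated layer would plan this JOIN across both backends.
    my $rows = $dbh->selectall_arrayref(q{
        SELECT c.name, i.total
        FROM   customers c
        JOIN   invoices  i ON i.customer_id = c.id
        WHERE  i.total > 1000
    }, { Slice => {} });

    printf "%s owes %.2f\n", $_->{name}, $_->{total} for @$rows;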

DBIC needs this.  The examples they provide aren't real-world.  "Hey,
just grab this SQLite database, create your own tables any way you like, and
map those into DBIC.  Because Yo Dawg, I heard you like relationships, so we
put relationships in your relationships, so that you can waste time while
you waste time."  (Yes, I went there, too...)  After all, if you are
designing the database, you can make it as simple or as complex as you want,
and if it's complex, just make views or materialized views to simplify
things.  Granted, I'm belittling DBIC's ability to relate things within a
single database (and 4NF in general), but *real problems* with data come from
multiple sources.  Sources that you can't admin.  Sources with some really
funky languages.


>> I would like to tie this DBD::FederatedDB into DBIC, so that it can
>> search and insert everything on-the-fly.  Shoving everything into RAM isn't
>> right, either, since DBD::AnyData can already do that.  The whole point of
>> having the databases process the rows one at a time is so that it can handle
>> 10 million row tables without a full wasteful dump.
>
> Another thing to ask is whether what you're doing here is a batch process
> where some performance matters are less of an issue, or whether it is more
> on demand or more performance sensitive.
>

Both?  I can have both, can't I?

But, seriously, the primary goal is the ability to support mostly
everything.  (Really, the goal is to get something into DBD that can go into
DBIC as a single data source.)  It would start out as just something that
would join two tables together and expand from there.

However, speed would be a good secondary goal.  I think that sort of thing
is achievable not through XS code, but through intelligent decisions about
how to pull the data: weighing the cardinality of the keys in a table
against how it links together.  If one table is 20 rows and another is 500K
rows, just pull the entire 20-row table into memory and dump it into a
temporary table on the other system, so that you can do an actual JOIN over
there.  Those kinds of decisions make up much of the logic of an RDBMS
itself.  The module also has a natural speed advantage from parallelization:
most of these different systems are going to be on different servers, so at
the very least the data flow is going to be faster than one server pulling
both tables.  (In general.  Obviously, RDBMSs have their own internal
operations to make things as fast as possible.)
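As a rough sketch of that decision with plain DBI (the handles, table names,
and the 1,000-row threshold here are just assumptions for the example):

    use strict;
    use warnings;
    use DBI;

    my $small_dbh = DBI->connect('dbi:SQLite:dbname=local.db', '', '',
                                 { RaiseError => 1 });
    my $big_dbh   = DBI->connect('dbi:mysql:database=warehouse;host=db2',
                                 'user', 'pass', { RaiseError => 1 });

    # Cheap cardinality check on the candidate "small" side.
    my ($count) = $small_dbh->selectrow_array('SELECT COUNT(*) FROM departments');

    if ($count <= 1_000) {    # arbitrary threshold for this sketch
        # Ship the small table to the big side as a temp table...
        $big_dbh->do('CREATE TEMPORARY TABLE tmp_departments (id INT, name VARCHAR(64))');
        my $ins  = $big_dbh->prepare('INSERT INTO tmp_departments (id, name) VALUES (?, ?)');
        my $rows = $small_dbh->selectall_arrayref('SELECT id, name FROM departments');
        $ins->execute(@$_) for @$rows;

        # ...so the 500K-row side can do a native JOIN with its own indexes.
        my $joined = $big_dbh->selectall_arrayref(q{
            SELECT e.name, d.name AS dept
            FROM   employees e
            JOIN   tmp_departments d ON d.id = e.dept_id
        }, { Slice => {} });
    }
    # Otherwise: stream both sides and merge-join in Perl instead (not shown).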


>> It looks like Set::Relation can work out great for sucking in
>> table_info/row_info data, and can be used as the temp cache as fractured
>> rows come in.
>
> Perhaps, although Set::Relation is more about making database operations
> like join etc available in Perl, so you'll want to be using such various
> tools to take advantage of it.  But then no one besides myself has used it
> yet that I know of, and others often think of tool uses beyond the creator.
>

Well, that's pretty much what I would be using it for: making database
operations available by defining the relational model in Set::Relation, and
having it handle exactly how the rows are going to interact.
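Roughly like this, going from memory of the Set::Relation synopsis (so the
constructor form may be slightly off; join() and members() are in its
documented API):

    use strict;
    use warnings;
    use Set::Relation::V2;

    sub relation { return Set::Relation::V2->new( @_ ); }

    # One relation per data source, however the rows arrived (DBI, JSON, CSV...).
    my $employees = relation( [ { name => 'Ann', dept_id => 1 },
                                { name => 'Bob', dept_id => 2 } ] );
    my $depts     = relation( [ { dept_id => 1, dept => 'Sales' },
                                { dept_id => 2, dept => 'Ops'   } ] );

    # Set::Relation does the relational algebra in pure Perl: a natural join here.
    my $joined = $employees->join( $depts );

    # Pull the result back out as plain Perl data.
    for my $row ( @{ $joined->members() } ) {
        printf "%s works in %s\n", $row->{name}, $row->{dept};
    }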


> If you mean the more robust/scalable solution, then that has 2 main parts,
> which is a standard query language specification, Muldis D, plus multiple
> implementations.  It corresponds to but is distinct from the ecosystem of
> there being an ISO SQL standard and its implementations in various DBMSs.
>
> The query language, Muldis D, is not SQL but it is relevant here because it
> is designed to correspond to SQL and to be an intermediary form for
> generating/parsing SQL or translating between SQL dialects, or between SQL
> and other languages like Perl.  (This means all SQL, including stored
> procedures.)
>
> This essentially is exactly what you want to do: have a common query syntax
> where behind the scenes some is turned into SQL that is pushed to back-end
> DBMSs, and some of which is turned into Perl to do local processing.  The
> great thing is as a user you don't have to know where it executes, but just
> that the implementation will pick the best way to handle particular code.  I
> think of an analogy like LLVM that can compile selectively to a CPU or a
> GPU. Automatically, more capable DBMSs like Postgres get more work pushed to
> them to do natively, and less capable things like DBD::CSV or whatever have
> less pushed to them and more done in Perl.
>

Yeah, that sounds right.  So would this eventually become its own DBD
module?  Does it use DBI methods to figure out the specs of the system?  For
example, you were saying "less capable things like DBD::CSV".  Is that
determined by querying get_info for the ODBC/ANSI capability data?
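(For reference, DBI already exposes some of that through get_info(), using
the ODBC-style keys in DBI::Const::GetInfoType; drivers just return undef for
keys they don't implement:)

    use strict;
    use warnings;
    use DBI;
    use DBI::Const::GetInfoType;    # exports the %GetInfoType name-to-number map

    my $dbh = DBI->connect('dbi:CSV:f_dir=.', '', '', { RaiseError => 1 });

    # Standard ODBC-style info keys; each driver answers what it supports.
    my $dbms = $dbh->get_info( $GetInfoType{SQL_DBMS_NAME} );
    my $ver  = $dbh->get_info( $GetInfoType{SQL_DBMS_VER} );

    printf "Backend: %s %s\n", $dbms // 'unknown', $ver // 'unknown';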

> The language spec is on GitHub at https://github.com/muldis/Muldis-D and it
> is also published on CPAN in the pure-pod distribution Muldis-D, but the
> CPAN copy has fallen behind at the moment.
>

Yeah, I'll check it out in a bit.


> For timetable, if I could focus on this project I could have something
> usable in a few months; however, I also have a separate paying job that I'm
> currently focusing on which doesn't leave much time for the new project,
> though I hope to get more time to work on it maybe in mid-late October.
>

Ditto.  Some of this is work-related, but a lot of this module development
goes beyond what I have deadlines for and into making sure that it can be
done right the next time something like this comes around.


> If you are still interested in working on this, or you just want to follow
> it, please join the (low traffic) discussion list
> muldis-db-users@mm.darrenduncan.net <muldis-db-us...@mm.darrenduncan.net>.
>
> FYI, this project is quite serious, not pie in the sky, and it has interest
> from some significant people in the industry, such as C.J. Date (well known
> for "An Introduction to Database Systems" that sold over 800K copies), and
> one of his latest co-authored books in 2010 explicitly covers part of my
> project with a chapter.
>

Of course.  Something like this is huge, but it's also hugely important to
make sure it gets into the hands of the Perl community.

-- 
Brendan Byrd/SineSwiper <sineswi...@gmail.com>
Computer tech, PERL wizard, and all-around Internet guru
