I only got a copy of this message directly and not also via the list as
expected, since you addressed it to the list, but anyway ...
Brendan Byrd wrote on 2011 Sep 22 at 6:25am PST/UTC-8:
> The problem with PostgreSQL's SQL/MED is that it's not Perl, and it
> won't work for some of the more abstract objects available as DBD.
You may want to look into PL/Perl then, using Perl inside Postgres, to bring
together some of these things, if it will work for you.
> I would like to tie this DBD::FederatedDB into DBIC, so that it can search
> and insert everything on-the-fly. Shoving everything into RAM isn't
> right, either, since DBD::AnyData can already do that. The whole point
> of having the databases process the rows one at a time is so that it can
> handle 10 million row tables without a full wasteful dump.
Another thing to ask is whether what you're doing here is a batch process where
some performance matters are less of an issue, or whether it is more on demand
or more performance sensitive.
> It looks like Set::Relation can work out great for sucking in
> table_info/row_info data, and can be used as the temp cache as
> fractured rows come in.
Perhaps, although Set::Relation is more about making database operations like
join available in Perl, so you'll want to be using those operators to take
advantage of it. But then no one besides myself has used it yet that I know of,
and other people often think of uses for a tool beyond what its creator had in
mind.
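
To illustrate what "database operations like join" done in application code
means, here is a minimal sketch of a natural join over in-memory rows. It is
written in Python purely as an illustration; it is not Set::Relation's API, and
the function name natural_join and the sample data are made up for the example.

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on the attribute names they share."""
    left, right = list(left), list(right)
    if not left or not right:
        return []
    # Attributes present in both relations; assumes every row in a relation
    # has the same set of keys as its first row.
    common = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand rows by their values for the shared attributes.
    index = {}
    for row in right:
        key = tuple(row[name] for name in common)
        index.setdefault(key, []).append(row)
    # Pair each left-hand row with every matching right-hand row.
    joined = []
    for row in left:
        key = tuple(row[name] for name in common)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample data.
suppliers = [{"sno": 1, "city": "London"}, {"sno": 2, "city": "Paris"}]
shipments = [{"sno": 1, "pno": "P1"}, {"sno": 1, "pno": "P2"}]
result = natural_join(suppliers, shipments)
```

With no shared attributes the key is empty for every row, so the same code
degenerates to a cross product, which is the usual relational definition.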
> I would be highly interested in developing this with you. I'm spread
> pretty thin with several other Perl modules, so I otherwise wouldn't
> tackle it right now. But, if you already have something started, we can
> try to finish it, and that's much better than starting from scratch alone.
> Do you have a repository for this new module yet? What are you calling
> it? I take it the module is building off of SQL::Statement?
<snip>
If you mean the more robust/scalable solution, then that has 2 main parts: a
standard query language specification, Muldis D, plus multiple implementations
of it. This corresponds to, but is distinct from, the ecosystem of the ISO SQL
standard and its implementations in various DBMSs.
The query language, Muldis D, is not SQL but it is relevant here because it is
designed to correspond to SQL and to be an intermediary form for
generating/parsing SQL or translating between SQL dialects, or between SQL and
other languages like Perl. (This means all SQL, including stored procedures.)
This essentially is exactly what you want to do: have a common query syntax
where, behind the scenes, some of it is turned into SQL that is pushed to
back-end DBMSs, and some of it is turned into Perl to do local processing. The
great thing is that as a user you don't have to know where it executes, just
that the implementation will pick the best way to handle particular code. I
think of an analogy like LLVM, which can compile selectively to a CPU or a GPU.
Automatically, more capable DBMSs like Postgres get more work pushed to them to
do natively, and less capable things like DBD::CSV or whatever have less pushed
to them and more done in Perl.
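
As a rough illustration of that split, here is a minimal sketch in Python (not
Perl, and not Muldis D or any existing implementation). Everything named in it
is hypothetical: the SqlBackend and PlainBackend classes, the supports_where
flag, and select_at_least. An in-memory SQLite database stands in for a capable
DBMS, and a plain list of rows stands in for a minimal source such as a flat
file.

```python
import sqlite3


class SqlBackend:
    """Stand-in for a capable DBMS: it can evaluate a filter itself."""
    supports_where = True

    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def rows(self, column=None, minimum=None):
        # Table and column names are trusted in this sketch; only the
        # comparison value is passed as a bound parameter.
        sql = f"SELECT name, qty FROM {self.table}"
        params = ()
        if column is not None:
            sql += f" WHERE {column} >= ?"
            params = (minimum,)
        for name, qty in self.conn.execute(sql, params):
            yield {"name": name, "qty": qty}


class PlainBackend:
    """Stand-in for a minimal source: it can only list all of its rows."""
    supports_where = False

    def __init__(self, data):
        self.data = data

    def rows(self):
        yield from self.data


def select_at_least(backend, column, minimum):
    """Return rows where column >= minimum, pushing the filter down if possible."""
    if backend.supports_where:
        # The filter runs inside the DBMS; only matching rows come back.
        return list(backend.rows(column, minimum))
    # The back end cannot filter, so stream its rows and filter locally.
    return [row for row in backend.rows() if row[column] >= minimum]


# Hypothetical sample data loaded into both kinds of back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("bolt", 5), ("nut", 50)])

sql_result = select_at_least(SqlBackend(conn, "stock"), "qty", 10)
plain_result = select_at_least(
    PlainBackend([{"name": "bolt", "qty": 5}, {"name": "nut", "qty": 50}]),
    "qty",
    10,
)
```

The caller uses the same function either way and gets the same rows; only the
place where the filter executes differs.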
The language spec is in github at https://github.com/muldis/Muldis-D and it is
also published on CPAN in the pure-pod distribution Muldis-D, but the CPAN copy
has fallen behind at the moment.
I haven't started the implementations yet, or rather I did but canceled those
efforts so as to do it differently, so you can't run anything yet. But I know in
my head exactly how I intend to do it.
I intend to make a few more large updates to the Muldis D spec before starting
in earnest on the implementation, so as to make the implementation simpler and
easier to do (the spec is substantially complete other than some large
refinements); some clues to this direction are in the file TODO_DRAFT in github.
As for a timetable, if I could focus on this project I could have something
usable in a few months; however, I also have a separate paying job that I'm
currently focusing on, which doesn't leave much time for the new project, though
I hope to get more time to work on it maybe in mid-to-late October.
If you are still interested in working on this, or you just want to follow it,
please join the (low traffic) discussion list muldis-db-us...@mm.darrenduncan.net .
FYI, this project is quite serious, not pie in the sky, and it has interest from
some significant people in the industry, such as C.J. Date (well known for "An
Introduction to Database Systems" that sold over 800K copies), and one of his
latest co-authored books in 2010 explicitly covers part of my project with a
chapter.
-- Darren Duncan