Hi,
I read about the investigation into using Kamaelia for data modeling.
I'm working on data migration projects where we have to handle tons of
data, with performance issues because every single row and field of an
entire database has to be read and transformed.
I believe it's much the same issue one faces when dealing with
massive medical data for research subjects.
I already posted a couple of questions some months ago when I first
discovered Kamaelia (which I find to be so great, and I've converted
some java friends to python and kamaelia).
I came up with the idea to use Kamaelia to distribute data and queries
across some nodes.
A summer student worked with me and we've managed to get something
running with sqlite nodes.
We have some naive concepts:
1) data is distributed physically across nodes, without any key/range
partitioning
2) users send queries to proxies
3) proxies redirect queries to nodes
4) nodes request missing data from across the system (data can be
missing when joining tables together, and we've defined some tags to
mark parent and child tables); this is the most complex part of the
system:
i) Bloom filters are computed on the column sets of a join
ii) the Bloom filters are sent to all other nodes
iii) matching data is computed on each node and sent back to the
requesting node
iv) Bloom filters are stored so that the filter for a given column
set is never computed twice
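
Step 4 can be pictured with a minimal Python sketch. It is an illustration under assumptions, not the code described in this post: the BloomFilter class, bloom_for, matching_rows, the filter size, the number of hashes, and the sample tables are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: num_hashes bit positions in a size_bits bitmap."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bitmap

    def _positions(self, key):
        data = repr(key).encode("utf-8")
        for i in range(self.num_hashes):
            # Salt the hash with the index to get independent positions.
            digest = hashlib.sha1(("%d:" % i).encode("utf-8") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


_bloom_cache = {}  # (table, columns) -> BloomFilter, so nothing is built twice


def bloom_for(table, columns, rows):
    """Build (or reuse) the filter over the join columns of a local table."""
    cache_key = (table, tuple(columns))
    if cache_key not in _bloom_cache:
        bloom = BloomFilter()
        for row in rows:
            bloom.add(tuple(row[c] for c in columns))
        _bloom_cache[cache_key] = bloom
    return _bloom_cache[cache_key]


def matching_rows(bloom, columns, rows):
    """Run on a remote node: keep the rows whose join key may match."""
    return [row for row in rows if tuple(row[c] for c in columns) in bloom]


# Requesting node holds orders; a remote node holds customers.
local_orders = [{"customer_id": 1, "total": 10}, {"customer_id": 3, "total": 7}]
remote_customers = [{"customer_id": i, "name": "c%d" % i} for i in range(1, 6)]

bloom = bloom_for("orders", ["customer_id"], local_orders)
candidates = matching_rows(bloom, ["customer_id"], remote_customers)
```

A Bloom filter never misses a real match but can let a few non-matching rows through, so the requesting node still has to run the actual join on what it receives.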
We do not use JSON to send messages across the network; instead we just
serialize native Python dictionaries with cPickle.
I don't know if it's the best idea I've had, but it's rather simple.
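
As a rough sketch of what sending pickled dictionaries over a TCP stream can look like, here is one possible framing scheme in Python. The 4-byte length prefix, encode_message, decode_messages, and the message fields are assumptions made for illustration; the post does not say how messages are framed. The sketch uses the Python 3 pickle module, which replaces cPickle.

```python
import pickle
import struct

# Each message on the wire: 4-byte big-endian length, then the pickle.
HEADER = struct.Struct("!I")


def encode_message(message):
    """Serialize a dictionary into one length-prefixed frame."""
    payload = pickle.dumps(message, protocol=2)
    return HEADER.pack(len(payload)) + payload


def decode_messages(buffer):
    """Return (messages, remaining_bytes) from a receive buffer.

    A TCP read may contain several frames or only part of one, so any
    incomplete tail is handed back to be prepended to the next read.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack(buffer[:HEADER.size])
        end = HEADER.size + length
        if len(buffer) < end:
            break  # frame not fully received yet
        messages.append(pickle.loads(buffer[HEADER.size:end]))
        buffer = buffer[end:]
    return messages, buffer


query = {"type": "query", "sql": "SELECT 1", "node": "node-1"}
reply = {"type": "rows", "rows": [(1,)]}
wire = encode_message(query) + encode_message(reply)

# Simulate a read that cuts the second frame short.
first_batch, leftover = decode_messages(wire[:-3])
second_batch, leftover = decode_messages(leftover + wire[-3:])
```

Unpickling can execute arbitrary code, so this only makes sense between nodes that trust each other.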
This system works quite well because we deal with dead data (no writes
or updates on source data, just writes to target tables which we create
on the fly).
There's a significant overhead on the first queries, as Bloom filters
are computed and data is sent and received.
But after the first N queries, the data is "self-balanced" and there's
no more data transport, so we can scale up to full parallelism.
The big issue here is sqlite:
1) no type support
2) no transaction support (==> single user system)
3) some obscure bugs (e.g. sqlite sometimes raises an Exception
named "NotAnError", which is very hard to understand, don't you
think? ;)).
As Python clearly lacks a good database API, we cannot easily move to
another database.
Dealing with Oracle, MySQL or Postgres from Python is not so easy, and
the DB API is never implemented the same way(!), which makes switching
from one RDBMS to another a pain.
Being able to use JDBC would be so nice.
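
One concrete example of drivers differing is the placeholder style: the DB API lets each module declare its own in a paramstyle attribute. The Python sketch below shows one naive way to paper over that for positional parameters. adapt_placeholders and run_query are hypothetical helpers, only three styles are handled, and sqlite3 is used just because it ships with Python.

```python
import sqlite3


def adapt_placeholders(sql, paramstyle):
    """Rewrite '?' markers into the positional style a driver expects.

    Deliberately naive: a '?' inside a string literal is rewritten too.
    """
    if paramstyle == "qmark":
        return sql
    if paramstyle == "format":
        return sql.replace("?", "%s")
    if paramstyle == "numeric":
        parts = sql.split("?")
        out = parts[0]
        for index, part in enumerate(parts[1:], start=1):
            out += ":%d" % index + part
        return out
    raise ValueError("unsupported paramstyle: %r" % paramstyle)


def run_query(module, connection, sql, params=()):
    """Execute SQL written with '?' markers on any supported driver."""
    cursor = connection.cursor()
    cursor.execute(adapt_placeholders(sql, module.paramstyle), params)
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (id INTEGER, name TEXT)")
connection.execute("INSERT INTO t VALUES (1, 'a')")
rows = run_query(sqlite3, connection, "SELECT name FROM t WHERE id = ?", (1,))
```

This only covers placeholders; differences in SQL dialects, types and connection arguments would still need handling per database.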
So we tried to use Kamaelia within Jython.
We managed to import the Axon + Kamaelia modules "as is", and it
surprisingly worked ... at some level.
I would put Axon's compatibility with Jython at about 99%.
Unfortunately, neither TCPClient nor SimpleServer would run under Jython.
There's some CSA error being raised.
Sadly, we've come to the conclusion that JDBC will not be running this
summer. Does anyone know of a quick fix for it? (Should I post
this question on the Jython page?)
Kamaelia + Jython would clearly rock for the many practical cases
where Java (which I hate) is the standard (JDBC, Swing ...) but
falls short in some respects (like concurrency, message passing, and the
wish for a simpler life; do you see what I'm talking about?).
I'll get my student to write some slides about our work and share them
with you.
Any comments or feedback welcome!
On 10 Jul, 15:15, Rasjid Wilcox <[email protected]> wrote:
> 2009/7/8 Michael Sparks <[email protected]>:
>
> > If you let me have your code.google developer id, I can let you "own" these
> > two parts of the tree:
> > /trunk/Code/Python/Kamaelia/Kamaelia/Apps/JsonRPC
> > /trunk/Code/Python/Apps/JsonRPC/
>
> > Which you can then use as you would normally, except you can build your own
> > specific distributions, whilst also simplifying the use of your code inside
> > other Kamaelia systems.
>
> > Assuming you're happy with this Rasjid, I'll merge this onto /trunk - since it
> > allows you the freedom to continue developing this without being constrained
> > by the main project. I think this then enables me to get out of the way, and
> > to be able to help where needed :)
>
> > Once again, many thanks - this is a really cool contribution :)
>
> Thanks Michael for setting this up. I'll set myself up a code.google
> developer id over the weekend and get back to you with it.
>
> Just so people know, I've got some significant changes in mind, so the
> current code is more to get a feel for things than how I think it will
> look in the long term. I have another week or two of urgent
> end-of-financial year work to get done, and then I will put some more
> time into moving it forward.
>
> Cheers,
>
> Rasjid.