On Sat, 2009-08-01 at 17:33 +0200, Gerhard Killesreiter wrote:
> Pierre Rineau schrieb:
> > Hello all,
> >
> > Working on a custom project for my company, I developed a module to
> > do massive migrations between sites.
> >
> > This module uses a full OO layer.
> >
> > Its internal mechanism is based on abstracting the objects to migrate
> > from a master site to clients. This abstraction defines how to
> > construct the object dependency tree and how to serialize objects.
> >
> > Object implementations (node, user, taxonomy, whatever) are really
> > simple to write: classes need only three methods (register
> > dependencies, save, and update) and use a custom registry that lets
> > the developer save and retrieve data before and after serialization.
> >
> > All error handling is exception oriented, and lower software layers
> > won't fail on higher layers' unrecoverable errors.
> >
> > Object fetching is based on a push/pull mechanism. The server pushes
> > the sync order, and the client responds OK or not. If OK, it creates
> > a job using the DataSync module, which allows it to run as a CLI
> > thread (which won't hurt the web server, and allows a larger memory
> > limit at run time).
>
> I am generally not happy with datasync's approach to run shell scripts
> as the webserver user. Have you considered using Drush instead?
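The three-method contract described above might look like the following. This is a minimal sketch in Python for illustration only; the actual module is PHP, and the names `SyncableEntity`, `register_dependencies`, `save`, and `update` are assumptions based on the description, not the module's real API:

```python
from abc import ABC, abstractmethod

class SyncableEntity(ABC):
    """Hypothetical sketch of the three-method contract described above."""

    @abstractmethod
    def register_dependencies(self, registry):
        """Declare the other entities this one depends on
        (e.g. a node registering its taxonomy terms)."""

    @abstractmethod
    def save(self, registry):
        """Create the object on the client site after deserialization."""

    @abstractmethod
    def update(self, registry):
        """Update an object that already exists on the client site."""
```

A concrete implementation (a taxonomy term, say) would then only fill in these three methods, with the registry carrying whatever data must survive serialization.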
Drush might be something to look at, but in fact it's because of
DataSync's transaction support that I chose this module.

> > During the DataSync job execution, the client will pull an original
> > set of content, and while browsing it will do incremental dependency
> > fetching (by pulling the server again), based on xmlrpc (the fetching
> > component is also abstracted, and could use any communication method
> > other than xmlrpc).
>
> Wouldn't the server be better qualified to decide which data the
> client needs?

The server decides: it gives a transaction id to the client, and the
client then requests data at pull time, giving its transaction id,
without knowing what is coming. The whole import part is handled by the
client browsing a list of abstract entities without knowing what the
exact implementation is.

> > To be unobtrusive on the system, a smart unset() is done after
> > building a dependencies subtree, and there is a recursion breaker in
> > case of circular dependencies.
>
> Have you tried it with php 5.3?

PHP 5.3 has too many differences from prior versions, and I don't
really want to support it. It may already be outdated anyway because of
the PHP 6 development.

> > This module was created because the deploy module seems to be so
> > unstable; I did not want to risk clients' production sites running
> > with it. I started implementing some sort of "deploy plan" using
> > profiles based on views: you construct a set of views, save them in
> > a profile, and then all objects that these views reference will be
> > synchronized.
> >
> > Right now, the module fully synchronizes taxonomy and content types,
> > partially synchronizes users (including core account information and
> > passwords), and I have a small bug left to handle with nodes (a
> > revision problem, I think).
> >
> > There might be a performance or overhead problem with this design:
> > with a very large amount of data, it could break easily.
>
> How large is your "very large"?
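A recursion breaker of the kind mentioned above is typically a "visited" set checked before descending into a dependency. A hypothetical sketch, again in Python for illustration (`fetch_deps` and all other names are assumptions, not the module's API):

```python
def build_dependency_tree(entity, fetch_deps, seen=None):
    """Collect an entity's dependency subtree depth-first.

    `fetch_deps` maps an entity id to the ids it depends on (an assumed
    helper). The `seen` set is the recursion breaker: an id that was
    already visited is skipped, so circular dependencies (a node
    referencing a term that references the node) cannot loop forever.
    """
    if seen is None:
        seen = set()
    if entity in seen:
        return []          # recursion breaker: cycle detected, stop here
    seen.add(entity)
    tree = [entity]
    for dep in fetch_deps(entity):
        tree.extend(build_dependency_tree(dep, fetch_deps, seen))
    return tree
```

With a cycle such as `node/1 -> term/2 -> node/1`, the second visit to `node/1` hits the `seen` check and the traversal terminates with each entity listed once.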
> If I wanted to sync 10k nodes to 100 client sites, how successful
> would I be?

I can't tell you that right now; I'm in active development and only
test on small amounts of data (around ten nodes). I need to test and
benchmark it to discover its limits; it's at an early development stage
right now.

> > The only way to be sure it won't break is, I think, to migrate stuff
> > in numerous small sets of data. But the problem with doing this is
> > that it will be really hard to keep the transactional context of the
> > DataSync module.
>
> Yeah, one reason to let the server handle this, no?
>
> > There are a lot of other custom goodies coming.
> >
> > First thing is: what do you think about such a module? Should I
> > commit it on drupal.org? Are people interested?
>
> I am certainly interested, especially if my concerns from above can be
> addressed. ;)
>
> > And, now that I have described the module, what name should I give
> > it, considering the fact that I'll probably commit it on drupal.org
> > if people are interested?
> >
> > I thought about "YAMM" (Yet Another Migration Module) or "YADM" (Yet
> > Another Deployment Module).
> >
> > The fact is there are *a lot* of modules which want to do the same
> > thing as this one; I just want a simple and expressive name.
>
> Data migration is an important and diverse task. IMO it doesn't hurt
> to have several approaches.
>
> Cheers,
> Gerhard

Pierre.
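The trade-off discussed in the thread — pulling in many small batches while keeping one transactional context — could be sketched as a server that assigns a transaction id at push time and serves batches against it until the transaction is drained. A Python sketch under those assumptions; every class and method name here is hypothetical:

```python
import itertools

class Server:
    """Assigns a transaction id per sync order and serves it in batches."""

    _ids = itertools.count(1)

    def __init__(self):
        self.transactions = {}

    def push_sync_order(self, entities):
        """Server side: decide what to sync, hand back a transaction id."""
        tx = next(self._ids)
        self.transactions[tx] = list(entities)
        return tx

    def pull(self, tx, batch_size=2):
        """Serve the next batch for a transaction id; empty when drained."""
        pending = self.transactions[tx]
        batch, self.transactions[tx] = pending[:batch_size], pending[batch_size:]
        return batch

class Client:
    """Pulls batches against its transaction id, without knowing the contents."""

    def sync(self, server, tx):
        received = []
        while True:
            batch = server.pull(tx, batch_size=2)
            if not batch:
                break
            received.extend(batch)
        return received
```

Since every pull carries the transaction id, the server can treat all batches as one unit of work and roll the whole transaction back if any batch fails, which is the transactional context the thread worries about losing.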
