On Thu, Mar 11, 2010 at 8:56 PM, John Patterson <jdpatter...@gmail.com> wrote:
>
> But for typesafe changes large or small Twig supports data migration in a
> much safer, more flexible way than Objectify.

Read on for details.
You are increasing my suspicion that you've never actually performed schema
migrations on big, rapidly changing datasets.

> Cool, the @AlsoLoad is quite a neat feature. Although very limited to
> simple naming changes and nothing structural. All this is based on a
> dangerous assumption that you can modify "live" data in place. Hardly
> bullet proof.

Actually, @AlsoLoad (in conjunction with @LoadOnly and the @PrePersist and
@PostLoad lifecycle callbacks) provides an enormous range of ability to
transform your data. I know; I've had to do more of it than I would like to
admit. You can rename fields, change types, arbitrarily munge data, split
entities into multiple parts, combine multiple entities into one, convert
between child entities and embedded parts, etc. In most cases you can do
this on a live running system. That is the entire point, actually - our
goal is zero downtime for schema migration.

The general approach:

 * Modify your entities to save in your new format.
 * Use Objectify's primitives so that data loads in both the old format
   and the new format.
 * Test your code against your local datastore, or if you're deeply
   concerned, against exported data in another appid.
 * Deploy your new code, letting the natural churn update your database.
 * Fire off a batch job at your leisure to finish it off.
 * Remove the extra loading logic from your code when you're done.

Not every migration works exactly the same way, but the tools are there. I
know from experience that it works and works well.

> The Twig solution is to create a new version of the type (v2) and process
> your changes while leaving the live data completely isolated and safe.
> Then after you have tested your changes you bump up the version number of
> your live app.

This is cumbersome and inelegant compared to Objectify's solution. You
require the developers to 1) create a parallel hierarchy of classes and
2) create code (possibly scattered across the app) to write out both
formats.
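To make the rename case concrete, here is a plain-Java sketch of what
@AlsoLoad does under the hood. This is my own stand-in for illustration
(the Person class, the "name"/"fullName" properties, and the Map-based
load/save are all made up); in real Objectify code the same behavior comes
from a single annotation on the field rather than hand-written logic:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the idea behind @AlsoLoad: on load, accept the old property
// name as well as the new one; on save, always write the new format.
// Old entities keep loading while natural churn rewrites them in place.
public class Person {
    String fullName;  // new field name; old entities stored it as "name"

    // Load from a property map (a stand-in for a datastore Entity).
    static Person load(Map<String, Object> props) {
        Person p = new Person();
        Object v = props.containsKey("fullName")
                ? props.get("fullName")
                : props.get("name");  // old-format entities still load
        p.fullName = (String) v;
        return p;
    }

    // Saving always emits the new format, so every write migrates the entity.
    Map<String, Object> save() {
        Map<String, Object> props = new HashMap<>();
        props.put("fullName", fullName);
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> oldEntity = new HashMap<>();
        oldEntity.put("name", "Alice");  // entity written by the old code
        Person p = Person.load(oldEntity);
        System.out.println(p.save().get("fullName"));  // prints Alice
    }
}
```

Once the batch job has rewritten every remaining old entity, the
old-name fallback in the load path can be deleted - that is the last
step of the migration recipe above.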
You require a complete duplication of the datastore kind - potentially
billions of entities occupying hell only knows how much space. It could
take *weeks* to do even minor schema migrations this way. And if you want
to make another minor change halfway through the process? Start from
scratch! In the meantime, your customers are wondering why the new feature
isn't live yet.

Also... do you realize how slow and expensive deletes are in appengine?

Duplicating the database is just not an option. Not with the Mobcast 2.0
dataset (not live yet; I should be able to talk about it more freely in a
month or two). Certainly not with Scott's dataset, which may end up caching
a significant chunk of Flickr, Picasa, and Facebook if it takes off.

> What is with your obsession with batch gets? I understand they are
> central in Objectify because you are always loading keys. As I said
> already - even though this is not as essential in Twig it will be added
> to a new load command.

Batch gets are *the* core feature of NoSQL databases, including the GAE
datastore. Look at these graphs:

http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-get-latency
http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-query-latency

Notice that a get()'s average latency is 50ms and a query()'s average
latency is 500ms. Last week the typical query was averaging 800-1000ms
with frequent spikes into 1200ms or so.

Deep down in the fiber of its being, BigTable is a key-value store. It is
very, very efficient at doing batch gets. It wants to do batch gets all
day long. Queries require touching indexes maintained in alternative
tablets, and comparatively, the performance sucks.

I'm by no means a BigTable expert, but I have a significant professional
interest in being able to read & write a lot of data. I could not
implement (perhaps better said, I couldn't scale) Mobcast without batch
gets and puts.
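To put numbers on why batching matters, here is a toy round-trip cost
model I made up for illustration. It uses the 50ms average get() latency
from the graphs above and assumes each datastore call pays one fixed round
trip (ignoring per-entity transfer cost, which is an assumption):

```java
// Toy model: each datastore call costs one fixed round trip, so fetching
// N entities one get() at a time pays N round trips, while one batch get
// pays a single round trip for all the keys.
public class BatchGetDemo {
    static final int ROUND_TRIP_MS = 50;  // avg get() latency per the status graphs

    // N sequential gets: N round trips.
    static int sequentialCostMs(int n) { return n * ROUND_TRIP_MS; }

    // One batch get for all N keys: a single round trip.
    static int batchCostMs(int n) { return ROUND_TRIP_MS; }

    public static void main(String[] args) {
        System.out.println(sequentialCostMs(20));  // 1000
        System.out.println(batchCostMs(20));       // 50
    }
}
```

Even in this crude model, fetching 20 entities by key drops from a full
second to one 50ms round trip - which is why a framework built around
batch gets behaves so differently under load than one built around
queries.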
To be honest, I'm not wholly thrilled with the performance of batch
get/put operations on appengine either. Cassandra folks are claiming 10k/s
writes *per machine*. Tokyo Tyrant folks are claiming 20k+/sec writes.
Reads are even faster! True, these systems are not as full-featured as the
appengine datastore... but we're talking at least two full orders of
magnitude difference! Ouch.

Why am I obsessed with batch gets? Because they're essential for making an
application perform. They're why there is such a thing as a NoSQL movement
in the first place.

> Oops I didn't post the CookBook page in the end. Rest assured it is a
> trivial addition and I'll update the docs.

> It is also often better to cache above the data layer - hardly the
> killer feature you claim.

If you have a read-heavy app (and most are), nothing gives you
bang-for-the-buck like adding one little annotation and pulling your data
out of memcache instead of the datastore. Caching at higher levels *might*
save you some additional cpu cycles, but it's certainly a lot more work.

Jeff

--
You received this message because you are subscribed to the Google Groups
"Google App Engine for Java" group.
To post to this group, send email to google-appengine-j...@googlegroups.com.
To unsubscribe from this group, send email to
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/google-appengine-java?hl=en.