On Fri, Mar 12, 2010 at 8:35 PM, John Patterson <[email protected]> wrote:
>
> On 12 Mar 2010, at 16:28, Jeff Schnitzer wrote:
>
> Look at these graphs:
>
> http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-get-latency
> http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-query-latency
>
> Notice that a get()'s average latency is 50ms and a query()'s average
> latency is 500ms. Last week the typical query was averaging
> 800-1000ms with frequent spikes into 1200ms or so.
>
> "You are increasing my suspicion that you have never worked" with an
> application that queries large amounts of data. If your queries are taking
> anywhere near 1000 ms then you must be doing something seriously wrong.
> One of my app's query times are generally in the 200 ms range over 2 million
> records. A keys-only query can return in 50ms.
Are you debating the validity of Google's statistics? Or the loud complaints posted to this mailing list last week? Some queries will certainly return faster than others, and from what I've read and watched, keys-only queries should have performance profiles roughly similar to simple gets. But there can be no doubt that real queries are quite slow compared to simple gets.

But you're arguing with a straw man here. I've never suggested that queries are not useful. However, you *have* suggested that batch gets aren't important:

"Batch gets are really only useful in apps that need to take a load of ids from an external source and do something with them."

That's absolute rubbish. A very large (and growing) number of applications are being built on NoSQL databases that are effectively key-value stores. Cassandra, Tokyo Cabinet, HBase, Voldemort, and *dozens* of other tools are being developed because they can do something that relational systems can't: get() and put() vast quantities of data quickly. There is a growing number of applications (largely defined by staggeringly large user bases) in which the cost of maintaining traditional indexes is not practical. You aren't going to implement Twitter or Facebook with a bunch of App Engine queries! But apparently Cassandra works great.

> This is the time required to execute 9 parallel queries on geospatial data
> and OR merge them together. Keep in mind that with Twig I could execute 90
> parallel queries and expect the time to be about the same.

You have the luxury of relatively static data, which colors your view of the world. I work with data that has a high churn rate, which colors mine.

I have to ask you something, though: would you need to do 9 parallel queries if you were working with a datastore that has proper spatial indexes? Not that doing parallel queries isn't cool, but is it actually necessary for your app?

I'm not doing spatial queries right now, but they're on the horizon, and I've done the research.
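For readers who haven't seen the parallel-query pattern John describes, here is a minimal sketch of running several single-property equality queries concurrently and OR-merging their keys. It is not Twig's API or App Engine's: the in-memory `ENTITIES` dict, the geohash-style `cell` values, and the `query_keys`/`or_query` helpers are hypothetical stand-ins for real datastore queries. Splitting a radius search into a cell plus its eight neighbours is one common way to arrive at 9 queries, but that is an assumption about his app.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a datastore: entity key -> property dict.
# The "cell" values are made-up geohash-style prefixes, not real data.
ENTITIES = {
    "a": {"cell": "9q8y", "kind": "cafe"},
    "b": {"cell": "9q8z", "kind": "bar"},
    "c": {"cell": "9q8y", "kind": "bar"},
    "d": {"cell": "9q9p", "kind": "cafe"},
}


def query_keys(prop, value):
    """Simulate a keys-only equality query on one property, ordered by key."""
    return [key for key, props in sorted(ENTITIES.items())
            if props.get(prop) == value]


def or_query(filters):
    """Run one query per (prop, value) filter concurrently, then OR-merge
    the key lists, dropping duplicate keys in first-seen order."""
    with ThreadPoolExecutor(max_workers=max(1, len(filters))) as pool:
        # pool.map yields results in the same order as `filters`.
        results = list(pool.map(lambda f: query_keys(*f), filters))
    merged, seen = [], set()
    for keys in results:
        for key in keys:
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


if __name__ == "__main__":
    # A centre cell plus one neighbouring cell (hypothetical values).
    print(or_query([("cell", "9q8y"), ("cell", "9q8z")]))  # ['a', 'c', 'b']
```

Because each query touches only one property, the OR happens entirely in the merge step; deduplicating by key keeps an entity that matches several filters from appearing twice. Wall-clock time tracks the slowest single query rather than the sum, which is the effect John is pointing at.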
For my application, it's much easier and more efficient to push my spatial queries off to a cluster of PostGIS instances running elsewhere in the cloud. It's also much, much cheaper.

> * Fire off a batch job at your leisure to finish it off.
>
> This "partial update" approach only works in cases where you are not adding
> a field that you will query on. That needs to be an all-or-nothing batch
> job.

Nonsense; this is totally dependent on the specific logic of your application. Simple example: you're adding a loginCount to your User entity, and you want to add a query that selects users who have logged in more than N times. There's no reason you can't start running those queries right away.

You're trying to dismiss the utility of upgrading the dataset in place by saying that *some* application features require the dataset to be completely transitioned before being enabled. OK, some do and some don't. Your claim is still absurd.

> It probably explains why you don't think that OR queries are so important.

The reason OR queries aren't high on our priority list is that nobody has been asking for them. There doesn't even seem to be an issue for them in GAE's issue tracker, or if there is, it's *pages* down the list of priorities.

> They were one of the first things I tried on App Engine and one of the
> reasons Twig was written. I would bet that most developers could not
> imagine working with an RDBMS that did not support OR and AND queries (on
> more than one property). Twig's support for these saves time and reduces the
> complexity of the developer's app. With Objectify they are left on their own
> to re-invent the wheel every time.

Our conceptual model of the datastore is not an RDBMS. It's a key-value store that also allows limited queryability. If you really want an RDBMS, I'm sure the Cloud2db guys will be happy to chime in again.

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
