Relating to entity groups, how can we determine what entity group each entity belongs to? Using the Data Viewer, I would think we could examine this type of setup info for each entity but I have NOT found how to do that. Thanks!
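For what it's worth, my understanding is that an entity's group is determined by its key rather than by any separate setting: you follow the key's parent chain up to the root entity, and every entity sharing that root is in the same group. In the App Engine Java low-level API, `Key.getParent()` returns the parent key (null for a root entity). Below is a minimal sketch of that idea in plain Java. `SimpleKey`, `rootOf` and `groupByRoot` are hypothetical names made up for illustration and are NOT part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EntityGroupSketch {
    /** Hypothetical stand-in for a datastore key: kind, name, optional parent. */
    static final class SimpleKey {
        final String kind;
        final String name;
        final SimpleKey parent; // null means this is a root entity

        SimpleKey(String kind, String name, SimpleKey parent) {
            this.kind = kind;
            this.name = name;
            this.parent = parent;
        }

        @Override
        public String toString() {
            String self = kind + "(\"" + name + "\")";
            return parent == null ? self : parent + "/" + self;
        }
    }

    /** Walks the parent chain; the root key identifies the entity group. */
    static SimpleKey rootOf(SimpleKey key) {
        SimpleKey current = key;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    /** Buckets keys by the string form of their root key. */
    static Map<String, List<String>> groupByRoot(List<SimpleKey> keys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (SimpleKey key : keys) {
            groups.computeIfAbsent(rootOf(key).toString(), r -> new ArrayList<>())
                  .add(key.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        SimpleKey diana = new SimpleKey("User", "diana", null);
        SimpleKey dianaAddr = new SimpleKey("UserAddr", "home", diana);
        SimpleKey michael = new SimpleKey("User", "michael", null);
        // Prints one line per entity group with the keys that belong to it.
        groupByRoot(Arrays.asList(diana, dianaAddr, michael))
            .forEach((root, members) -> System.out.println(root + " -> " + members));
    }
}
```

In this sketch the User and its UserAddr child land in one group, and the second User forms a group of its own.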
On Oct 23, 1:50 pm, "Jason (Google)" <[email protected]> wrote:
> Hi Diana. As others have stated, App Engine can write to multiple entity groups in parallel, so if each User entity is a root entity or is otherwise placed in a different entity group, then there shouldn't be any issues. Regarding performance, all apps should generally be able to handle up to 30 simultaneous dynamic requests assuming a 75ms processing time for each (average load), for a throughput of 400 qps or so:
>
> http://code.google.com/appengine/docs/java/runtime.html#Quotas_and_Li...
>
> If you want any other performance or cost-related numbers, let me know.
>
> For updates to the same entity or entity group, App Engine uses optimistic concurrency as opposed to locking. If an entity is already being updated, then the second request will fail and will automatically get retried on the server. After consistent failures, an exception will be thrown, which you can catch and handle gracefully. Datastore writes will fail from time to time, generally about 0.1 to 0.2 percent of the time, but the failure rate will be higher when there is contention, i.e. a high rate of simultaneous writes to the same entity/entity group.
>
> http://code.google.com/appengine/articles/scaling/contention.html
>
> - Jason
>
> On Thu, Oct 22, 2009 at 8:04 AM, Diana Cruise <[email protected]> wrote:
>
> > I'm glad to hear that the 1-10 requests/second is per User root entity... in my case this means that a huge number of Users logged in around the world should expect sub-second response even if tens of thousands clicked the Update button at the same instant in time!
> >
> > The only problem is we do NOT hear from anyone outside of Google to confirm performance of large volume for specific applications and what the real costs are!!!
> >
> > Regarding deadlock, I hear GAE does NOT bother with lock timeouts, so as soon as a transaction tries to retrieve a record that is already locked, it will receive an error and have to retry.
> >
> > On Oct 19, 5:50 pm, "Dr. Flufenstein" <[email protected]> wrote:
> > > Preface: Please note, I'm not speaking for Google at all in this note, and a lot of what I've written is speculation based on what I've read in various GAE docs as well as some meager knowledge of how relational DBs generally work. And yes, I know datastore isn't a relational DB, but I believe that their indexing implementation likely runs into many of the same problems you have with indexing relational data, although that assumption could be completely wrong.
> > >
> > > From what I can tell, the update bottleneck you're referring to is for updating what you would often think of as a single record if you were persisting one instance of your User as a single denormalized record in a relational schema. I suspect this bottleneck is due to the datastore architecture and the way that data updates are accumulated (possibly grouped/keyed by PK) in a queue, which is probably read from like a cache if read requests come in before the data has been flushed into the actual storage medium and replicated to the other datacenters.
> > >
> > > So if each of your users were updating their own User records, I don't believe you'd experience that limitation, which may be an artifact of how those in-memory queue/cache structures are managed/locked during updates (i.e. a new update for a record may be held until it's been flushed from the queue to the storage medium to prevent having to merge/reconcile records in the queue). If they were all updating a single shared record, then I think you'd hit this pretty quick.
> > >
> > > Let's say though that your users are updating separate records... as your data size grows, you will probably see your update throughput decrease as other factors become dominant, and I believe this will primarily be dependent on the number and composition of the indexes on your data as well as the number of entities persisted. To me, this is the much riskier unknown because your average index structure is harder to update piecewise in parallel because the index must allow you to order/search all of the records' indexed columns. In an RDBMS like SQL Server or Oracle, you'd see some level of index locking take place during each transaction (maybe one page of an index) to allow concurrent updates to different sections of an index before the updates are committed, the transaction is ended and the locks are released.
> > >
> > > In relational persistence systems, this gets slower as the indexes become larger and is usually overcome with a technique like partitioning which, if you aren't familiar with it, sort of gives you a top level of your index tree where the data is actually spread into n groups of tables/indexes depending on some value in each record, and you usually pick a partition key so that data volume in each partition is kind of naturally balanced because rebalancing across partitions is expensive. I'm not sure that any kind of similar mechanism has been exposed in the GAE datastore right now, and so a single index declared for an entity type is probably realized as one big index. I would hope that there's sub-index granularity for locking during updates, but I'm actually guessing that's not the case for a couple of reasons:
> > >
> > > 1) With most relational systems, you need to periodically rebuild the index or at least refresh the index statistics.
> > > I like to simplify this and think of rebuilding as rebalancing the data tree for optimal access speed while refreshing statistics typically just helps query optimizers decide whether use of an index should be preferred. On the GAE though, they require you to have an index for each combination of query parameters, so I suspect that statistics don't come into play. And I haven't seen a "rebuild my indexes" function in the admin UI, although admittedly I haven't looked for one too hard, so I wonder if they aren't trying to keep the data tree somewhat well balanced during each data update, which would require the entire index to potentially be locked.
> > >
> > > 2) I also haven't read anything yet about deadlock situations on GAE, which can happen surprisingly easily if you're updating multiple indexes with enough concurrency and are using page locking. If you were designing the GAE datastore service, the way to avoid that situation would be to lock all indexes on each data update in the same order every time. You'd sacrifice a lot of throughput, but you'd never hit a deadlock, so I suspect they've done something like this behind the scenes unless people just aren't using GAE heavily enough yet or the good people of the GAE have used some special sauce in the datastore service impl.
> > >
> > > So I guess what I'm trying to say is that I don't believe that you should be satisfied with any particular bit of performance data from another application because your mileage will almost certainly vary.
> > > I think that if you really want to know how your application would perform and want to find out before writing the whole app and sharing it with a billion users, I would recommend a very empirical approach:
> > >
> > > I'd write a sample app with an entity group where entity widths and indexes are those that you think will be representative of your deployed application and then add a simple test harness that will:
> > >
> > > a) seed data to a point that you think is representative
> > > b) update and query your data in what you believe will be a worst case scenario and then record the times
> > >
> > > I think the resulting curve of performance you see will be highly dependent on how you vary the seed data size and the number of indexes. Of course there are more dimensions than that, such as the # of concurrent read operations and the # of concurrent write operations, that you can vary as well depending on what your performance requirements are.
> > >
> > > I hope this is somewhat helpful and I also hope that it's not totally incorrect and misleading since, as I said, it's all rampant speculation based on somewhat limited publicly available data.
> > >
> > > -Michael
> > >
> > > P.S. Of course, if anyone has data including # of records, #/composition of indexes, # reads per hour, # writes per hour and latency per txn, I'd be fascinated to hear about it too!
> > >
> > > On Oct 19, 4:01 pm, Diana Cruise <[email protected]> wrote:
> > > > This is exactly what I am talking about... in my case the User and UserAddr are both in the same Entity Group. So, are you saying that my application which has a global presence in GAE can only support 25 simultaneous Users performing this update in under 5 seconds?
> > > >
> > > > Again, I take 1-10 requests per second response and go with the avg of 5/s.
> > > > Add up 25 Users simultaneously hitting this Entity Group and that consumes a full 5 seconds. So, if you have 25 Users doing the same update over and over they will each have about a 5 second response.
> > > >
> > > > I know I am wrong because this is way LOW for a Google platform or any other... I just am NOT hearing or seeing numbers that say otherwise.
> > > >
> > > > If you clarify for me that this Entity Group performance stat of 1-10/s is granular to the Row then we're on to something... That would tell me that my scenario above only applies if ALL Users were logged into the same account!!! If the Entity Group performance stat is granular to the Row then that would mean an infinite number of Users would each average 5 updates per second. Please tell me this is TRUE!
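
To put the numbers quoted in this thread side by side, here is a minimal sketch of the arithmetic. It assumes, as Diana does, an average of 5 writes per second per entity group, and it assumes that writes to a single group are handled one after another. The class and method names are made up for illustration.

```java
class ThroughputSketch {
    /** Requests per second a pool of concurrent requests can sustain. */
    static double throughputQps(int concurrentRequests, double secondsPerRequest) {
        return concurrentRequests / secondsPerRequest;
    }

    /** Seconds until the last of the queued writes to one entity group completes. */
    static double secondsToDrain(int queuedWrites, double writesPerSecond) {
        return queuedWrites / writesPerSecond;
    }

    public static void main(String[] args) {
        // Jason's figure: 30 simultaneous requests at 75 ms each.
        System.out.println("qps for 30 requests at 75 ms: " + throughputQps(30, 0.075));

        // Diana's worst case: 25 users all writing to ONE entity group at 5 writes/s.
        System.out.println("seconds, 25 writes to one group: " + secondsToDrain(25, 5.0));

        // If each user has their own root entity, each group only sees one write.
        System.out.println("seconds, 1 write per group: " + secondsToDrain(1, 5.0));
    }
}
```

Under those assumptions the shared-group case works out to the 5 seconds Diana describes, while one write per group works out to about 0.2 seconds no matter how many users there are, which matches Jason's point that separate entity groups are written in parallel.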
