[appengine-java] Re: GAE Performance

Diana Cruise Mon, 26 Oct 2009 09:07:00 -0700

Regarding entity groups, how can we determine which entity group each
entity belongs to?  I expected the Data Viewer to show this for each
entity, but I have NOT found how to do that.  Thanks!
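
A minimal sketch of the rule for finding an entity's group: the group is identified by the root of the entity's ancestor path, so you can walk parent keys until there is no parent. With the low-level Java API that walk uses `Key.getParent()`, which returns null for a root key. To keep the example runnable without the SDK, `SimpleKey` and `EntityGroups` below are hypothetical stand-ins, not SDK classes.

```java
import java.util.Objects;

/** Hypothetical stand-in for a datastore key: a kind, an id, and an optional parent. */
final class SimpleKey {
    final String kind;
    final long id;
    final SimpleKey parent; // null for a root entity

    SimpleKey(String kind, long id, SimpleKey parent) {
        this.kind = kind;
        this.id = id;
        this.parent = parent;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) return false;
        SimpleKey k = (SimpleKey) o;
        return id == k.id && kind.equals(k.kind) && Objects.equals(parent, k.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, id, parent);
    }

    /** Renders the ancestor path, e.g. "User(1)/UserAddr(7)". */
    @Override
    public String toString() {
        return (parent == null ? "" : parent + "/") + kind + "(" + id + ")";
    }
}

final class EntityGroups {
    /** Walks up the parent chain; the root key identifies the entity group. */
    static SimpleKey groupOf(SimpleKey key) {
        SimpleKey k = key;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    /** True when both keys share the same root, i.e. the same entity group. */
    static boolean sameGroup(SimpleKey a, SimpleKey b) {
        return groupOf(a).equals(groupOf(b));
    }
}
```

For a setup like the one discussed below, a `UserAddr` key created with a `User` key as its parent reports that `User` as its group, while a second root `User` is in a different group.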


On Oct 23, 1:50 pm, "Jason (Google)" <[email protected]> wrote:
> Hi Diana. As others have stated, App Engine can write to multiple entity
> groups in parallel, so if each User entity is a root entity or is otherwise
> placed in a different entity group, then there shouldn't be any issues.
> Regarding performance, all apps should generally be able to handle up to 30
> simultaneous dynamic requests assuming a 75ms processing time for each
> (average load), for a throughput of 400 qps or so:
>
> http://code.google.com/appengine/docs/java/runtime.html#Quotas_and_Li...
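
The 400 qps figure is Little's law (throughput = concurrency / time per request) applied to Jason's two numbers, assuming every request takes the average 75 ms:

```latex
\text{throughput} = \frac{\text{concurrent requests}}{\text{time per request}}
                  = \frac{30}{0.075\ \text{s}} = 400\ \text{requests/s}
```
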
>
> If you want any other performance or cost-related numbers, let me know.
>
> For updates to the same entity or entity group, App Engine uses optimistic
> concurrency as opposed to locking. If an entity is already being updated,
> then the second request will fail and will automatically get retried on the
> server. After repeated failures, an exception will be thrown, which you can
> catch and handle gracefully. Datastore writes will fail from time to
> time, generally about 0.1 to 0.2 percent of the time, but the failure rate
> will be higher when there is contention, i.e. a high rate of simultaneous
> writes to the same entity/entity group.
>
> http://code.google.com/appengine/articles/scaling/contention.html
>
> - Jason
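
Jason's description of optimistic concurrency can be sketched in plain Java. This is a minimal in-memory model, not the datastore: `Versioned` and `OptimisticStore` are hypothetical names, and a compare-and-set on an `AtomicReference` stands in for the datastore's commit check. Each attempt reads a snapshot, computes a new value, and commits only if no other writer committed first; after a bounded number of attempts it gives up with `java.util.ConcurrentModificationException`, which, as far as I know, is also what the low-level Java API throws when a transaction commit loses to contention.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Immutable snapshot of a record plus a version stamp (hypothetical model, not the SDK). */
final class Versioned<T> {
    final T value;
    final long version; // informational; the CAS below compares snapshot references

    Versioned(T value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Optimistic concurrency: read, compute, and commit only if nobody else committed first. */
final class OptimisticStore<T> {
    private final AtomicReference<Versioned<T>> cell;

    OptimisticStore(T initial) {
        cell = new AtomicReference<Versioned<T>>(new Versioned<T>(initial, 0L));
    }

    T get() {
        return cell.get().value;
    }

    /** Retries a bounded number of times, then gives up like a contended datastore write. */
    T updateWithRetries(UnaryOperator<T> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned<T> seen = cell.get();
            Versioned<T> next = new Versioned<T>(change.apply(seen.value), seen.version + 1);
            if (cell.compareAndSet(seen, next)) {
                return next.value; // our write won; no lock was ever held
            }
            // Another writer committed since our read: re-read and try again.
        }
        throw new ConcurrentModificationException("gave up after " + maxAttempts + " attempts");
    }
}
```

With real datastore code the same loop shape would wrap `beginTransaction()` and `commit()`, and the caller would catch the exception to handle a persistent conflict gracefully.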
>
> On Thu, Oct 22, 2009 at 8:04 AM, Diana Cruise <[email protected]> wrote:
>
> > I'm glad to hear that the 1-10 requests/second limit is per User root
> > entity...in my case this means that a huge number of Users logged in
> > around the world should expect sub-second response even if tens of
> > thousands clicked the Update button at the same instant!
>
> > The only problem is that we do NOT hear from anyone outside of Google
> > confirming the performance of specific high-volume applications and
> > what the real costs are!!!
>
> > Regarding deadlock, I hear GAE does NOT bother with lock timeouts, so
> > as soon as a transaction tries to retrieve a record that is already
> > locked, it will receive an error and have to retry.
>
> > On Oct 19, 5:50 pm, "Dr. Flufenstein" <[email protected]>
> > wrote:
> > > Preface: Please note, I'm not speaking for Google at all in this note
> > > and a lot of what I've written is speculation based on what I've read
> > > in various GAE docs as well as some meager knowledge of how relational
> > > DBs generally work.  And yes, I know datastore isn't a relational DB,
> > > but I believe that their indexing implementation likely runs into many
> > > of the same problems you have with indexing relational data although
> > > that assumption could be completely wrong.
>
> > > From what I can tell, the update bottleneck you're referring to is for
> > > updating what you would often think of as a single record if you were
> > > persisting one instance of your User as a single denormalized record
> > > in a relational schema.  I suspect this bottleneck is due to the
> > > datastore architecture and the way that data updates are accumulated
> > > (possibly grouped/keyed by PK) in a queue, which is probably read from
> > > like a cache if read requests come in before the data has been flushed
> > > into the actual storage medium and replicated to the other
> > > datacenters.
>
> > > So if each of your users were updating their own User records, I don't
> > > believe you'd experience that limitation which may be an artifact of
> > > how those in-memory queue/cache structures are managed/locked during
> > > updates (i.e. a new update for a record may be held until it's been
> > > flushed from the queue to the storage medium to prevent having to
> > > merge/reconcile records in the queue).  If they were all updating a
> > > single shared record, then I think you'd hit this pretty quick.
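
If everyone updates one shared record, one commonly used workaround is to split that record into several shards that writers pick at random, and to sum the shards on read. The sketch below is an in-memory illustration with a hypothetical `ShardedCounter` class; on GAE each shard would presumably be its own root entity, so concurrent writes land in different entity groups instead of contending on one.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical sharded counter: one logical value spread over n independent slots.
 * Writers rarely touch the same slot, so contention on any single slot drops.
 */
final class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        shards = new AtomicLongArray(numShards);
    }

    /** Writers pick a random shard, spreading updates across all of them. */
    void increment() {
        int i = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(i);
    }

    /** Readers pay the cost instead: the logical value is the sum of every shard. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is that reads become more expensive (n records instead of one) in exchange for write throughput that scales roughly with the number of shards.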
>
> > > Let's say though that your users are updating separate records...as
> > > your data size grows, you will probably see your update throughput
> > > decrease as other factors become dominant, and I believe this will
> > > primarily be dependent on the number and composition of the indexes on
> > > your data as well as the number of entities persisted.  To me, this is
> > > the much riskier unknown because your average index structure is
> > > harder to update piecewise in parallel because the index must allow
> > > you to order/search all of the records' indexed columns.  In an RDBMS
> > > like SQL Server or Oracle, you'd see some level of index locking take
> > > place during each transaction (maybe one page of an index) to allow
> > > concurrent updates to different sections of an index before the
> > > updates are committed, the transaction is ended and the locks are
> > > released.
>
> > > In relational persistence systems, this gets slower as the indexes
> > > become larger and is usually overcome with a technique like
> > > partitioning which, if you aren't familiar with it, sort of gives you
> > > a top level of your index tree where the data is actually spread into
> > > n groups of tables/indexes depending on some value in each record, and
> > > you usually pick a partition key so that data volume in each partition
> > > is kind of naturally balanced because rebalancing across partitions is
> > > expensive.  I'm not sure that any kind of similar mechanism has been
> > > exposed in the GAE datastore right now and so a single index declared
> > > for an entity type is probably realized as one big index.  I would
> > > hope that there's sub-index granularity for locking during updates,
> > > but I'm actually guessing that's not the case for a couple of reasons:
>
> > > 1) With most relational systems, you need to periodically rebuild the
> > > index or at least refresh the index statistics.  I like to simplify
> > > this and think of rebuilding as rebalancing the data tree for optimal
> > > access speed while refreshing statistics typically just helps query
> > > optimizers decide whether use of an index should be preferred.  On the
> > > GAE though, they require you to have an index for each combination of
> > > query parameters, so I suspect that statistics don't come into play.
> > > And I haven't seen a "rebuild my indexes" function in the admin UI,
> > > although admittedly I haven't looked for one too hard, so I wonder if
> > > they aren't trying to keep the data tree somewhat well balanced during
> > > each data update, which would potentially require the entire index to
> > > be locked.
>
> > > 2) I also haven't read anything yet about deadlock situations on GAE
> > > which can happen surprisingly easily if you're updating multiple
> > > indexes with enough concurrency and are using page locking.  If you
> > > were designing the GAE datastore service, the way to avoid that
> > > situation would be to lock all indexes on each data update in the same
> > > order every time.  You'd sacrifice a lot of throughput, but you'd
> > > never hit a deadlock so I suspect they've done something like this
> > > behind the scenes unless people just aren't using GAE heavily enough
> > > yet or the good people of the GAE have used some special sauce in the
> > > datastore service impl.
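
The lock-ordering idea in 2) is a standard deadlock-avoidance technique and can be sketched as follows. This is a generic Java illustration, not a description of how the GAE datastore works internally; `IndexLock` and `OrderedLocking` are hypothetical, and locks are ordered by a name assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** A named lock, e.g. one per index (hypothetical model). */
final class IndexLock {
    final String name; // assumed unique; defines the global acquisition order
    final ReentrantLock lock = new ReentrantLock();

    IndexLock(String name) {
        this.name = name;
    }
}

final class OrderedLocking {
    /**
     * Acquires every lock in one global order (by name), runs the update,
     * then releases in reverse order. Because no two threads ever wait for
     * each other's locks in opposite orders, no wait cycle (deadlock) can form.
     */
    static void withAll(List<IndexLock> locks, Runnable update) {
        List<IndexLock> ordered = new ArrayList<>(locks);
        ordered.sort(Comparator.comparing((IndexLock l) -> l.name));
        List<IndexLock> held = new ArrayList<>();
        try {
            for (IndexLock l : ordered) {
                l.lock.lock();
                held.add(l);
            }
            update.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).lock.unlock();
            }
        }
    }
}
```

Callers may pass the locks in any order; the sort inside `withAll` is what guarantees every thread acquires them the same way, at the throughput cost Michael describes.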
>
> > > So I guess what I'm trying to say is that I don't believe that you
> > > should be satisfied with any particular bit of performance data from
> > > another application because your mileage will almost certainly vary.
> > > I think that if you really want to know how your application would
> > > perform, and want to find out before writing the whole app and sharing
> > > it with a billion users, I would recommend a very empirical approach:
>
> > > I'd write a sample app with entity groups whose entity widths and
> > > indexes are those that you think will be representative of your
> > > deployed application, and then add a simple test harness that will:
>
> > > a) seed data to a point that you think is representative
> > > b) update and query your data in what you believe will be a worst case
> > > scenario and then record the times
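
Steps a) and b) might look like the skeleton below. `Workload`, `Harness`, and their methods are hypothetical; you would implement `seed` and `worstCaseOperation` against your own entities and indexes. The harness only records wall-clock latency per operation and reports a nearest-rank percentile.

```java
import java.util.Arrays;

/** Hypothetical hooks you would implement against your own entities and indexes. */
interface Workload {
    void seed(int entityCount);     // step a) load representative data
    void worstCaseOperation(int i); // step b) one update or query you expect to be slowest
}

final class Harness {
    /** Seeds once, then times each operation; returns per-operation latencies in nanoseconds. */
    static long[] run(Workload w, int seedSize, int operations) {
        w.seed(seedSize);
        long[] latencies = new long[operations];
        for (int i = 0; i < operations; i++) {
            long start = System.nanoTime();
            w.worstCaseOperation(i);
            latencies[i] = System.nanoTime() - start;
        }
        return latencies;
    }

    /** Nearest-rank percentile of recorded latencies, with p in [0, 1] (e.g. 0.95). */
    static long percentile(long[] latencies, double p) {
        if (latencies.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        int index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
        return sorted[index];
    }
}
```

Running `run` several times with increasing `seedSize` gives the performance curve Michael mentions next.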
>
> > > I think the resulting curve of performance you see will be highly
> > > dependent on how you vary the seed data size and the number of
> > > indexes.  Of course there are more dimensions than that, such as the #
> > > of concurrent read operations and the # of concurrent write
> > > operations, that you can vary as well depending on what your
> > > performance requirements are.
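
To vary the number of concurrent writers, a sketch like the following (hypothetical `ConcurrencySweep` class) runs the same operation on N threads and reports throughput. Passing the writer id to the operation lets you compare each writer updating its own root entity against all writers sharing one entity group, which is the distinction Jason draws above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

final class ConcurrencySweep {
    /**
     * Runs opsPerWriter calls of op on each of `writers` threads and returns
     * the observed throughput in operations per second. op receives the
     * writer id so a test can route each writer to its own record or to a shared one.
     */
    static double throughput(int writers, int opsPerWriter, IntConsumer op) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        long start = System.nanoTime();
        for (int w = 0; w < writers; w++) {
            final int writerId = w;
            pool.execute(() -> {
                for (int i = 0; i < opsPerWriter; i++) {
                    op.accept(writerId);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
            throw new IllegalStateException("workload did not finish in time");
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return writers * (double) opsPerWriter / seconds;
    }
}
```

Calling `throughput` for writers = 1, 2, 4, 8, ... with both routing strategies shows where contention starts to dominate.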
>
> > > I hope this is somewhat helpful and I also hope that it's not totally
> > > incorrect and misleading since, as I said, it's all rampant
> > > speculation based on somewhat limited publicly available data.
>
> > > -Michael
>
> > > P.S.  Of course, if anyone has data including # of records, # and
> > > composition of indexes, # of reads per hour, # of writes per hour and
> > > latency per txn, I'd be fascinated to hear about it too!
>
> > > On Oct 19, 4:01 pm, Diana Cruise <[email protected]> wrote:
>
> > > > This is exactly what I am talking about...in my case the User and
> > > > UserAddr are both in the same Entity Group.  So, are you saying that
> > > > my application, which has a global presence in GAE, can only support
> > > > 25 simultaneous Users performing this update in under 5 seconds?
>
> > > > Again, I take the 1-10 requests per second figure and go with the
> > > > average of 5/s.  Add up 25 Users simultaneously hitting this Entity
> > > > Group and that consumes a full 5 seconds.  So, if you have 25 Users
> > > > doing the same update over and over, they will each see about a 5
> > > > second response.
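
Diana's arithmetic can be written out under her assumption that writes to one entity group are fully serialized at r = 5 updates/s: n queued updates take n/r to drain, and the k-th one finishes at k/r.

```latex
t_k = \frac{k}{r}, \qquad
t_1 = \frac{1}{5\ \text{s}^{-1}} = 0.2\ \text{s}, \qquad
t_{25} = \frac{25}{5\ \text{s}^{-1}} = 5\ \text{s}
```

So a single burst of 25 simultaneous updates yields waits from about 0.2 s up to 5 s; the roughly 5 s per user holds when all 25 keep resubmitting continuously, since each update then waits behind the other 24.
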
>
> > > > I know I am wrong because this is way LOW for a Google platform or any
> > > > other...I just am NOT hearing or seeing numbers that say otherwise.
>
> > > > If you clarify for me that this Entity Group performance stat of 1-10/
> > > > s is granular to the Row then we're on to something...  That would
> > > > tell me that my scenario above only applies if ALL Users were logged
> > > > into the same account!!!  If the Entity Group performance stat is
> > > > granular to the Row then that would mean an infinite number of Users
> > > > would average 5 updates per second.  Please tell me this is TRUE!
>
> ...
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine for Java" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en
-~----------~----~----~----~------~----~------~--~---
