[google-appengine] Re: CPU time for trivial datastore roundtrip

Anthony Mills Sat, 11 Jul 2009 17:39:23 -0700

Nick, thanks for responding. I'll add a lot more detail and test data
here.


I'm doing three fetches: one to get the user data, one to get his
starred items, one to get his own items. I've broken it down to
examine each one individually. The user fetch is fast:

userEntity = datastore.get(KeyFactory.createKey("User", makeUserKeyName
(getCurrentUserId())));

/service?a=g&t=u 200 56ms 27cpu_ms 12api_cpu_ms 0kb
/service?a=g&t=u 200 35ms 21cpu_ms 12api_cpu_ms 0kb
/service?a=g&t=u 200 26ms 15cpu_ms 12api_cpu_ms 0kb
/service?a=g&t=u 200 49ms 15cpu_ms 12api_cpu_ms 0kb
/service?a=g&t=u 200 37ms 15cpu_ms 12api_cpu_ms 0kb
/service?a=g&t=u 200 27ms 15cpu_ms 12api_cpu_ms 0kb

The saved query is not so fast. Each item has a list property "s"
which is a list of all user IDs that have saved that item. It's served
by a dedicated index, s asc + i asc.

Query query = new Query("Item");
query.addFilter("s", Query.FilterOperator.EQUAL, getCurrentUserId());
query.addSort("i");
List<Entity> items = datastore.prepare(query).asList(withLimit
(REQUEST_SIZE));

/service?a=g&t=ts 200 97ms 710cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=ts 200 79ms 711cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=ts 200 84ms 705cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=ts 200 87ms 716cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=ts 200 84ms 696cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=ts 200 72ms 697cpu_ms 678api_cpu_ms 0kb

The query to get the user's items is similarly slow. It is a query on
a string property "u", ordered by "i". It's served by a dedicated
index, u asc + i asc.

Query query = new Query("Item");
query.addFilter("u", Query.FilterOperator.EQUAL, getCurrentUserId());
query.addSort("i");
List<Entity> items = datastore.prepare(query).asList(withLimit
(REQUEST_SIZE));

/service?a=g&t=tm 200 71ms 699cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=tm 200 90ms 710cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=tm 200 73ms 699cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=tm 200 67ms 692cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=tm 200 83ms 698cpu_ms 678api_cpu_ms 0kb
/service?a=g&t=tm 200 80ms 722cpu_ms 678api_cpu_ms 0kb

Adding the sort does not seem to add any extra time, and nor should
it, given the index type.

Anyway, put the three together and this is what you get:

/service?a=g 200 929ms 1731cpu_ms 1368api_cpu_ms 24kb
/service?a=g 200 159ms 1423cpu_ms 1368api_cpu_ms 24kb
/service?a=g 200 171ms 1420cpu_ms 1368api_cpu_ms 24kb
/service?a=g 200 178ms 1414cpu_ms 1368api_cpu_ms 24kb
/service?a=g 200 146ms 1405cpu_ms 1368api_cpu_ms 24kb

See, what really annoys me is this. Those requests are coming back in
170 ms. But I'm charged for 1400 ms of CPU time. How does that work?
It should be doing blocking work on the server, right? Are there
really nine servers all churning for that whole time?

Thanks for your time,

Anthony Mills

On Jul 10, 11:07 am, Alex Popescu
<[email protected]> wrote:
> Here's some sample code (not exactly the one I've previously mentioned
> as I haven't got the time to prepare it):
>
> [code]
> query = ContentEvent.gql('ORDER BY created_at DESC')
> event_list = query.fetch(max_results, offset)
> # now that we've fetched the initial list we need to navigate all the
> references
> # to make sure that there will be no further DB hits
> keys = []
> for event in event_list:
>   keys.append(event.get_refkey('entry'))
> entries = db.get(keys)
> for index, event in enumerate(event_list):
>   event.entry = entries[index]
> [/code]
>
> This code gets called with a max_results=20 and various offsets
> (pagination).
> Now the results:
>
> 07-10 09:51AM 01.720 /_m/cs?p=6 200 173ms 690cpu_ms 522api_cpu_ms 4kb
> 07-10 09:50AM 57.408 /_m/cs?p=5 200 203ms 690cpu_ms 522api_cpu_ms 4kb
> 07-10 09:50AM 52.963 /_m/cs?p=4 200 261ms 717cpu_ms 522api_cpu_ms 5kb
> 07-10 09:50AM 48.888 /_m/cs?p=3 200 164ms 745cpu_ms 522api_cpu_ms 5kb
> 07-10 09:50AM 42.068 /_m/cs?p=2 200 1563ms 2034cpu_ms 522api_cpu_ms
> 6kb
> 07-10 09:46AM 40.239 /_m/cs?p=1 200 168ms 745cpu_ms 522api_cpu_ms 5kb
> 07-10 09:46AM 30.222 /_m/cs?p=2 200 1422ms 2089cpu_ms 522api_cpu_ms
> 6kb
> 07-10 09:45AM 20.582 /_m/cs 200 2718ms 2145cpu_ms 522api_cpu_ms 5kb
>
> As you can see the initial requests are above 2000cpu_ms. Then it
> looks like there is some pre-fetching kicking in that pushes this time
> down to 690-750 cpu_ms.
>
> To add a bit more detail to the above code: there are 2 entities one
> of them keeping a ReferenceProperty to the other. Still, I am making
> sure there are only 2 hits to the datastore by fetching all child
> entities in a single get (basically there are 2 hits to the datastore
> each returning 20 results).
>
> The cpu_ms range is quite huge though: 690-750cpu_ms to 2000+cpu_ms.
>
> I hope to get some sample code for the initial scenario and report
> back.
>
> ./alex
> --
> .w( the_mindstorm )p.
>   Alexandru Popescu
>
> On Jul 10, 7:17 pm, "Nick Johnson (Google)" <[email protected]>
> wrote:
>
> > Here's the relevant ops that one of these pages 
> > (ex:http://www.nmaps.net/172547) does:
>
> > self.user = model.User.get(db.Key(self.session['user']))
>
> > map = model.Map.get_by_map_id(int(map_id))
>
> > q = model.Comment.all().ancestor(map).order("-lastupdated")
> > comments = q.fetch(COMMENTS_PER_PAGE + 1)
>
> > q = model.Map.all()
> > q.filter("user =", map.user)
> > q.filter("unlisted =", False)
> > q.filter("float_num <", map.float_num)
> > q.order("-float_num")
> > other_maps = q.fetch(6)
>
> > As you can see, typically two gets for a single entity, along with a
> > couple of queries - one for up to 40 entities (COMMENTS_PER_PAGE=40),
> > one for 6.
>
> > -Nick Johnson
>
> > On Fri, Jul 10, 2009 at 5:09 PM, Alex
>
> > Popescu<[email protected]> wrote:
>
> > > Hi Nick,
>
> > > I'll deploy some tests to further investigate this issue and I'll
> > > provide the code and results.
>
> > > Meanwhile, I'm wondering if you could share the relevant pieces of
> > > code in your app that are tipically
> > > below 600ms. I must confess that I haven't seen many such cases in my
> > > app so far
> > > (except some on which I'm doing extremely aggressive caching).
>
> > > ./alex
> > > --
> > > .w( the_mindstorm )p.
> > >  Alexandru Popescu
>
> > > On Jul 10, 2:46 pm, "Nick Johnson (Google)" <[email protected]>
> > > wrote:
> > >> Hi Alex,
>
> > >> It's impossible to give a useful comment without first seeing the code
> > >> that you're using. Speaking from my own experience, I have a personal
> > >> site that serves reasonably datastore-intensive pages, and typically
> > >> total CPU milliseconds doesn't exceed about 600 - that's to do several
> > >> get requests and at least one query. So it's certainly possible to do
> > >> useful work with the datastore and come in well below the 1300 to
> > >> 1400ms you quote.
>
> > >> -Nick Johnson
>
> > >> On Fri, Jul 10, 2009 at 1:27 AM, Alex
>
> > >> Popescu<[email protected]> wrote:
>
> > >> > Hi guys,
>
> > >> > I have a cron op that is fetching some records for an entity and does
> > >> > some processing on them.
>
> > >> > While checking my app logs I have noticed that for the trivial case
> > >> > when no result is returned (and so there is no additional processing
> > >> > done) this operation is billed constantly with something between
> > >> > 1300ms and 1400ms.
>
> > >> > This basically tells me that this is the initial cost of a roundtrip
> > >> > to the datastore (plus basic request dispatching; note: the response
> > >> > doesn't have any data).
>
> > >> > Based on this I can further deduce that on a daily basis the free CPU
> > >> > quota will allow me to run around 18k datastore roundtrips with 0
> > >> > results.
>
> > >> > How do you comment on this data when compared with the public
> > >> > announcements you've done in the past about the amount of requests an
> > >> > app can serve based only on free quota?
>
> > >> > ./alex
> > >> > --
> > >> > .w( the_mindstorm )p.
> > >> >  Alexandru Popescu
>
> > >> > PS: I have argued (and I continue to believe) that billing for CPU
> > >> > time is completely incorrect to app engine customers and that Google
> > >> > should look into some alternative options:
> > >> >http://themindstorms.blogspot.com/2009/06/open-letter-to-google-app-e...
>
> > >> --
> > >> Nick Johnson, App Engine Developer Programs Engineer
> > >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > >> Number: 368047
>
> > --
> > Nick Johnson, App Engine Developer Programs Engineer
> > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > Number: 368047
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: CPU time for trivial datastore roundtrip

Reply via email to