[google-appengine] Re: Parallel urlfetch utility class / function.

David Wilson Mon, 16 Mar 2009 08:04:01 -0700

Joe,

I've only tested it in production. ;)


The code should work serially on the SDK, but I haven't tried yet.
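
To make the serial versus batched difference concrete, here is a rough sketch in plain Python. It is not the megafetch.py code and does not touch the App Engine async API: a thread pool stands in for the async proxy, and fake_fetch, fetch_serial, fetch_parallel, timed and the example.com URLs are all made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, latency=0.05):
    """Hypothetical stand-in for a URL fetch: waits, then returns a fake body."""
    time.sleep(latency)
    return "body of " + url


def fetch_serial(urls):
    """Fetch one URL after another; total time is roughly the sum of latencies."""
    return [fake_fetch(u) for u in urls]


def fetch_parallel(urls, workers=10):
    """Fetch all URLs at once; total time is roughly the slowest single fetch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(fake_fetch, urls))


def timed(fn, urls):
    """Return (result, elapsed wall-clock seconds) for fn(urls)."""
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


urls = ["http://example.com/%d" % i for i in range(10)]
serial_result, serial_secs = timed(fetch_serial, urls)
parallel_result, parallel_secs = timed(fetch_parallel, urls)
print("serial=%.2fs parallel=%.2fs" % (serial_secs, parallel_secs))
```

Both versions return the same bodies in the same order; only the elapsed time differs, which is the whole point if request time is what gets billed.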


David.

2009/3/16 Joe Bowman <[email protected]>:
>
> Does the batch fetching work on live appengine applications, or
> only on the SDK?
>
> On Mar 16, 10:19 am, David Wilson <[email protected]> wrote:
>> I have no idea how definitive this is, but literally it means CPU
>> cost seems to be measured in wall-clock time. I guess this makes
>> sense for a few different reasons.
>>
>> I found some internal function
>> "google3.apphosting.runtime._apphosting_runtime___python__apiproxy.get_request_cpu_usage"
>> with the docstring:
>>
>>     Returns the number of megacycles used so far by this request.
>>     Does not include CPU used by API calls.
>>
>> Calling it, then running time.sleep(5), then calling it again,
>> indicates thousands of megacycles used, yet in real terms the CPU was
>> probably doing nothing. I guess Datastore CPU, etc., is added on top
>> of this, but it seems to suggest to me that if you can drastically
>> reduce request time, quota usage should drop too.
>>
>> I have yet to do any kind of rough measurements of Datastore CPU, so
>> I'm not sure how correct this all is.
>>
>> David.
>>
>>  - One of the guys on IRC suggested this means that per-request cost
>> is scaled during peak usage (when internal services are running
>> slower).
>>
>> 2009/3/16 peterk <[email protected]>:
>>
>> > A couple of questions re. CPU usage..
>>
>> > "CPU time quota appears to be calculated based on literal time"
>>
>> > Can you clarify what you mean here? I presume each async request eats
>> > into your CPU budget. But you say:
>>
>> > "since you can burn a whole lot more AppEngine CPU more cheaply using
>> > the async api"
>>
>> > Can you clarify how that's the case?
>>
>> > I would guess as long as you're being billed for the cpu-ms spent in
>> > your asynchronous calls, Google would let you hang yourself with them
>> > when it comes to billing.. :) so I presume they'd let you squeeze in
>> > as many as your original request, and its limit, will allow for?
>>
>> > Thanks again.
>>
>> > On Mar 16, 2:00 pm, David Wilson <[email protected]> wrote:
>> >> It's completely undocumented (at this stage, anyway), but definitely
>> >> seems to work. A few notes I've gathered:
>>
>> >>  - CPU time quota appears to be calculated based on literal time,
>> >> rather than e.g. the UNIX concept of "time spent in running state".
>>
>> >>  - I can fetch 100 URLs in 1.3 seconds from a machine colocated in
>> >> Germany using the asynchronous API. I can't begin to imagine how slow
>> >> (and therefore expensive in monetary terms) this would be using the
>> >> standard API.
>>
>> >>  - The user-specified callback function appears to be invoked in a
>> >> separate thread; the RPC isn't "complete" until this callback
>> >> completes. The callback thread is still subject to the request
>> >> deadline.
>>
>> >>  - It's a standard interface, and seems to have no restrictions on
>> >> parallelism, at least for urlfetch and Datastore. However, I imagine
>> >> that restrictions may be placed here at some later stage, since you
>> >> can burn a whole lot more AppEngine CPU more cheaply using the async
>> >> api.
>>
>> >>  - It's "standard" only insomuch as you have to fiddle with
>> >> AppEngine-internal protocolbuffer definitions for each service type.
>> >> This mostly means copy-pasting the standard sync call code from the
>> >> SDK, and hacking it to use pubsubhubbub's proxy code.
>>
>> >> Per the last point, you might be better off waiting for an officially
>> >> sanctioned API for doing this, although I doubt the protocolbuffer
>> >> definitions change all that often.
>>
>> >> Thanks to Brett Slatkin & co. for doing the digging required to get
>> >> the async stuff working! :)
>>
>> >> David.
>>
>> >> 2009/3/16 peterk <[email protected]>:
>>
>> >> > Very neat.. Thank you.
>>
>> >> > Just to clarify, can we use this for all API calls? Datastore too? I
>> >> > didn't look very closely at the async proxy in pubsubhubbub..
>>
>> >> > Asynchronous calls available on all apis might give a lot to chew
>> >> > on.. :) It's been a while since I've worked with async function calls
>> >> > or threading, might have to dig up some old notes to see where I could
>> >> > extract gains from it in my app. Some common cases might be worth the
>> >> > community documenting for all to benefit from, too.
>>
>> >> > On Mar 16, 1:26 pm, David Wilson <[email protected]> wrote:
>> >> >> I've created a Google Code project to contain some batch utilities I'm
>> >> >> working on, based on async_apiproxy.py from pubsubhubbub[0]. The
>> >> >> project currently contains just a modified async_apiproxy.py that
>> >> >> doesn't require dummy google3 modules on the local machine, and a
>> >> >> megafetch.py, for batch-fetching URLs.
>>
>> >> >>    http://code.google.com/p/appengine-async-tools/
>>
>> >> >> David
>>
>> >> >> [0]http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/async_a...
>>
>> >> >> --
>> >> >> It is better to be wrong than to be vague.
>> >> >>   — Freeman Dyson
>>
> >
>
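
For the wall-clock point quoted above, here is a rough sketch of the distinction in plain Python. It does not call the internal get_request_cpu_usage function and says nothing about how AppEngine really meters requests; measure is a made-up helper, and time.process_time is used as a stand-in for the UNIX notion of "time spent in running state".

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()   # wall-clock ("literal") time
    cpu_start = time.process_time()    # CPU time of this process; sleeping is not counted
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping burns wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print("wall=%.3fs cpu=%.3fs" % (wall, cpu))
```

If quota were charged on the first number rather than the second, a request that mostly waits on API calls would cost as much as one that computes the whole time, which matches what the sleep experiment seemed to show.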



-- 
It is better to be wrong than to be vague.
  — Freeman Dyson

