On Mon, Jan 17, 2011 at 3:08 AM, Eli Jones <[email protected]> wrote:

> I just want to reiterate that I still agree with myself here.
>
> Though, one issue that I can see with using asynchronous urlfetch: it
> will be returning results to your callback handler. Thus, I am guessing,
> you'll be stuck with one result per urlfetch, and if you are putting this
> information to the datastore, a single db.put per urlfetch.
>
> Which would be bad, and very costly for 100,000+ tasks every five
> minutes.  So you're stuck balancing the cost of fetching and putting all
> of these items as fast as possible against doing it a little bit slower
> (in batches of 100 or 1000) and putting items to the datastore in batches.
>

An easy way to avoid this is to kick off a batch of asynchronous URLFetch
requests, wait for all of them to complete, then put the results to the
datastore in a single batch put.
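
A minimal sketch of that pattern, assuming the Python runtime's
urlfetch.create_rpc / urlfetch.make_fetch_call and db.put. The urlfetch and
db modules are passed in as arguments only so the function can be exercised
outside App Engine; fetch_and_store, make_entity and the 10-second deadline
are illustrative names and values, not anything taken from this thread.

```python
def fetch_and_store(urls, urlfetch, db, make_entity, deadline=10):
    """Start all fetches, wait for each one, then write results in one db.put.

    urlfetch / db: on App Engine these would be google.appengine.api.urlfetch
    and google.appengine.ext.db (assumed; injected to keep the sketch testable).
    make_entity: hypothetical callable (url, result) -> entity to store.
    Returns (number of entities stored, list of URLs whose fetch failed).
    """
    pending = []
    for url in urls:
        rpc = urlfetch.create_rpc(deadline=deadline)
        urlfetch.make_fetch_call(rpc, url)  # returns immediately
        pending.append((url, rpc))

    entities = []
    failed = []
    for url, rpc in pending:
        try:
            result = rpc.get_result()  # blocks until this fetch completes
        except Exception:  # deliberately broad for the sketch
            failed.append(url)
            continue
        entities.append(make_entity(url, result))

    if entities:
        db.put(entities)  # one batch put instead of one put per fetch
    return len(entities), failed
```

Because every fetch is started before any result is awaited, the wall-clock
time is roughly that of the slowest fetch rather than the sum of all of them.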

-Nick Johnson


>
> Now, I can imagine a number of fanciful approaches to trying to weasel out
> of this issue, but it's best for you to just start testing and see what
> happens.
>
> With that said, one fun approach would be to have the callback handler
> stick the async fetch result into the process cache, using some global
> variable, and once that variable fills up with enough items, put them to
> the datastore in a batch.
>
> Sadly, you then run into the problem of an item being orphaned in one of
> your GAE instances (the callback sticks it in memory, but there aren't
> enough items for a put yet, and that particular instance never handles
> another callback).
>
> Though, this could be very useful for getting maybe 75% - 95% of the work
> done very fast. Then you could have a follow-up task that did the remaining
> work in a more meticulous manner (but it's hard to imagine an easy or
> efficient way to determine which urlfetches didn't get put to the
> datastore).
>
>
> On Sun, Jan 16, 2011 at 12:56 AM, Robert Kluin <[email protected]> wrote:
>
>> I think Eli has a good suggestion (again), use task-chaining with
>> countdowns + async urlfetches in small batches.  Just beware,
>> countdowns are only an estimate, and if the queue is backing up the
>> task may not run when you want.
>>
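
A rough sketch of task-chaining with a countdown, assuming taskqueue.add
accepts url, params and countdown as the Task constructor does. The
taskqueue module is passed in so the function can run outside App Engine;
run_and_rechain, do_work, the '/tasks/poll' path and the 'batch' parameter
are hypothetical names for illustration.

```python
POLL_INTERVAL_SECONDS = 300  # the five-minute interval discussed in this thread

def run_and_rechain(batch_id, do_work, taskqueue, path='/tasks/poll'):
    """Process one batch, then enqueue the next run of the same batch.

    taskqueue: on App Engine, google.appengine.api.taskqueue (assumed; injected).
    do_work: hypothetical callable that fetches and stores one batch.
    """
    # If do_work raises, nothing is enqueued here; the queue's own retry of
    # the failed task is relied on to keep the chain alive.
    do_work(batch_id)
    taskqueue.add(url=path,
                  params={'batch': batch_id},
                  countdown=POLL_INTERVAL_SECONDS)
```

Since the countdown starts after the work finishes, the real interval is the
countdown plus the run time plus any queue delay, so it will drift past five
minutes.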
>> Just thinking about this, I would probably try to group similarly
>> performing websites into small batches to monitor together.  So if you've
>> got sites that typically respond very fast, group them; likewise for
>> slow sites.  I suspect that will help you optimize your queue layouts;
>> maybe you could use some queues for 'fast' and others for 'slow'
>> groups.  Just some thoughts.
>>
>> I also agree with some of the other commenters: you should set up some
>> tests and see if you still feel like this is the right platform for
>> your app.
>>
>>
>> Robert
>>
>> On Sat, Jan 15, 2011 at 11:03, supercobra <[email protected]> wrote:
>> > The countdown parameter of TaskQueue is indeed a big help here. Thanks
>> > for pointing that out.
>> >
>> > -- [email protected]
>> > http://supercobrablogger.blogspot.com/
>> >
>> >
>> >
>> > On Fri, Jan 14, 2011 at 3:41 PM, Uros Trebec <[email protected]> wrote:
>> >> re
>> >>
>> >> On Jan 14, 7:24 pm, supercobra <[email protected]> wrote:
>> >>> One of the challenges is to wait for 5 minutes. E.g. fetch a URL, store
>> >>> results, wait 5 min, do it again. Since a queue will execute the task
>> >>> almost immediately (if it is empty), this would not work unless the
>> >>> queue is filled w/ a known number of tasks.
>> >>>
>> >>> Any suggestion welcome.
>> >>
>> >> You can use the 'countdown' parameter in the Task constructor (
>> >> http://code.google.com/appengine/docs/python/taskqueue/tasks.html#Task
>> >> ) to set the number of seconds for the Task to wait in the queue
>> >> before executing. I use this for scheduling a task a few minutes in
>> >> the future when UrlFetch returns the data I already have.
>> >>
>> >> lp,
>> >> Uros
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "Google App Engine" group.
>> >> To post to this group, send email to [email protected].
>> >> To unsubscribe from this group, send email to
>> >> [email protected].
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/google-appengine?hl=en.
>> >>
>> >>
>>



-- 
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047
