Re: [google-appengine] Re: Backend Deferred Tasks eating CPU Time?

Rishi Arora Tue, 04 Oct 2011 12:03:57 -0700

You're welcome.  I struggled with minimizing datastore writes as well.
Some tips, based on what worked for me:


1.  Multiple writes to the same entity.  Take a "number of accesses per
day" counter: instead of incrementing the counter and writing it to the
datastore every time, it's better to add each increment operation to a pull
queue.  Then at the end of the day, run a cron job to pull items from the
queue, increment the counter in memory, and write the final result to the
datastore.  I'm kind of cheating and using the pull queue as reliable
transient storage, accessible from either front-ends or backends.  The
100MB taskqueue capacity limit and the 100k taskqueue API call limit work
just fine for me.

2.  I moved things like high-level application logging out of the datastore
and into an external database.  My GAE app now logs such events to a pull
queue (again, another hack making use of pull queues), and a cron job
batches these logs and sends them to the external database over an HTTP
connection.

3.  I discovered I had twice as many indexes as I really needed.  Back when
datastore writes weren't so expensive, I lavishly created indexes all over
the place.  I trimmed that down by 50%, which had a huge impact on the
number of datastore writes.  Inserting a new entity with 10 properties and
5 multi-property indexes will cause 16 write operations to the datastore (1
for the entity, 10 for the single-property indexes on the 10 properties,
and 5 for the multi-property indexes).  A delete will cause 16 writes as
well.  An update will cause up to 16 writes, depending on which indexes are
affected.
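
To make the arithmetic in tip 3 concrete, here is a rough sketch in Python.
It only mirrors the counting described above (one write for the entity, one
per indexed property, one per multi-property index); it is not the official
billing formula, and the function name and the "after trimming" numbers are
made up for illustration.

```python
def estimated_writes(indexed_properties, multi_property_indexes):
    """Estimate write operations for inserting (or deleting) one entity.

    Follows the counting used in this post, not an official formula:
    1 write for the entity itself, 1 per indexed property, and
    1 per multi-property index.
    """
    entity_write = 1
    return entity_write + indexed_properties + multi_property_indexes


# The example from the post: 10 indexed properties, 5 multi-property indexes.
before = estimated_writes(10, 5)

# Hypothetical trimmed configuration: half the properties left unindexed,
# and only 2 multi-property indexes kept.
after = estimated_writes(5, 2)

print(before, after)  # prints: 16 8
```

Multiply the per-entity number by the number of entities a job touches to get
a feel for how quickly extra indexes add up.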


I think, overall, creative use of memcache and pull queues can help avoid
using the datastore for "transient" storage.  For instance, the GAE
Appstats utility relies exclusively on memcache to function.
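
As a rough illustration of the counter idea in tip 1, here is a sketch in
Python of the in-memory aggregation step.  The payload format
("counter_name:amount") and the function name are assumptions of mine, and
the list of strings stands in for whatever the cron job leases from the pull
queue; the actual taskqueue and datastore calls are left out.

```python
from collections import Counter


def aggregate_increments(payloads):
    """Sum queued increments in memory.

    Each payload is assumed to be a 'counter_name:amount' string
    (a made-up format for this sketch).
    """
    totals = Counter()
    for payload in payloads:
        # Split on the last colon so counter names may contain colons.
        name, _, amount = payload.rpartition(":")
        totals[name] += int(amount)
    return dict(totals)


# Stand-in for payloads leased from the pull queue during the cron run.
leased = [
    "accesses/2011-10-04:1",
    "accesses/2011-10-04:1",
    "signups/2011-10-04:3",
]

totals = aggregate_increments(leased)

# One datastore write per counter here, instead of one per increment.
for name, total in sorted(totals.items()):
    print(name, total)
```

The point is only that the number of datastore writes now depends on the
number of distinct counters, not on the number of increments.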


On Tue, Oct 4, 2011 at 11:53 AM, someone1 <[email protected]> wrote:

> Thanks for the reply Rishi.
>
> I think I will try the pull queue idea since I was going to move to a
> model like that after switching to backend instances. I will also try
> to see how fast my process can finish using fewer instances with this
> in place. Each task takes about 4 seconds and we'd like to finish
> running within 30 minutes, so 10 instances should be fine.
>
> If the datastore CPU time is being billed against what I see as CPU
> Time, then this answers my question. The new pricing scheme estimates
> millions of writes every time I run my task. I hope I can find a way to
> reduce this, as it will end up costing more than what the current
> pricing scheme costs me.
>
>
> Thanks again!
>
> -Prateek
>
> On Oct 4, 11:27 am, Rishi Arora <[email protected]> wrote:
> > I don't think backend CPU time counts against your front-end instance CPU
> > hours quota.  Backends are billed purely based on uptime - and I think this
> > is true for both current and new pricing (starting November 1).  But you
> > mentioned something about datastore CPU?  It is likely that most of your
> > billing is because of this.
> >
> > A few more questions:  Is the reason for 20 backend instances that you want
> > to execute 20 deferred tasks in parallel?  You mentioned you have ~4500
> > tasks.  How long does each one take, and how often does each one need to
> > execute in a day?  Let's assume that none of these tasks is sensitive to
> > latency, and each can be executed at any time during the day.  If each
> > task takes 10 seconds on average, and needs to execute, for example, 6 times
> > a day... that's a total of 4500 * 10 * 6 / 3600 = 75 instance hours.  You
> > should try to "schedule" your backends yourself so that you only pay for 75
> > instance hours.
> >
> > If you allow 20 instances to get created, then at some point all these 20
> > instances will complete their work, and then idle for 15 minutes (or at
> > least be billed for 15 minutes of idle time after the last task completes
> > processing), before they're shut down.  You'll be wasting 5 instance hours
> > each time this happens.  I think your focus should be to minimize your
> > instance hours by minimizing the number of parallel instances you allow
> > to run.  In my calculation above you only need a total of 75 instance
> > hours per day, which is about 3 instances running around the clock, and so
> > you should set "instances" to 3 or 4 in backends.yaml.
> >
> > Yet another way of doing this in a more controlled fashion is by using
> > "pull" queues instead of enqueueing tasks on the regular "push" type task
> > queues.  You can enqueue all your 4500 tasks on a single "pull" queue, and
> > all your backends will run continuously, pulling tasks out of the pull queue
> > and executing them, until the pull queue is empty.  Then they can be woken up
> > again by a cron job to go check the pull queue again.
> >
> > Lastly, any cost you're incurring because of Datastore CPU hours in the
> > current pricing model, or because of Datastore writes in the new pricing
> > model - those can't be avoided.  You will incur those costs regardless of
> > the context of execution of your tasks - front-end or backend.
> >
> > Hope this helps.
> >
> > On Tue, Oct 4, 2011 at 9:33 AM, someone1 <[email protected]> wrote:
> > > Hello,
> >
> > > Thank you for the replies.
> >
> > > The tasks are indeed being run on my backend, as I am being charged for
> > > backend usage (I don't use backends in any other way). I
> > > also see the _ah/deferred logs on my backend ID but not on my app. The
> > > CPU time being used seems to correspond to the Datastore CPU time. Are
> > > the two currently linked? Even under the new pricing scheme, would I
> > > be billed for my main app up-time when executing deferred tasks on my
> > > backend? What part of my main app is considered in use when
> > > executing tasks on the backend?
> >
> > > The max number of instances allowed on any backend is 20. My backend
> > > is set up as dynamic B1 with 20 instances. I let Google determine how
> > > many instances need to be up and running when I queue up my tasks
> > > (usually all 20 run for 20-30 minutes each).
> >
> > > Again, I'd just like to know what part of my main app is being used
> > > during the backend operation that is eating up my CPU. I really
> > > haven't coded anything else on my app except for this data mining
> > > portion which should all be run on the backend now.
> >
> > > Thanks,
> > > Prateek
> >
> > > On Oct 4, 8:38 am, Rishi Arora <[email protected]> wrote:
> > > > I think deferred tasks is an excellent use case for backends.  That's how I
> > > > use my backend as well.  Can you confirm from your logs that your tasks are
> > > > indeed being processed on the backend?  In the drop-down for app versions,
> > > > there's a special "version" which is named after your backend.  Select that
> > > > to check your logs specific to the backend.  Also, I'm assuming the reason
> > > > you're blowing through your budget is because you're spawning multiple,
> > > > possibly hundreds of, instances.  Can you find out how many instances get
> > > > spawned for your deferred tasks?  Can you find out how many backend
> > > > instances are being spawned, if the backend is indeed being used for your
> > > > tasks?  Finally, when you configured your backend, what did you set as your
> > > > "instances" parameter in backends.yaml?  I don't know what the default is,
> > > > but it is likely "unlimited".  In your case, an "instances" value of 1 or 2
> > > > sounds sufficient, but you'll have to play around with that, based on how
> > > > much queueing occurs for your tasks.
> >
> > > > On Tue, Oct 4, 2011 at 12:06 AM, Gerald Tan <[email protected]> wrote:
> > > > > I believe CPU time will no longer be billable after the new pricing is out
> >
> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups
> > > > > "Google App Engine" group.
> > > > > To view this discussion on the web visit
> > > > > https://groups.google.com/d/msg/google-appengine/-/Crry-7yTG4QJ.
> > > > > To post to this group, send email to [email protected].
> > > > > To unsubscribe from this group, send email to
> > > > > [email protected].
> > > > > For more options, visit this group at
> > > > > http://groups.google.com/group/google-appengine?hl=en.
> >
>

