Hi,

I feel your pain. It cost me a few thousand dollars to delete my
millions of entities from the datastore after a migration job (Ikai
never replied to my post, though...) and I'm still paying since the
deletion is not completed yet (spending $100-300 a day for the past 2
weeks now!!). I'm not doing much, just running the "delete all"
mapreduce job from the admin panel.

There is totally something wrong with the way datastore writes are
priced, and Google should seriously do something about it before they
lose their big customers (i.e. the ones affected by this problem).

It is simply too costly to go through your data to change an index,
update stuff, or delete your data. And in your case (like mine), even if
you want to take your data out to externalize your custom search and
storage, it will cost you $X,000+ to take it out and another $XX,000 to
clean up behind you (you seem to have a lot of indexed properties in
your dataset).

Please keep me posted on how things go with you, as I'm still hoping I
can get some credit/refund/assistance from Google at this stage,
although I haven't heard from them.
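
For anyone trying to budget a job like this, here is a rough sketch of
the write math Corey lays out in the quoted message below. It assumes his
"4 + 2 per composite index" estimate per changed string and the $0.10 per
100k write ops that Ikai quotes; the function names are made up for
illustration and are not part of any App Engine API.

```python
# Rough estimate of datastore write ops for updating an indexed list property.
# All figures come from the quoted thread; names are illustrative only.

PRICE_PER_WRITE_OP = 0.10 / 100000  # $0.10 per 100k ops, as quoted by Ikai


def writes_per_value_change(composite_indexes):
    # Corey's estimate: 4 writes plus 2 for each composite index involved.
    return 4 + 2 * composite_indexes


def writes_per_entity(values_changed, composite_indexes):
    # 1 write for the entity itself, plus index writes for each changed value.
    return 1 + writes_per_value_change(composite_indexes) * values_changed


def estimated_cost(entities, values_changed, composite_indexes):
    # Total dollars for touching every entity once.
    ops = entities * writes_per_entity(values_changed, composite_indexes)
    return ops * PRICE_PER_WRITE_OP


if __name__ == "__main__":
    # 20 strings removed + 20 added = 40 value changes, 3 composite indexes.
    print(writes_per_entity(values_changed=40, composite_indexes=3))
    print(round(estimated_cost(14000000, 40, 3)))
```

With 40 value changes and 3 composite indexes this gives 401 writes per
entity, which is in the same ballpark as the ~460 per entity that the
billing numbers imply.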



On Jan 6, 7:24 am, "Corey [Firespotter]" <[email protected]>
wrote:
> I work with Petey on this and can help clarify some of the details.
>
> The Entities;
> We have a lot of entities (~14 million), each of which has a
> StringListProperty called "geoboxes".  Like so:
>     class Place(search.SearchableModel):
>       name = db.StringProperty()
>       ...
>       # Location specific fields.
>       coordinates = db.GeoPtProperty(default=None)
>       geohash = db.StringProperty()
>       geoboxes = db.StringListProperty()
>
> Background (details on geoboxing at bottom):
> We're running a mapreduce to change the geobox sizes/precision for a
> large number of entities.  These entities currently have a 'geoboxes'
> StringListProperty with ~20 strings.  For example:
> geoboxes = [u'37.341|-121.894|37.339|-121.892',
> u'37.341|-121.892|37.339|-121.891', ...]
> We are changing those 20 strings to 20 new strings.  Example:
> geoboxes = [u'37.3411|-121.8940|37.3395|-121.8926',
> u'37.3411|-121.8929|37.3395|-121.8916', ...]
>
> The Cost:
> We did almost this same mapreduce when we first added the geoboxes
> back in July.  In that case we were populating the list for the first
> time so we can assume half as many operations were required (no
> removing of old values).  Total cost in July was ~$160 for the CPU
> time.
>
> When we ran the mapreduce again this week to change the box sizes the
> cost was $18 for Frontend Instance Hours, $15 for Datastore Reads
> (21mil) and $2,500 for Datastore Writes (2500mil).  This was not a
> complete run of the mapreduce.  We aborted it after 5.4mil (38%) of
> the entities were updated.  Hence Petey's estimate that the full
> update would cost $6,500.
>
> The Operations:
> Each entity update is removing ~20 existing strings from the geoboxes
> StringList and adding 20 more.  The geobox property is indexed (and
> has to be) and is involved in 3 composite indexes so as best I
> understand it this means each string change results in 10 writes (4 +
> 2 * 3).  So on every entity we update the geoboxes we perform 401
> write operations (1 + 10 * 40).
>
> This agrees pretty well with the charges (2,500,000,000 ops /
> 5,424,000 entities) = 460 ops per entity.
>
> That's a lot of writes and likely the core of the surprising cost.
> However, I'm not sure how we could avoid that with App Engine (open to
> ideas!), and since we could pay for dedicated servers for that amount,
> I think the pricing is probably off as well.
>
> Even if we treat the geobox update as a one-time cost, we have other
> properties like scores, labels, etc that require occasional tweaking.
> Updating even a single indexed property across all these entities
> costs us $60-$100 and typically many times that in practice because
> these interesting fields tend to be used in composite indexes.
>
> -Corey
>
> Geoboxing Details
> Geoboxing is a technique used to search for entities near a point on
> the earth in a database that can only perform equality queries (like
> App Engine).  In short, you break up the world into boxes and record
> which box each entity belongs to as well as any nearby boxes.  Then
> you break up the world into larger boxes and repeat until you have a
> good range of sizes covered.
> There's a good article on the logic of the algorithm here:
> http://code.google.com/appengine/articles/geosearch.html
>
> On Jan 5, 11:58 am, "Ikai Lan (Google)" <[email protected]> wrote:
>
> > Brian (apologies if that is not your name),
>
> > How much of the costs are instance hours versus datastore writes? There's
> > probably something going on here. The largest costs are to update indexes,
> > not entities. Assuming $6500 is the cost of datastore writes alone, that
> > breaks down to:
>
> > ~$0.0004 a write
>
> > Pricing is $0.10 per 100k operations, so that means using this equation:
>
> > (6500.00 / 14000000) / (0.10 / 100000)
>
> > You're doing about 464 write operations per put, which roughly translates
> > to 6.5 billion writes.
>
> > I'm trying to extrapolate what you are doing, and it sounds like you are
> > doing full text indexing or something similar ... and having to update all
> > the indexes. When you update a property, it takes a certain amount of
> > writes. Assuming you are changing String properties, each property you
> > update takes this many writes:
>
> > - 2 indexes deleted (ascending and descending)
> > - 2 indexes updated (ascending and descending)
>
> > So if you were only updating all the list properties, that means you are
> > updating 100 list properties.
>
> > Given that this is a regular thing you need to do, perhaps there is an
> > engineering solution for what you are trying to do that will be more cost
> > effective. Can you describe why you're running this job? What features does
> > this support in your product?
>
> > --
> > Ikai Lan
> > Developer Programs Engineer, Google App Engine
> > plus.ikailan.com | twitter.com/ikai
>
> > On Thu, Jan 5, 2012 at 10:08 AM, Petey <[email protected]> wrote:
> > > In this one case we had to change all of the items in the
> > > listproperty. In our most common case we might have to add and delete
> > > a couple items to the list property every once in a while. That would
> > > still cost us well over $1,000 each time.
>
> > > Most of the reasons for this type of data in our product is to
> > > compensate for the fact that there isn't full text search yet. I know
> > > they are beta testing full text, but I'm still worried that that also
> > > might be too expensive per write.
>
> > > On Jan 5, 6:54 am, Richard Watson <[email protected]> wrote:
> > > > A couple thoughts.
>
> > > > Maybe the GAE team should borrow the idea of spot prices from Amazon.
> > > > That's a great way to have lower-priority jobs that can run when there
> > > > are instances available. We set the price we're willing to pay, if the
> > > > spot cost drops below that, we get the resources. It creates a market
> > > > where more urgent jobs get done sooner and Google makes better use of
> > > > quiet periods.
>
> > > > On your issue:
> > > > Do you need to update every entity when you do this? How many items on
> > > > the listproperty need to be changed? Could you tell us a bit more of
> > > > what the data looks like?
>
> > > > I'm thinking that 14 million entities x 18 items each is the amount of
> > > > entries you really have, each distributed across at least 3 servers and
> > > > then indexed. That seems like a lot of writes if you're re-writing
> > > > everything.  It's likely a bad idea to rely on an infrastructure change
> > > > to fix this (recurring) issue, but there is hopefully a way to reduce
> > > > the amount of writes you have to do.
>
> > > > Also, could you maybe run your mapreduce on smaller sets of the data to
> > > > spread it out over multiple days and avoid adding too many instances?
> > > > Has anyone done anything like this?
>
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > >http://groups.google.com/group/google-appengine?hl=en.
