That's not quite fair.  It's easy to get stuck in this trap.

Seems like there's a simple solution to deleting all the data, though:
 After you've moved the important data to a new app, just stop billing
on the old app.  Make reclaiming it Google's problem.

Jeff

On Tue, Dec 27, 2011 at 6:35 PM, Brandon Wirtz <[email protected]> wrote:
> The cold hearted bastard in me has the following thoughts.
>
> You wrote code that treated the Datastore like SQL.
> You didn't set "do not index" on the properties you didn't need to index
> (see the sketch below).
> You changed the structure of your data midway but didn't flush and start
> over; you just changed it.
> Likely you aren't doing any cleanup.
> Likely you aren't using the right typing for your data.
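>
> (A rough sketch of what "do not index" looks like in the Python db API --
> the model and field names here are made up, not your actual code:)
>
>     from google.appengine.ext import db
>
>     class Report(db.Model):
>         # Index only what you actually filter or sort on.
>         created = db.DateTimeProperty(auto_now_add=True)
>         # Everything else: indexed=False, so no index rows get written.
>         source_ip = db.StringProperty(indexed=False)
>         user_agent = db.StringProperty(indexed=False)
>         # Text and Blob properties are never indexed anyway.
>         payload = db.BlobProperty()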
>
> So what I hear is "Whine, whine, whine, I built my stuff wrong, Google tried
> to help me but I wanted to move to Amazon so they didn't have many
> suggestions I liked, so now I'm sad, whine, whine, whine, woe is me.  Please
> tell others so I can get sympathy for not understanding the platform I was
> working on."
>
> Did I miss anything?
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Yohan
> Sent: Tuesday, December 27, 2011 5:44 PM
> To: Google App Engine
> Cc: [email protected]
> Subject: [google-appengine] Cautionary Tale: Abusive price for data
> migration and deletion
>
> Hi fellow developers, just a cautionary tale for the new members out there
> and people building up large datasets.
>
> We already know that the difference between the actual data size and the
> total reported datastore size is due to the indexes and the various voodoo
> stuff the datastore does to keep our data safe. It becomes even more
> relevant when you are trying to migrate your data out of GAE or simply
> delete your data in bulk.
>
> I was storing about 500 GB of data, which translated into > 2 TB of data in
> the datastore (x4...). After spending days reprocessing most of this data to
> remove the unused indexes (thus losing flexibility in my queries, and
> costing me a few hundred dollars), it went down to 1.6 TB, still costing me
> about $450 / month for storage alone. An important note is that a lot of
> this data comes from individual small entities (about 1 billion of them),
> generated by reports and such. I don't deny that I could have come up with a
> better design, and my latest codebase stores the data in more efficient ways
> (aggregating into serialized Text or Blobs), but I still have to make do
> with the v1 data set sitting there.
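>
> (Roughly what the v2 code does now, sketched with made-up names: pickle a
> whole batch of small report dicts into a single blob entity, so each batch
> is one key, one entity, and almost no index rows:)
>
>     import pickle
>     from google.appengine.ext import db
>
>     class ReportBatch(db.Model):
>         day = db.DateProperty()
>         rows = db.BlobProperty()   # pickled list of small report dicts
>
>     def store_batch(day, reports):
>         # One put() instead of thousands of tiny entities.
>         ReportBatch(day=day, rows=db.Blob(pickle.dumps(reports, 2))).put()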
>
> I started migrating the data out of GAE into a simple MySQL instance
> running on EC2. In reality, after migration, the entire dataset weighs
> < 150 GB in MySQL (including indexes), so I have no idea where the extra TB
> is coming from. The migration process was a pain in the a** and took me 5
> freaking weeks to complete. I tried the bulk export from Python, which sucks
> because it only exports textual data and integers but skips blobs and binary
> data (it seems they don't teach base64 encoding at Google...). So I resorted
> to the remote API after a quick email chat with Greg d'Alesandre and Ikai
> Lan, which basically concluded with "sorry, we cannot help, and the remote
> API is not a solution". Cool, then what is? The remote API is damn slow and
> expensive: I basically had to read the entities one by one, store the
> extracted files somewhere, and process them on the fly with backups and
> failsafes everywhere, because the GAE remote API will just break from time
> to time (mostly due to datastore exceptions). The extraction job had to be
> restarted a couple of times because the cursors got screwed up. So reading 1
> billion entities from the datastore takes weeks and costs a lot of dough.
>
> But then comes the axe: your data is still sitting on GAE and you have to
> delete it. With 1 billion entries in the datastore and a x3 / x4 write
> factor, it will cost you $2-3k just to empty the bin. I seriously don't mind
> paying for datastore writes, but having to pay $2000 to delete data that
> already costs me $450 / month is seriously pushing it.
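>
> (For reference, the extraction / deletion loop looked roughly like this --
> heavily simplified, model and server names made up, and every db.delete()
> is still billed per entity plus its index rows:)
>
>     from google.appengine.ext import db
>     from google.appengine.ext.remote_api import remote_api_stub
>
>     class ReportV1(db.Model):      # stand-in for the real v1 model
>         pass
>
>     remote_api_stub.ConfigureRemoteApi(
>         None, '/_ah/remote_api',
>         lambda: ('you@example.com', 'password'),   # made-up credentials
>         'your-app.appspot.com')
>
>     cursor = None    # persist this somewhere so a crash can resume mid-run
>     while True:
>         q = ReportV1.all()
>         if cursor:
>             q.with_cursor(cursor)
>         batch = q.fetch(500)
>         if not batch:
>             break
>         # ... write the batch into the EC2 MySQL instance here ...
>         db.delete(batch)           # billed per entity + per index row
>         cursor = q.cursor()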
>
> Every MySQL / NoSQL solution that I know of has some sort of flushing
> mechanism that doesn't require deleting each entry one by one. How come the
> datastore doesn't? I am not paying the outrageous $500 / month for support,
> but I'm paying far more in platform usage (I have an open credit of $300 /
> day), and so far I haven't gotten any satisfying answer or support from the
> GAE team. I love the platform, but knowing what I know now, vendor lock-in
> has never rung so true as with GAE, and I would not commit so much time and
> energy to GAE for my big / serious projects, leaving it instead to small
> quick-and-dirty jobs.
>
> Please share and comment.
>
> Cheers
>



-- 
We are the 20%

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
