That's not quite fair. It's easy to get stuck in this trap. Seems like there's a simple solution to deleting all the data, though: After you've moved the important data to a new app, just stop billing on the old app. Make reclaiming it Google's problem.
Jeff

On Tue, Dec 27, 2011 at 6:35 PM, Brandon Wirtz <[email protected]> wrote:
> The cold-hearted bastard in me has the following thoughts.
>
> You wrote code that treated the Datastore like SQL.
> You didn't set "do not index" on the properties you didn't need to index.
> You changed the structure of your data midway but didn't flush and start over; you just changed it.
> Likely you aren't doing any cleanup.
> Likely you aren't using the right typing for your data.
>
> So what I hear is "Whine, whine, whine, I built my stuff wrong, Google tried to help me but I wanted to move to Amazon so they didn't have many suggestions I liked, so now I'm sad, whine, whine, whine, woe is me. Please tell others so I can get sympathy for not understanding the platform I was working on."
>
> Did I miss anything?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Yohan
> Sent: Tuesday, December 27, 2011 5:44 PM
> To: Google App Engine
> Cc: [email protected]
> Subject: [google-appengine] Cautionary Tale: Abusive price for data migration and deletion
>
> Hi fellow developers, just a cautionary tale for the new members out there and for people building up large datasets.
>
> We already know that the difference between the actual data and the total datastore size reported is due to the indexes and the various voodoo the datastore does to keep our data safe. It becomes even more relevant when you try to migrate your data out of GAE or simply delete it in bulk.
>
> I was storing about 500 GB of data, which translated into 2 TB in the datastore (a 4x overhead...). After spending days reprocessing most of this data to remove the unused indexes (losing flexibility in my queries and a few hundred dollars in the process), it went down to 1.6 TB, still costing me about $450/month for storage alone. An important note: a lot of this data comes from individual small entities (about 1 billion of them), generated from reports and the like. I don't deny that I could have come up with a better design, and my latest codebase stores the data more efficiently (aggregating it into serialized Text or Blob properties), but I still have to make do with the v1 data set sitting there.
>
> I started migrating the data out of GAE into a simple MySQL instance running on EC2. In reality, after migration, the entire dataset weighs < 150 GB (including indexes) in MySQL, so I have no idea where the extra terabyte is coming from. The migration process was a pain in the a** and took me 5 freaking weeks to complete. I tried the bulk export from Python, which sucks because it only exports textual data and integers but skips blobs and binary data (it seems they don't learn base64 encoding at Google...). So I resorted to the remote API after a quick email chat with Greg d'Alesandre and Ikai Lan, which basically concluded with "sorry, cannot help, and the remote API is not a solution". Cool, then what is? The remote API is damn slow and expensive: I basically had to read the entities one by one, store the extracted file somewhere, and process it on the fly, with backups and failsafes everywhere, because the GAE remote API will just break from time to time (mostly due to datastore exceptions). The extraction job had to be restarted a couple of times because of cursors getting screwed up. So reading 1 billion entities from the datastore takes weeks and costs a lot of dough.
> But then comes the axe: your data is still sitting on GAE and you have to delete it. With 1 billion entries in the datastore and a 3x-4x write factor, it will cost you $2,000-3,000 to empty your dustbin. I seriously don't mind paying for datastore writes, but having to pay $2,000 to delete data that already costs me $450/month is seriously pushing it.
>
> Every MySQL/NoSQL solution that I know of has some sort of flushing mechanism that doesn't require deleting each entry one by one. How come the datastore doesn't? I am not paying the outrageous $500/month for support, but I'm paying far more in platform usage (I have an open budget of $300/day), and so far I haven't gotten any satisfying answer or support from the GAE team. I love the platform, but seriously, knowing what I know now, vendor lock-in has never rung truer than with GAE, and I would not commit so much time and energy to GAE for my big/serious projects, leaving it instead for small, quick-and-dirty jobs.
>
> Please share and comment.
>
> Cheers

--
We are the 20%
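
Brandon's "do not index" jab maps, in the Python runtime, to marking properties with indexed=False so they stop generating the built-in single-property index entries that inflate storage and write costs. A minimal sketch under that assumption; the ReportEntry model and its fields are hypothetical, not taken from the thread:

    from google.appengine.ext import db

    class ReportEntry(db.Model):
        # Properties you never filter or sort on can opt out of the
        # single-property indexes entirely.
        source = db.StringProperty(indexed=False)
        count = db.IntegerProperty(indexed=False)
        # Text and Blob properties are never indexed, so aggregating many
        # small values into one serialized blob (as Yohan's v2 code does)
        # also cuts index storage.
        payload = db.BlobProperty()
        # Keep a property indexed only if a query actually needs it.
        created = db.DateTimeProperty(auto_now_add=True)

Each indexed property adds its own index rows on every put, which is a large part of why 500 GB of raw data can show up as roughly 2 TB of billed storage.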
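
The remote-API export loop Yohan describes (fetch a batch, checkpoint the cursor, retry when the datastore throws) can be sketched roughly as below. This is an assumption-laden sketch, not his actual script: it presumes remote_api_stub is already configured for the app, reuses the hypothetical ReportEntry model from the previous sketch, and load_checkpoint, save_checkpoint and save_batch stand in for your own bookkeeping and MySQL writer:

    import time

    from google.appengine.api import datastore_errors
    from google.appengine.ext import db

    BATCH = 500

    def export_all():
        cursor = load_checkpoint()          # resume from the last good position
        while True:
            query = ReportEntry.all()
            if cursor:
                query.with_cursor(cursor)   # continue where the last batch ended
            try:
                batch = query.fetch(BATCH)
            except (datastore_errors.Error, db.Timeout):
                time.sleep(5)               # remote calls fail now and then; back off, retry
                continue
            if not batch:
                break                       # nothing left to export
            save_batch(batch)               # e.g. INSERT the rows into MySQL on EC2
            cursor = query.cursor()
            save_checkpoint(cursor)         # a crash restarts here, not from zero

Checkpointing the cursor after every batch is what keeps a weeks-long job restartable when the cursors or the connection misbehave, as they did for Yohan.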
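
On the deletion side, the datastore really has no TRUNCATE-style flush: every entity is removed individually, along with its index entries, and each of those is a billed write. The usual pattern is a keys-only purge loop, normally driven from the task queue or the mapper library rather than a single process; a hedged sketch:

    from google.appengine.ext import db

    BATCH = 500

    def purge(model_class):
        """Delete every entity of one kind, a batch at a time."""
        while True:
            # keys_only avoids paying to read entity contents; the deletes
            # themselves still cost one operation per entity plus its indexes.
            keys = model_class.all(keys_only=True).fetch(BATCH)
            if not keys:
                break
            db.delete(keys)

Which is why Jeff's suggestion at the top of the thread (move the data you care about, then stop billing and let Google reclaim the old app) is attractive: it sidesteps the per-entity delete charges entirely.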
