I'd be less abusive if the title of the thread were less so. Something like:

"Cautionary tale: Building large-scale data can cost lots if DataStore isn't fully understood"
"Cautionary tale: Failure to scrap and restart your DataStore when making changes to the structure can be expensive"

But I don't think that "abusive price" is accurate.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Schnitzer
Sent: Tuesday, December 27, 2011 6:57 PM
To: [email protected]
Subject: Re: [google-appengine] Cautionary Tale: Abusive price for data migration and deletion

That's not quite fair. It's easy to get stuck in this trap.

Seems like there's a simple solution to deleting all the data, though: After you've moved the important data to a new app, just stop billing on the old app. Make reclaiming it Google's problem.

Jeff

On Tue, Dec 27, 2011 at 6:35 PM, Brandon Wirtz <[email protected]> wrote:
> The cold hearted bastard in me has the following thoughts.
>
> You wrote code that treated DataStore like SQL.
> You didn't set Do Not Index on the things you didn't need to index.
> You changed the structure of your data midway but didn't flush and
> start over, you just changed it.
> Likely you aren't doing any clean up.
> Likely you aren't using the right typing for your data.
>
> So what I hear is "Whine, whine, whine, I built my stuff wrong, Google
> tried to help me but I wanted to move to Amazon so they didn't have
> many suggestions I liked, so now I'm sad, whine, whine, whine, woe is
> me. Please tell others so I can get sympathy for not understanding
> the platform I was working on."
>
> Did I miss anything?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Yohan
> Sent: Tuesday, December 27, 2011 5:44 PM
> To: Google App Engine
> Cc: [email protected]
> Subject: [google-appengine] Cautionary Tale: Abusive price for data migration and deletion
>
> Hi fellow developers, just a cautionary tale for the new members out
> there and people building up large datasets.
>
> We already know that the difference in reported datastore size between
> the actual data and the total size is due to the indexes and various
> voodoo stuff that the datastore is doing to keep our data safe. It
> is even more relevant when you are trying to migrate your data out of
> GAE or simply delete your data in bulk.
>
> I was storing about 500 GB of data, which translated into > 2 TB of
> data in the datastore (x4...). After spending days reprocessing most
> of this data to remove the unused indexes (thus losing flexibility in
> my queries and costing me a few hundred dollars), it went down to
> 1.6 TB, still costing me about $450 / month for storage alone. An
> important note is that a lot of this data comes from individual small
> entities (about 1 billion of them), coming from reports and stuff. I
> don't deny that I could have come up with a better design, and my
> latest codebase stores the data in more efficient ways (aggregating
> into serialized Text or Blobs), but I still have to make do with the
> v1 data set sitting there.
>
> I started a migration of the data out of GAE into a simple MySQL
> instance running on EC2. In reality, after migration, the entire
> dataset only weighs < 150 GB (including indexes) in MySQL, so I have
> no idea where the extra TB is coming from. The migration process was a
> pain in the a** and took me 5 freaking weeks to complete. I tried the
> bulk export from Python, which sucks because it only exports textual
> data and integers but skips blobs and binary data (it seems they don't
> learn base 64 encoding at Google...). So I resorted to the remote API
> after a quick email chat with Greg d'Alesandre and Ikai Lan, which
> basically concluded with "sorry cannot help and remote api is not a
> solution". Cool, then what is?
> The remote API is damn slow and expensive: I had to basically read the
> entities one by one, store the extracted file somewhere and process it
> on the fly, with backups and failsafes everywhere because the GAE
> remote API will just break from time to time (due to datastore
> exceptions mostly). The extraction job had to be restarted a couple of
> times because of cursors being screwed up. So reading 1 billion
> entities from the datastore takes weeks and costs a lot of dough. But
> then comes the axe: your data is still sitting on GAE and you have to
> delete it. With 1 billion entries in the datastore and a x3 / x4 write
> factor, it will cost you 2-3 k$ to empty your dustbin. I seriously
> don't mind paying for datastore writes, but having to pay $2000 to
> delete data that already costs me $450 / month is seriously pushing it.
>
> Any MySQL / NoSQL solution that I know of has some sort of flushing
> mechanism that doesn't require deletion of each entry 1 by 1. How come
> the datastore doesn't? I am not paying the outrageous $500 / month for
> support, but I'm paying far more in platform usage (I have an open
> credit of 300$ / day) and so far I didn't get any satisfying answer or
> support from the GAE team. I love the platform, but seriously, knowing
> what I know now, vendor lock-in has never rung so true as with GAE, and
> I would not commit so much time and energy to GAE for my big / serious
> projects, just leaving it to small quick and dirty jobs.
>
> Please share and comment.
>
> Cheers
>
> --
> You received this message because you are subscribed to the Google
> Groups "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>

--
We are the 20%
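
For anyone who wants to redo the arithmetic in Yohan's message, here is a rough Python sketch. It only uses the figures quoted in the thread (about 1 billion entities, a x3 / x4 write factor, a $2000 to $3000 deletion bill, $450 / month of storage, 500 GB of raw data reported as 2 TB). The function names are made up for illustration, the TB-to-GB conversion assumes decimal units, and no actual Google price is used: the per-operation price is backed out of the quoted numbers, so treat it as an implied range, not a published rate.

```python
# Figures quoted in the thread. Decimal units (1 TB = 1000 GB) are an assumption.
ENTITIES = 1_000_000_000         # "about 1 billion" small entities
WRITE_FACTORS = (3, 4)           # "x3 / x4" writes billed per deleted entity
DELETE_QUOTE_USD = (2000, 3000)  # "2-3 k$" to delete everything
STORAGE_USD_PER_MONTH = 450      # storage bill after removing unused indexes
RAW_GB, REPORTED_GB = 500, 2000  # raw data vs. size reported by the datastore


def total_write_ops(entities, write_factor):
    """Write operations billed if each delete costs `write_factor` writes."""
    return entities * write_factor


def implied_price_per_100k(cost_usd, ops):
    """Price per 100,000 write operations implied by a total cost."""
    return cost_usd * 100_000 / ops


def months_of_storage(cost_usd, monthly_usd):
    """How many months of storage the same amount of money would buy."""
    return cost_usd / monthly_usd


if __name__ == "__main__":
    for factor in WRITE_FACTORS:
        print(f"x{factor}: {total_write_ops(ENTITIES, factor):,} write ops")

    # Cheapest case: low quote spread over the most ops; dearest: the reverse.
    low = implied_price_per_100k(DELETE_QUOTE_USD[0],
                                 total_write_ops(ENTITIES, WRITE_FACTORS[1]))
    high = implied_price_per_100k(DELETE_QUOTE_USD[1],
                                  total_write_ops(ENTITIES, WRITE_FACTORS[0]))
    print(f"implied price: ${low:.2f} to ${high:.2f} per 100k write ops")

    for quote in DELETE_QUOTE_USD:
        months = months_of_storage(quote, STORAGE_USD_PER_MONTH)
        print(f"${quote} to delete = {months:.1f} months of storage")

    print(f"storage overhead: x{REPORTED_GB / RAW_GB:.0f}")
```

Seen this way, the complaint is that deleting the data one entity at a time costs about as much as leaving it in place for several more months.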
