I'd be less abusive if the title of the thread was less so.

"Cautionary tale: Building large Scale Data can cost lots if Datastore isn't
fully understood"

"Cautionary tale: Failure to Scrap and Restart your DataStore when making
changes to the structure can be expensive"

But I don't think that "abusive price" is accurate.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jeff Schnitzer
Sent: Tuesday, December 27, 2011 6:57 PM
To: [email protected]
Subject: Re: [google-appengine] Cautionary Tale: Abusive price for data
migration and deletion

That's not quite fair.  It's easy to get stuck in this trap.

Seems like there's a simple solution to deleting all the data, though:
 After you've moved the important data to a new app, just stop billing on
the old app.  Make reclaiming it Google's problem.

Jeff

On Tue, Dec 27, 2011 at 6:35 PM, Brandon Wirtz <[email protected]> wrote:
> The cold hearted bastard in me has the following thoughts.
>
> You wrote code that treated DataStore Like SQL.
> You didn't set Do Not index on the things you didn't need to index.
> You changed the structure of your data midway but didn't flush and
> start over; you just changed it.
> Likely you aren't doing any clean up.
> Likely you aren't using the right typing for your data.
>
> So what I hear is "Whine, whine, whine, I built my stuff wrong, Google 
> Tried to help me but I wanted to move to Amazon so they didn't have 
> many suggestions I liked, so now I'm sad, whine, whine, whine, woe is 
> me.  Please tell others so I can get sympathy for not understanding 
> the platform I was working on."
>
> Did I miss anything?
>
>
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Yohan
> Sent: Tuesday, December 27, 2011 5:44 PM
> To: Google App Engine
> Cc: [email protected]
> Subject: [google-appengine] Cautionary Tale: Abusive price for data 
> migration and deletion
>
> Hi fellow developers, just a cautionary tale for the new members out 
> there and people building up large datasets.
>
> We already know that the difference in reported datastore size between
> the actual data and the total size is due to the indexes and various
> voodoo stuff that the datastore does to keep our data safe. It
> is even more relevant when you are trying to migrate your data out of
> GAE or simply delete your data in bulk.
>
> I was storing about 500 GB of data, which translated into more than 2 TB
> in the datastore (x4...). After spending days reprocessing most of this
> data to remove the unused indexes (thus losing flexibility in my queries,
> and costing me a few hundred dollars), it went down to 1.6 TB, still
> costing me about $450 / month for storage alone. An important note is
> that a lot of this data comes from individual small entities (about
> 1 billion of them), coming from reports and stuff. I don't deny that I
> could have come up with a better design, and my latest codebase stores
> the data in more efficient ways (aggregating into serialized Text or
> Blobs), but I still have to make do with the v1 data set sitting there.
>
> I started a migration of the data out of GAE into a simple MySQL
> instance running on EC2. In reality, after migration, the entire
> dataset weighs less than 150 GB (including indexes) in MySQL, so I have
> no idea where the extra TB is coming from. The migration process was a
> pain in the a** and took me 5 freaking weeks to complete. I tried the
> bulk export from Python, which sucks because it only exports textual
> data and integers but skips blobs and binary data (it seems they don't
> learn Base64 encoding at Google...). So I resorted to the remote API
> after a quick email chat with Greg d'Alesandre and Ikai Lan, which
> basically concluded with "sorry, cannot help, and remote API is not a
> solution". Cool, then what is?
>
> The remote API is damn slow and expensive: I basically had to read the
> entities one by one, store the extracted file somewhere and process it
> on the fly, with backups and failsafes everywhere, because the GAE
> remote API will just break from time to time (mostly due to datastore
> exceptions). The extraction job had to be restarted a couple of times
> because of cursors being screwed up. So reading 1 billion entities from
> the datastore takes weeks and costs a lot of dough.
>
> But then comes the axe: your data is still sitting on GAE and you have
> to delete it. With 1 billion entries in the datastore and a x3 / x4
> writing factor, it will cost you 2-3 k$ to empty your das bin. I
> seriously don't mind paying for datastore writes, but having to pay
> $2000 to delete data that already costs me $450 / month is seriously
> pushing it.
>
> Every MySQL / NoSQL solution that I know of has some sort of flushing
> mechanism that doesn't require deleting each entry one by one. How come
> the datastore doesn't? I am not paying the outrageous $500 / month for
> support, but I'm paying far more in platform usage (I have an open
> credit of $300 / day), and so far I haven't gotten any satisfying answer
> or support from the GAE team. I love the platform, but seriously,
> knowing what I know now, vendor lock-in has never rung so true as with
> GAE, and I would not commit so much time and energy to GAE for my big /
> serious projects, just leaving it for small quick-and-dirty jobs.
>
> Please share and comment.
>
> Cheers
>
> --
> You received this message because you are subscribed to the Google 
> Groups "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.



--
We are the 20%
