Lots.

Did you see the thread about the "push the button, check back in 48 hours"?

Though to be fair, on RDS we just did a data dump to move to a new system,
which we won't mention here, and our SQL export took 288 hours 17 minutes.

Data migration over the internet is tough when you get above 1 TB. And
making sure you don't have corruption during the move is rough.
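
One common way to check for corruption after a move like this is to compare checksums taken before and after the transfer. The sketch below is only an illustration, not what was actually used here: it assumes the export is a directory of files, and the helper names (`sha256_of`, `build_manifest`, `find_corrupted`) are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large export never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir):
    """Map each file's relative path to its digest. Run this on the source side."""
    root = Path(export_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(manifest, received_dir):
    """Return relative paths that are missing or whose digest no longer matches."""
    root = Path(received_dir)
    bad = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad
```

Build the manifest before the transfer, ship it alongside the data, and re-send only the files that `find_corrupted` reports on the receiving side.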



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Yohan
Sent: Thursday, December 29, 2011 3:41 AM
To: Google App Engine
Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data
migration and deletion


Hi Brandon,

Interesting story but you rarely design facebook for 500 millions people
right from the start and alone...

Anyway i would love to know how much it would cost you and how long you
would need to get your data out of your super/big apps.

Please share.

Cheers

On Dec 29, 6:58 pm, "Brandon Wirtz" <[email protected]> wrote:
> If you check the archives I have shared times when my requests were 
> well over 5000/s.
>
> I would say GAE handles big data really well. But you have to do 
> testing to make sure your structure is correct, and that your indexes 
> are well thought out.
>
> Planning is always possible.  Testing is always possible. But it is 
> like driving my Mini Cooper around Laguna Seca vs. driving a Ferrari 
> around it.  The Ferrari is only faster if you can handle it. My mom 
> can run laps in the Mini Cooper, but would end up in the wall in a
> Ferrari.
>
> Or like the discussion about executing code from students.
>
> GAE is cycles on demand, so if you can build your app to be efficient 
> it is cheap. If you build it with errors it is expensive.
>
> I recently found I could knock 3% off of my bill by disabling logging.
> That's the level of testing we do.   People say "but how can you 
> afford to pay devs to write code if you worry that much"   well we are 
> betting on the long haul. We only need to learn the lesson once to 
> capitalize on it for years.
>
> You say you can't predict growth. Sure I can. I either engineer 
> something to work for me and 3 of my friends, or I engineer it to be the
> next facebook.
> There is room for some differences along the way, but I could build 
> facebook on GAE.  No worry about big data, or scaling. (I think the 
> GAE team would deploy servers for me as fast as I could fill them)
>
> Things that are designed for you and your friends you don't market, 
> you don't tell people about, so they don't grow.  When CDNinabox went 
> from something Brandon uses for his sites to being a product, the 
> product got lots of complete re-writes. Testing in Java and Python, 
> the caching mechanism we use ended up using 4 different models based 
> on the type of traffic the site we are accelerating gets.  1 hack for 
> me became software with 40+ optimizations that can be turned on and 
> off to make things run up to 80% cheaper than the defaults. And to 
> pick those settings we test.
> We even schedule changes to test real traffic for periods of time.
>
> I think the real lesson I'm trying to convey is one I learned at MSFT.  
> For every dev there is 1/40th of a CTO, 1/10 of a product manager, 2 
> test engineers, 1/5 of a release manager, and 1/5 of a performance 
> engineer. That is about 2.5 support staff for every programmer.  If you are 
> just writing code you are working in a vacuum that makes it hard to 
> plan, test, debug, and run scalability metrics.
>
>
>
>
>
>
>
> -----Original Message-----
> From: [email protected]
>
> [mailto:[email protected]] On Behalf Of Yohan
> Sent: Thursday, December 29, 2011 2:00 AM
> To: Google App Engine
> Subject: [google-appengine] Re: Cautionary Tale: Abusive price for 
> data migration and deletion
>
> Hi Brandon,
>
> Well I started using GAE simply because 2 years ago i was a tech team 
> of 1 and I couldn't afford to hire full time sysadmins. I'm migrating 
> some of my stuff out now that i have more guys to help me. And GAE is 
> a great platform that runs on its own and doesn't require much 
> administration (i launched games and apps on it that just run for 
> months with no major issues). So great for starting up. But as soon as 
> you enter the big data domain, you need more control about the way you 
> can process and move your data around (the big companies all have 
> their own datacenters because they need full control about the
> infrastructure) and thus a PAAS may not be suited anymore.
>
> It's hard to plan that your business will grow 10x within a few months 
> and the tech infrastructure must suddenly grow from 50 req/s to 5,000 
> req/s. BTW GAE can't handle such load well (latency of min 500ms on 
> Java seriously sucks, not to mention write contention on the 
> datastore). It is easy to plan when everything can be defined in 
> advance (with budgets and stuff) but you don't always have the option.
>
> But thanks for sharing your inputs anyway, always appreciated ;)
>
> On Dec 29, 4:31 pm, "Brandon Wirtz" <[email protected]> wrote:
> > Development is not about not making mistakes, it is about doing 
> > structured performance testing and cost analysis.
>
> > My team writes 500 lines of code for every 50 that make it in to the 
> > final product.
>
> > We know things about the efficiencies of Do While vs. ForEach that 
> > quite possibly Google doesn't even know.  We are that anal about 
> > testing.  We test query speed done different ways and compare cost 
> > and performance based on the anticipated ratios of use.
>
> > We just never let "mistakes" grow to the point we can't control them.
>
> > -----Original Message-----
> > From: [email protected]
>
> > [mailto:[email protected]] On Behalf Of Raymond
> > Sent: Thursday, December 29, 2011 12:13 AM
> > To: Google App Engine
> > Subject: [google-appengine] Re: Cautionary Tale: Abusive price for 
> > data migration and deletion
>
> > Dear Yohan,
>
> > On my side I thank you for sharing your experience, I am beginning 
> > with GAE and know that whatever the time I will put on this project 
> > I will be making beginner mistakes and this kind of info is precious.
> > I have now a limited experience with GAE and have to compare it with 
> > what I know and in some sectors GAE looks very bad, for example I 
> > can't imagine Oracle, DB2, Informix, etc, ...MsSQL, etc having any 
> > commercial success if they would not have implemented rock solid 
> > solutions to import and export data, backup, build and drop tables 
> > and databases and of course calculate precisely the data space 
> > required to build a data structure, in some cases down to the byte.
> > Although I understand the very different nature of GAE compared to 
> > these traditional DB engines, I think that any professional 
> > developer, IT manager, project manager, or person responsible for 
> > budget would feel very uncomfortable building a system without a 
> > firm grip on its costs or a reasonable solution to modify an 
> > initial implementation or migrate away from it. Also the fact that 
> > part of the GAE tools are simply not reliable enough to be able to 
> > plan effort and time required to do something is another big minus 
> > for this solution.
>
> > Although DB's are not my main competence, my very first paid job 
> > 20+ years ago was to migrate a critical database to a new structure 
> > on a new machine (HP 9000 unix), using a long forgotten database 
> > engine, the first attempt using SQL took 1 week to migrate, the 
> > second using low level C calls took months to develop and migrated 
> > in the required 3.5 hours, but the important thing to note is that 
> > it never crossed 
> > my mind to question the reliability of the machine, the database or 
> > the C calls I was making to the DB, it just worked, the Server could 
> > be locked for minutes swapping to disk because of lack of memory or 
> > overload, but it never failed once and repeated the exercise time 
> > and time again, reliably and in a predictable timeframe.
>
> > All this said there are advantages to GAE that are worth fighting 
> > with its limitations, I have not yet found anything else that is so 
> > immediately and massively scalable and at the same time does not 
> > require me to manage the software and hardware, this is invaluable, 
> > and although I know that I could have an easier job moving to MySQL, 
> > I just don't want to manage an OS and a DB engine, I don't have the 
> > time, I have done it and don't think that's where I am going to earn 
> > my bacon.
>
> > I will always envy some of the people answering your message for the 
> > depth of knowledge they have of this platform and the fact that they 
> > always have the right solution and right answer to everything, it 
> > must be great to never make mistakes.
>
> > -R
>
> > On Dec 29, 6:25 am, jon <[email protected]> wrote:
> > > Yohan I agree that there should be an easy and cheap way to get 
> > > your data out. I think it's a little unfair that leaving GAE is 
> > > made that hard.
>
> > > How much did you spend on your custom data download tool? Would 
> > > you consider open sourcing it for other developers who are caught 
> > > in the same position? I'd hate spending weeks building a custom 
> > > tool just to get my data out.
>
> > > Thanks for sharing your experience.
>
> > > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>
> > > > Hi Brandon,
>
> > > > Although i agree with you that the original dataset wasn't fully 
> > > > optimized (that was over 2 years ago), i believe that i have a 
> > > > good understanding of datastore vs SQL, caching etc. I'm not 
> > > > building a public facing website, I'm dealing with private APIs 
> > > > and I am already stretching memcache and custom built java cache 
> > > > to the limits.
>
> > > > I am also not talking about the reasons why im migrating out of GAE.
> > > > The points i highlighted were:
>
> > > > - no easy way to get your data out
> > > > - no cheap way to get your big data out
> > > > - bulk export in python doesn't handle binary/blob data
> > > > - remote api is unstable
> > > > - running database queries using cursors for long period of time 
> > > > is unreliable (many times the cursor got reset for some reason 
> > > > or the query would return a 0000000 cursor thus screwing 1 week 
> > > > of data processing)
> > > > - it cost me an arm to delete my data
>
> > > > To answer other questions :
> > > > - of course i thought about migrating the remaining data to a 
> > > > new app then alias from the old app to the new one. But it means 
> > > > interrupting the service (disable datastore writes) and i cant 
> > > > afford that. Plus the remaining data is still quite big.
> > > > - the multi indexes: everytime i changed the data structure i 
> > > > would reprocess everything to conform it to the new schema. Im 
> > > > not using any framework like objectify or jdo, im working with 
> > > > the raw api directly (which is way more elegant)
> > > > - im not criticizing the platform i am criticizing the lack of 
> > > > tools to export and the prohibitive cost of manipulating large 
> > > > data sets.
> > > > I actually love GAE, it is just not for this kind of dataset 
> > > > thats all.
>
> > > > @Brandon : If
>
> ...
>

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.


