Lots. Did you see the thread about "push the button, check back in 48 hours"?
Though, to be fair, on RDS we just did a data dump to move to a new system, which we won't mention here, and our SQL export took 288 hours 17 minutes. Data migration over the internet is tough when you get above 1 TB. And making sure you don't have corruption during the move is rough.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Yohan
Sent: Thursday, December 29, 2011 3:41 AM
To: Google App Engine
Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion

Hi Brandon,

Interesting story, but you rarely design Facebook for 500 million people right from the start and alone...

Anyway, I would love to know how much it would cost you and how long you would need to get your data out of your super/big apps. Please share.

Cheers

On Dec 29, 6:58 pm, "Brandon Wirtz" <[email protected]> wrote:
> If you check the archives I have shared times when my requests were well over 5000/s.
>
> I would say GAE handles big data really well. But you have to do testing to make sure your structure is correct, and that your indexes are well thought out.
>
> Planning is always possible. Testing is always possible. But it is like driving my Mini Cooper around Laguna Seca vs. driving a Ferrari around it. The Ferrari is only faster if you can handle it. My mom can run laps in the Mini Cooper, but would end up in the wall in a Ferrari.
>
> Or like the discussion about executing code from students.
>
> GAE is cycles on demand, so if you can build your app to be efficient it is cheap. If you build it with errors it is expensive.
>
> I recently found I could knock 3% off of my bill by disabling logging. That's the level of testing we do. People say "but how can you afford to pay devs to write code if you worry that much?" Well, we are betting on the long haul. We only need to learn the lesson once to capitalize on it for years.
>
> You say you can't predict growth. Sure I can.
> I either engineer something to work for me and 3 of my friends, or I engineer it to be the next Facebook. There is room for some differences along the way, but I could build Facebook on GAE. No worry about big data, or scaling. (I think the GAE team would deploy servers for me as fast as I could fill them.)
>
> Things that are designed for you and your friends you don't market, you don't tell people about, so they don't grow. When CDNinabox went from being something Brandon uses for his sites to being a product, the product got lots of complete rewrites. Testing in Java and Python, the caching mechanism we use ended up using 4 different models based on the type of traffic the site we are accelerating gets. One hack for me became software with 40+ optimizations that can be turned on and off to make things run up to 80% cheaper than the defaults. And to pick those settings we test. We even schedule changes to test real traffic for periods of time.
>
> I think the real lesson I'm trying to convey is one I learned at MSFT. For every dev there is 1/40th of a CTO, 1/10 of a product manager, 2 test engineers, 1/5 of a release manager, and 1/5 of a performance engineer. That is 2.5 support staff for every programmer. If you are just writing code you are working in a vacuum that makes it hard to plan, test, debug, and run scalability metrics.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Yohan
> Sent: Thursday, December 29, 2011 2:00 AM
> To: Google App Engine
> Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion
>
> Hi Brandon,
>
> Well, I started using GAE simply because 2 years ago I was a tech team of 1 and I couldn't afford to hire full-time sysadmins. I'm migrating some of my stuff out now that I have more guys to help me.
> And GAE is a great platform that runs on its own and doesn't require much administration (I launched games and apps on it that just run for months with no major issues). So it is great for starting up. But as soon as you enter the big data domain, you need more control over the way you can process and move your data around (the big companies all have their own datacenters because they need full control over the infrastructure), and thus a PaaS may not be suitable anymore.
>
> It's hard to plan for your business growing 10x within a few months, when the tech infrastructure must suddenly grow from 50 req/s to 5,000 req/s. BTW, GAE can't handle such load well (a minimum latency of 500 ms on Java seriously sucks, not to mention write contention on the datastore). It is easy to plan when everything can be defined in advance (with budgets and stuff), but you don't always have the option.
>
> But thanks for sharing your inputs anyway, always appreciated ;)
>
> On Dec 29, 4:31 pm, "Brandon Wirtz" <[email protected]> wrote:
> > Development is not about not making mistakes, it is about doing structured performance testing and cost analysis.
> >
> > My team writes 500 lines of code for every 50 that make it into the final product.
> >
> > We know things about the efficiencies of Do While vs. ForEach that quite possibly Google doesn't even know. We are that anal about testing. We test query speed done different ways and compare cost and performance based on the anticipated ratios of use.
> >
> > We just never let "mistakes" grow to the point we can't control them.
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Raymond
> > Sent: Thursday, December 29, 2011 12:13 AM
> > To: Google App Engine
> > Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion
> >
> > Dear Yohan,
> >
> > On my side I thank you for sharing your experience. I am beginning with GAE and know that however much time I put into this project I will be making beginner mistakes, so this kind of info is precious. I now have limited experience with GAE and have to compare it with what I know, and in some areas GAE looks very bad. For example, I can't imagine Oracle, DB2, Informix, MsSQL, etc. having any commercial success if they had not implemented rock-solid solutions to import and export data, back up, build and drop tables and databases, and of course calculate precisely the data space required to build a data structure, in some cases down to the byte. Although I understand the very different nature of GAE compared to these traditional DB engines, I think that any professional developer, IT manager, project manager, or person responsible for budget would feel very uncomfortable building a system without a firm grip on its costs or a reasonable way to modify an initial implementation or migrate away from it. Also, the fact that part of the GAE tools are simply not reliable enough to plan the effort and time required to do something is another big minus for this solution.
> >
> > Although DBs are not my main competence, my very first paid job 20+ years ago was to migrate a critical database to a new structure on a new machine (HP 9000 Unix), using a long-forgotten database engine. The first attempt using SQL took 1 week to migrate; the second, using low-level C calls, took months to develop and migrated in the required 3.5 hours. But the important thing to note is that it never crossed my mind to question the reliability of the machine, the database, or the C calls I was making to the DB. It just worked. The server could be locked for minutes swapping to disk because of lack of memory or overload, but it never failed once, and it repeated the exercise time and time again, reliably and in a predictable timeframe.
> >
> > All this said, there are advantages to GAE that are worth fighting with its limitations. I have not yet found anything else that is so immediately and massively scalable and at the same time does not require me to manage the software and hardware. This is invaluable, and although I know that I could have an easier job moving to MySQL, I just don't want to manage an OS and a DB engine. I don't have the time, I have done it, and I don't think that's where I am going to earn my bacon.
> >
> > I will always envy some of the people answering your message for the depth of knowledge they have of this platform and the fact that they always have the right solution and right answer to everything; it must be great to never make mistakes.
> >
> > -R
> >
> > On Dec 29, 6:25 am, jon <[email protected]> wrote:
> > > Yohan, I agree that there should be an easy and cheap way to get your data out. I think it's a little unfair that leaving GAE is made that hard.
> > >
> > > How much did you spend on your custom data download tool? Would you consider open sourcing it for other developers who are caught in the same position?
> > > I'd hate spending weeks building a custom tool just to get my data out.
> > >
> > > Thanks for sharing your experience.
> > >
> > > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
> > > > Hi Brandon,
> > > >
> > > > Although I agree with you that the original dataset wasn't fully optimized (that was over 2 years ago), I believe that I have a good understanding of datastore vs SQL, caching, etc. I'm not building a public-facing website; I'm dealing with private APIs, and I am already stretching memcache and a custom-built Java cache to the limits.
> > > >
> > > > I am also not talking about the reasons why I'm migrating out of GAE. The points I highlighted were:
> > > >
> > > > - no easy way to get your data out
> > > > - no cheap way to get your big data out
> > > > - bulk export in Python doesn't handle binary/blob data
> > > > - remote API is unstable
> > > > - running database queries using cursors for long periods of time is unreliable (many times the cursor got reset for some reason or the query would return a 0000000 cursor, thus screwing 1 week of data processing)
> > > > - it cost me an arm to delete my data
> > > >
> > > > To answer other questions:
> > > > - of course I thought about migrating the remaining data to a new app and then aliasing from the old app to the new one. But it means interrupting the service (disabling datastore writes) and I can't afford that. Plus the remaining data is still quite big.
> > > > - the multi indexes: every time I changed the data structure I would reprocess everything to conform it to the new schema. I'm not using any framework like Objectify or JDO; I'm working with the raw API directly (which is way more elegant)
> > > > - I'm not criticizing the platform; I am criticizing the lack of tools to export and the prohibitive cost of manipulating large data sets.
> > > > I actually love GAE, it is just not for this kind of dataset, that's all.
> > > >
> > > > @Brandon : If
> >
> > ...

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
