Hi Brandon,

Interesting story, but you rarely design Facebook for 500 million people right from the start, and alone...
Anyway, I would love to know how much it would cost you and how long you would need to get your data out of your super/big apps. Please share.

Cheers

On Dec 29, 6:58 pm, "Brandon Wirtz" <[email protected]> wrote:
> If you check the archives I have shared times when my requests were well
> over 5000/s.
>
> I would say GAE handles big data really well. But you have to do testing to
> make sure your structure is correct, and that your indexes are well thought
> out.
>
> Planning is always possible. Testing is always possible. But like driving
> my Mini Cooper around Laguna Seca, vs. driving a Ferrari around it. The
> Ferrari is only faster if you can handle it. My mom can run laps in the
> Mini Cooper, but would end up in the wall in a Ferrari.
>
> Or like the discussion about executing code from students.
>
> GAE is cycles on demand, so if you can build your app to be efficient it is
> cheap. If you build it with errors it is expensive.
>
> I recently found I could knock 3% off of my bill by disabling logging.
> That's the level of testing we do. People say "but how can you afford to
> pay devs to write code if you worry that much?" Well, we are betting on the
> long haul. We only need to learn the lesson once to capitalize on it for
> years.
>
> You say you can't predict growth. Sure I can. I either engineer something to
> work for me and 3 of my friends, or I engineer it to be the next Facebook.
> There is room for some differences along the way, but I could build Facebook
> on GAE. No worry about big data, or scaling. (I think the GAE team would
> deploy servers for me as fast as I could fill them.)
>
> Things that are designed for you and your friends you don't market, you
> don't tell people about, so they don't grow. When CDNinabox went from
> being something Brandon uses for his sites to being a product, the
> product got lots of complete rewrites.
> Testing in Java and Python, the caching mechanism we use ended up using
> 4 different models based on the type of traffic the site we are
> accelerating gets. One hack for me became a piece of software with 40+
> optimizations that can be turned on and off to make things run up to 80%
> cheaper than the defaults. And to pick those settings we test. We even
> schedule changes to test real traffic for periods of time.
>
> I think the real lesson I'm trying to convey is one I learned at MSFT. For
> every dev there is 1/40th of a CTO, 1/10 of a product manager, 2 test
> engineers, 1/5 of a release manager, and 1/5 of a performance engineer. That
> is 2.5 support staff for every programmer. If you are just writing code you
> are working in a vacuum that makes it hard to plan, test, debug, and run
> scalability metrics.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Yohan
> Sent: Thursday, December 29, 2011 2:00 AM
> To: Google App Engine
> Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data
> migration and deletion
>
> Hi Brandon,
>
> Well, I started using GAE simply because 2 years ago I was a tech team of 1
> and I couldn't afford to hire full-time sysadmins. I'm migrating some of my
> stuff out now that I have more guys to help me. And GAE is a great platform
> that runs on its own and doesn't require much administration (I launched
> games and apps on it that just run for months with no major issues). So
> great for starting up. But as soon as you enter the big data domain, you
> need more control over the way you can process and move your data around
> (the big companies all have their own datacenters because they need full
> control over the infrastructure) and thus a PaaS may not be suited anymore.
>
> It's hard to plan that your business will grow 10x within a few months and
> the tech infrastructure must suddenly grow from 50 req/s to 5,000 req/s.
> BTW GAE can't handle such load well (a minimum latency of 500 ms on Java
> seriously sucks, not to mention write contention on the datastore). It is
> easy to plan when everything can be defined in advance (with budgets and
> stuff) but you don't always have the option.
>
> But thanks for sharing your inputs anyway, always appreciated ;)
>
> On Dec 29, 4:31 pm, "Brandon Wirtz" <[email protected]> wrote:
> > Development is not about not making mistakes, it is about doing
> > structured performance testing and cost analysis.
> >
> > My team writes 500 lines of code for every 50 that make it into the
> > final product.
> >
> > We know things about the efficiencies of Do While vs. ForEach that
> > quite possibly Google doesn't even know. We are that anal about
> > testing. We test query speed done different ways and compare cost
> > and performance based on the anticipated ratios of use.
> >
> > We just never let "mistakes" grow to the point we can't control them.
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Raymond
> > Sent: Thursday, December 29, 2011 12:13 AM
> > To: Google App Engine
> > Subject: [google-appengine] Re: Cautionary Tale: Abusive price for
> > data migration and deletion
> >
> > Dear Yohan,
> >
> > On my side I thank you for sharing your experience. I am beginning
> > with GAE and know that however much time I put into this project I
> > will be making beginner mistakes, and this kind of info is precious.
> > I now have limited experience with GAE and have to compare it with
> > what I know, and in some areas GAE looks very bad. For example, I can't
> > imagine Oracle, DB2, Informix, MS SQL, etc. having any commercial
> > success if they had not implemented rock solid solutions to import
> > and export data, back up, build and drop tables and databases, and of
> > course calculate precisely the data space required to build a data
> > structure, in some cases down to the byte.
> > Although I understand the very different nature of GAE compared to
> > these traditional DB engines, I think that any professional developer,
> > IT manager, project manager, or person responsible for budget would
> > feel very uncomfortable building a system without a firm grip on its
> > costs or a reasonable solution to modify an initial implementation or
> > migrate away from it. Also, the fact that part of the GAE tools are
> > simply not reliable enough to be able to plan the effort and time
> > required to do something is another big minus for this solution.
> >
> > Although DBs are not my main competence, my very first paid job
> > 20+ years ago was to migrate a critical database to a new structure on
> > a new machine (HP 9000 Unix), using a long forgotten database engine.
> > The first attempt using SQL took 1 week to migrate; the second, using
> > low level C calls, took months to develop and migrated in the required
> > 3.5 hours. But the important thing to note is that it never crossed my
> > mind to question the reliability of the machine, the database or the C
> > calls I was making to the DB. It just worked. The server could be
> > locked for minutes swapping to disk because of lack of memory or
> > overload, but it never failed once and repeated the exercise time and
> > time again, reliably and in a predictable timeframe.
> >
> > All this said, there are advantages to GAE that are worth fighting with
> > its limitations. I have not yet found anything else that is so
> > immediately and massively scalable and at the same time does not
> > require me to manage the software and hardware. This is invaluable,
> > and although I know that I could have an easier job moving to MySQL, I
> > just don't want to manage an OS and a DB engine. I don't have the
> > time, I have done it, and I don't think that's where I am going to
> > earn my bacon.
> >
> > I will always envy some of the people answering your message for the
> > depth of knowledge they have of this platform and the fact that they
> > always have the right solution and right answer to everything. It must
> > be great to never make mistakes.
> >
> > -R
> >
> > On Dec 29, 6:25 am, jon <[email protected]> wrote:
> > > Yohan, I agree that there should be an easy and cheap way to get your
> > > data out. I think it's a little unfair that leaving GAE is made that
> > > hard.
> > >
> > > How much did you spend on your custom data download tool? Would you
> > > consider open sourcing it for other developers who are caught in the
> > > same position? I'd hate spending weeks building a custom tool just
> > > to get my data out.
> > >
> > > Thanks for sharing your experience.
> > >
> > > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
> > > >
> > > > Hi Brandon,
> > > >
> > > > Although I agree with you that the original dataset wasn't fully
> > > > optimized (that was over 2 years ago), I believe that I have a
> > > > good understanding of datastore vs SQL, caching etc. I'm not
> > > > building a public-facing website, I'm dealing with private APIs and
> > > > I am already stretching memcache and a custom-built Java cache to
> > > > the limits.
> > > >
> > > > I am also not talking about the reasons why I'm migrating out of GAE.
> > > > The points I highlighted were:
> > > >
> > > > - no easy way to get your data out
> > > > - no cheap way to get your big data out
> > > > - bulk export in Python doesn't handle binary/blob data
> > > > - remote API is unstable
> > > > - running database queries using cursors for a long period of time
> > > > is unreliable (many times the cursor got reset for some reason or
> > > > the query would return a 0000000 cursor, thus screwing up 1 week of
> > > > data processing)
> > > > - it cost me an arm to delete my data
> > > >
> > > > To answer other questions:
> > > > - of course I thought about migrating the remaining data to a new
> > > > app, then aliasing from the old app to the new one. But it means
> > > > interrupting the service (disabling datastore writes) and I can't
> > > > afford that. Plus the remaining data is still quite big.
> > > > - the multi indexes: every time I changed the data structure I
> > > > would reprocess everything to conform it to the new schema. I'm not
> > > > using any framework like Objectify or JDO, I'm working with the raw
> > > > API directly (which is way more elegant)
> > > > - I'm not criticizing the platform, I am criticizing the lack of
> > > > tools to export and the prohibitive cost of manipulating large data
> > > > sets. I actually love GAE, it is just not for this kind of dataset,
> > > > that's all.
> > > >
> > > > @Brandon : If ...
