If you check the archives, I have shared times when my request rate was well over 5,000 req/s.
I would say GAE handles big data really well, but you have to test to make sure your structure is correct and your indexes are well thought out. Planning is always possible. Testing is always possible.

It is like driving my Mini Cooper around Laguna Seca versus driving a Ferrari around it. The Ferrari is only faster if you can handle it. My mom can run laps in the Mini Cooper, but she would end up in the wall in a Ferrari. Or take the discussion about executing code from students: GAE is cycles on demand, so if you build your app to be efficient it is cheap, and if you build it with errors it is expensive. I recently found I could knock 3% off my bill by disabling logging. That's the level of testing we do. People say, "But how can you afford to pay devs to write code if you worry that much?" Well, we are betting on the long haul. We only need to learn the lesson once to capitalize on it for years.

You say you can't predict growth. Sure I can. I either engineer something to work for me and 3 of my friends, or I engineer it to be the next Facebook. There is room for some differences along the way, but I could build Facebook on GAE with no worry about big data or scaling. (I think the GAE team would deploy servers for me as fast as I could fill them.) Things that are designed for you and your friends you don't market and you don't tell people about, so they don't grow.

When CDNinabox went from something Brandon uses for his sites to being a product, it got lots of complete rewrites. After testing in Java and Python, the caching mechanism ended up using 4 different models, chosen by the type of traffic the site we are accelerating gets. One hack for me became software with 40+ optimizations that can be turned on and off to make things run up to 80% cheaper than the defaults. To pick those settings, we test. We even schedule changes so we can test against real traffic for periods of time.
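The kind of comparison described above (pick a variant, measure it, compare it against the default) can be sketched roughly as follows. This is only an illustration, not the harness used for the products in this thread: the two loop variants, the `compare` and `relative_saving` helpers, and the repeat counts are hypothetical names and values chosen for the example, and it measures local CPU time with Python's standard `timeit` module rather than an actual GAE bill.

```python
import timeit


def sum_with_while(values):
    """Sum a list using an index-driven while loop."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def sum_with_for(values):
    """Sum the same list using a for-each loop."""
    total = 0
    for v in values:
        total += v
    return total


_UNSET = object()  # sentinel so a variant may legitimately return None


def compare(variants, data, repeat=5, number=100):
    """Return {name: best seconds per call} for each variant.

    All variants must return the same result; otherwise comparing
    their timings would be meaningless, so a mismatch raises ValueError.
    """
    expected = _UNSET
    results = {}
    for name, fn in variants.items():
        out = fn(data)
        if expected is _UNSET:
            expected = out
        elif out != expected:
            raise ValueError(f"{name} disagrees with the other variants")
        # Take the best of several runs to reduce noise from the machine.
        best = min(timeit.repeat(lambda: fn(data), repeat=repeat, number=number))
        results[name] = best / number
    return results


def relative_saving(baseline, candidate):
    """Fraction saved by candidate versus baseline (0.03 means 3% cheaper)."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    data = list(range(1000))
    timings = compare({"while": sum_with_while, "for": sum_with_for}, data)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
    saving = relative_saving(timings["while"], timings["for"])
    print(f"for-each vs. while: {saving:+.1%}")
```

The numbers this prints depend entirely on the machine and runtime, which is the point: the result has to be measured in the environment you actually pay for, weighted by how often each code path runs.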
I think the real lesson I'm trying to convey is one I learned at MSFT. For every dev there is 1/40th of a CTO, 1/10 of a product manager, 2 test engineers, 1/5 of a release manager, and 1/5 of a performance engineer. That is roughly 2.5 support staff for every programmer. If you are just writing code, you are working in a vacuum that makes it hard to plan, test, debug, and run scalability metrics.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Yohan
Sent: Thursday, December 29, 2011 2:00 AM
To: Google App Engine
Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion

Hi Brandon,

Well I started using GAE simply because 2 years ago i was a tech team of 1 and I couldn't afford to hire full time sysadmins. I'm migrating some of my stuff out now that i have more guys to help me. And GAE is a great platform that runs on its own and doesn't require much administration (i launched games and apps on it that just run for months with no major issues). So great for starting up.

But as soon as you enter the big data domain, you need more control about the way you can process and move your data around (the big companies all have their own datacenters because they need full control about the infrastructure) and thus a PAAS may not be suited anymore.

It's hard to plan that your business will grow 10x within a few months and the tech infrastructure must suddenly grow from 50 req/s to 5,000 req/s. BTW GAE can't handle such load well (latency of min 500ms on java seriously suck, not talking about write contention on the datastore). It is easy to plan when everything can be defined in advance (with budgets and stuff) but you don't always have the option.

But thanks for sharing your inputs anyway, always appreciated ;)

On Dec 29, 4:31 pm, "Brandon Wirtz" <[email protected]> wrote:
> Development is not about not making mistakes, it is about doing
> structured performance testing and cost analysis.
>
> My team writes 500 lines of code for every 50 that make it in to the
> final product.
>
> We know things about the efficiencies of Do While vs. ForEach that
> quite possibly Google doesn't even know. We are that anal about
> testing. We test query speed done different way's and compare cost
> and performance based on the anticipated ratios of use.
>
> We just never let "mistakes" grow to the point we can't control them.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Raymond
> Sent: Thursday, December 29, 2011 12:13 AM
> To: Google App Engine
> Subject: [google-appengine] Re: Cautionary Tale: Abusive price for
> data migration and deletion
>
> Dear Yohan,
>
> On my side I thank you for sharing your experience, I am beginning
> with GAE and know that whatever the time I will put on this project I
> will be making beginner mistakes and this kind of info is precious.
> I have now a limited experience with GAE and have to compare it with
> what I know and in some sectors GAE look very bad, for example I can't
> imagine Oracle, DB2, Informix, etc, ...MsSQL, etc having any
> commercial success if they would not have implemented rock solid
> solutions to import and export data, backup, build and drop tables and
> databases and of course calculate precisely the data space required to
> build a data structure, in some cases down to the byte.
> Although I understand the very different nature of GAE compared to
> this traditional DB engines, I think that any professional developer,
> IT manager, project manager, or person responsible for budget would
> feel very uncomfortable building a system without a firm grip on it's
> costs or a reasonable solution to modify an initial implementation or
> migrate away from it. Also the fact that part of the GAE tools are
> simply not reliable enough to be able to plan effort and time required
> to do something is an other big minus for this solution.
>
> Although DB's are not my main competence, my very first paid job
> 20+years ago was to migrate a critical database to a new structure on
> a new machine (HP 9000 unix), using a long forgotten database engine,
> the first attempt using SQL took 1 week to migrate, the second using
> low level C calls took months to develop and migrated in the required
> 3.5 hours, but the important thing to note is that It never crossed my
> mind to question the reliability of the machine, the database or the C
> calls I was making to the DB, it just worked, the Server could be
> locked for minutes swapping to disk because of lack of memory or
> overload, but it never failed once and repeated the exercise time and
> time again, reliably and in a predictable timeframe.
>
> All this said there are advantages to GAE that are worth fighting with
> it's limitations, I have not yet found anything else that is so
> immediately and massively scalable and at the same time does not
> require me to manage the software and hardware, this is invaluable,
> and although I know that I could have a easier job moving to MySQL, I
> just don't want to manage an OS and a DB engine, I don't have the
> time, I have done it and don't think that's where I am going to earn my bacon.
>
> I will always envy some of the people answering your message for the
> depth of knowledge they have of this platform and the fact that they
> always have the right solution and right answer to everything, it must
> be great to never make mistakes.
>
> -R
>
> On Dec 29, 6:25 am, jon <[email protected]> wrote:
> > Yohan I agree that there should be an easy and cheap way to get your
> > data out. I think it's a little unfair that leaving GAE is made that
> > hard.
>
> > How much did you spend on your custom data download tool? Would you
> > consider open sourcing it for other developers who are caught in the
> > same position? I'd hate spending weeks building a custom tool just
> > to get my data out.
>
> > Thanks for sharing your experience.
>
> > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>
> > > Hi Brandon,
>
> > > Although i agree with you that the original dataset wasnt fully
> > > optimized (that was over 2 years ago), i believe that i have a
> > > good understanding of datatore vs SQL, caching etc. Im not
> > > building public facing website im dealing with private apis and I
> > > am already stretching memcache and custom built java cache to the limits.
>
> > > I am also not talking about the reasons why im migrating out of GAE.
> > > The points i highlighted were:
>
> > > - no easy way to get your data out
> > > - no cheap way to get your big data out
> > > - bulk export in python doesn't handle binary/blob data
> > > - remote api is unstable
> > > - running database queries using cursors for long period of time
> > > is unreliable (many times the cursor got reset for some reason or
> > > the query would return a 0000000 cursor thus screwing 1 week of
> > > data processing)
> > > - it cost me an arm to delete my data
>
> > > To answer other questions :
> > > - of course i thought about migrating the remaining data to a new
> > > app then alias from the old app to the new one. But it means
> > > interrupting the service (disable datastore writes) and i cant
> > > afford that. Plus the remaining data is still quite big.
> > > - the multi indexes: everytime i changed the data structure i
> > > would reprocess everything to conform it to the new schema. Im not
> > > using any framework like objectify or jdo, im working with the raw
> > > api directly (which is way more elegant)
> > > - im not criticizing the platform i am criticizing the lack of
> > > tools to export and the prohibitive cost of manipulating large data sets.
> > > I actually love GAE, it is just not for this kind of dataset thats all.
>
> > > @Brandon : If you have a way to delete 2 billions entities
> > > (whatever their size) on the cheap please let me know.
>
> > > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>
> > > > u pay to write, pay to keep it stored... delete should be free.
>
> > > > 2011/12/28 Brandon Wirtz <[email protected]>
>
> > > > > Yes,
>
> > > > > While the primary app I talk about is edge Cache, that’s
> > > > > because that’s the thing that people can most benefit from
> > > > > that people don’t seem to be using.
>
> > > > > As part of my SEO tools we have what is now a 60 TB database
> > > > > of Backlinks and Crawler data about websites in the top 200k
> > > > > Alexa Sites.
>
> > > > > Why should Deleting be Cheaper? The Operation takes the same
> > > > > amount of CPU, and after you do the delete you don’t have to
> > > > > pay for storage.
>
> > > > > I don’t do near as much in the Java Space but it doesn’t seem
> > > > > there should be much difference between Python and Java. I
> > > > > ported both the primary apps to both languages to do
> > > > > comparative cost analysis, and there have been a few things
> > > > > that we found were faster or cheaper with one or the other, as
> > > > > a result in some case we deploy both and use different
> > > > > versioning so they can both be live and attached to the same
> > > > > data.
>
> > > > > *From:* [email protected] [mailto:[email protected]] *On Behalf Of *André Pankraz
> > > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
> > > > > *To:* [email protected]
> > > > > *Cc:* [email protected]
> > > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive
> > > > > price for data migration and deletion
>
> > > > > Sry Brandon...he has a point - deleting data should be
> > > > > cheaper, even if it's technically the same like writing.
> > > > > Maybe he made some mistakes but you sometimes sound like a
> > > > > fanboy with GAE stockholm syndrome.
> > > > > ;) See what I did here...annoying accusations.
> > > > > You have very good experience with Python, Cache stuff, Edge
> > > > > cache etc., but do you really have experience with multiple
> > > > > 100 GB datastore to talk like this?
> > > > > E.g.: I have also seen some answers from you (often very
> > > > > helpful) that are just plain wrong in the Java
> > > > > environment.
>
> > > > > --
> > > > > You received this message because you are subscribed to the
> > > > > Google Groups "Google App Engine" group.
> > > > > To view this discussion on the web visit
> > > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> > > > > To post to this group, send email to [email protected].
> > > > > To unsubscribe from this group, send email to
> > > > > [email protected].
> > > > > For more options, visit this group at
> > > > > http://groups.google.com/group/google-appengine?hl=en.
